Closed hiepph closed 6 years ago
The model itself doesn't use any particular alphabet. You simply need a way to map the characters in your labels to a set of consecutive non-negative integers (that's how the CTC layer works in TensorFlow).
As you noted, I do this in src/mjsynth.py by constructing a single string and then using the characters' indices. For other Unicode characters, you'd want to make sure the use of string.index as in the data generator works for them.
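As a rough illustration of that idea (not the actual code in src/mjsynth.py), here is a minimal Python 3 sketch of a single-string character set with index-based encoding. The names `CHARSET`, `encode_label`, and `decode_label`, and the particular characters chosen, are hypothetical. It also assumes labels should be normalized to NFC so that an accented letter is always one code point; otherwise the index lookup can fail on visually identical text.

```python
import unicodedata

# Hypothetical character set: digits, lowercase ASCII, plus a few Unicode characters.
# Normalized to NFC so every entry is a single code point.
CHARSET = unicodedata.normalize("NFC", "0123456789abcdefghijklmnopqrstuvwxyzäöüßéñ€")

# The CTC loss needs one extra class for the blank symbol.
NUM_CLASSES = len(CHARSET) + 1


def encode_label(text, charset=CHARSET):
    """Map each character to its index in the charset string.

    Raises ValueError if a character is not in the charset.
    """
    text = unicodedata.normalize("NFC", text)
    return [charset.index(ch) for ch in text]


def decode_label(indices, charset=CHARSET):
    """Inverse of encode_label: map indices back to characters."""
    return "".join(charset[i] for i in indices)


if __name__ == "__main__":
    print(encode_label("größe€"))
    print(decode_label(encode_label("größe€")))
```

In Python 3, `str` is a sequence of Unicode code points, so `str.index` works per character. In Python 2 the same lookup on a byte string would operate on bytes, so the charset and the labels would both need to be `unicode` objects.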
I want to recognize more than just the English alphabet and digits (e.g. special Unicode characters). Is this possible, and how can I do it?
Suppose I have my own dataset, do I have to write my own data loader and provide like in your src/mjsynth.py?