Closed: keeofkoo closed this issue 4 years ago
This isn't an error or bug in the code (hence not cause for an issue), but rather behavior you're seeing with a particular use case. You should open a thread on Stack Overflow about training such models in those cases.
It can be hard for the LSTM to get "off the ground". You could try pre-training a simple CNN classifier and importing those weights into the full model; a sketch of this idea follows.
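For illustration, a minimal Keras sketch of that pre-training idea. Everything here is an assumption, not this repo's actual code: the layer names, input shapes, and class count are hypothetical and would need to match the real architecture.

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 3000  # hypothetical: roughly the size of the character set

def build_cnn_backbone():
    """Shared convolutional feature extractor (layer names are illustrative)."""
    return models.Sequential([
        layers.Conv2D(64, 3, activation="relu", padding="same", name="conv1"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu", padding="same", name="conv2"),
        layers.MaxPooling2D(2),
    ], name="backbone")

# Step 1: pre-train the backbone on single-character classification.
backbone = build_cnn_backbone()
classifier = models.Sequential([
    layers.Input(shape=(32, 32, 1)),  # hypothetical single-character crops
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# classifier.fit(char_images, char_labels, epochs=5)  # your character data here

# Step 2: build the full CNN+LSTM model around an identical backbone and
# copy the pre-trained convolutional weights into it before training end-to-end.
full_backbone = build_cnn_backbone()
inputs = layers.Input(shape=(32, 256, 1))  # hypothetical full word crops
x = full_backbone(inputs)                  # -> (batch, 8, 64, 128)
x = layers.Permute((2, 1, 3))(x)           # -> (batch, 64, 8, 128): width as time
x = layers.Reshape((64, 8 * 128))(x)       # -> (batch, 64, 1024)
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
outputs = layers.Dense(NUM_CLASSES + 1)(x)  # +1 for a CTC blank, if using CTC
full_model = models.Model(inputs, outputs)

full_backbone.set_weights(backbone.get_weights())  # the actual weight transfer
```

The transfer works because convolution kernels are independent of the input's spatial size, so the same backbone can be built for character crops and for full word crops.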
I was planning to adapt the architecture to recognize a large character set, such as Japanese and Chinese, but found that the model learns nothing about a certain subset of characters, some of which are even among the most frequently used. I trained the model on a dataset of 120k+ cropped words (covering roughly 3k characters) for 500k steps and got a loss of around 5. I have verified that the characters that cannot be learned are indeed present in both the training and validation datasets. I printed out the intermediate logits for debugging, only to find that they all share the same value (like `[3.2362458e-4, 3.2362458e-4, ..., 3.2362458e-4]`), meaning the model has no clue which class the input should fall into. Roughly 100 characters are in this situation, while the rest (the majority) seem to be fine. I have also referred to #42 and tried following the training schedule, but all I got was a new set of characters that cannot be learned.