Open juvebogdan opened 3 years ago
There are two problems here. First, you must remove the categories (the last column) from the tsv input, or PICK will get confused. Second, the SROIE dataset has many transcription errors, and training and prediction end up very messed up because of them.
I had the same problem because I had not removed the last column. I was getting desperate until I finally realized it. I'm glad I wasn't the only one ... ahahahha
Oh. Thank you. I tried removing it. But if I remove it just from the tsv files, I get errors right at the start of training. Do I remove it from the actual tsv files during preprocessing, or somewhere else?
@juvebogdan Be careful: you have to remove it only from the files you use for inference. That is, only those that you pass to test.py.
I understand. Thank you very much
I think I need to change the keys.txt file as well. Is this required?
No, it is not necessary. Take a look at the arguments that test.py needs.
So you only need the images and the boxes-and-transcripts files (without the tag column).
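A minimal sketch of stripping the tag column, in case it helps. It assumes each line is comma-separated with the tag as the final field, and that commas inside a transcript are escaped as `\,` (as in the output posted in this thread). The function names, the tag set, and the directory arguments are hypothetical, not part of PICK. It writes cleaned copies to a separate folder so the files used for training keep their tags.

```python
from pathlib import Path

# Tags seen in this thread; an assumption, adjust to your own entity set.
KNOWN_TAGS = {"company", "address", "date", "total", "other"}


def strip_tag_column(line: str) -> str:
    """Drop a trailing ',<tag>' field if the last field is a known tag."""
    line = line.rstrip("\n")
    body, sep, last = line.rpartition(",")
    # Skip escaped commas ('\,'): those belong to the transcript itself.
    if sep and last.strip() in KNOWN_TAGS and not body.endswith("\\"):
        return body
    return line


def strip_tags_in_dir(src_dir: str, dst_dir: str) -> None:
    """Write tag-free copies of every .tsv in src_dir into dst_dir."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for tsv in Path(src_dir).glob("*.tsv"):
        lines = tsv.read_text(encoding="utf-8").splitlines()
        cleaned = [strip_tag_column(l) for l in lines]
        (out / tsv.name).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
```

One caveat: a line whose tag was already removed and whose transcript is exactly a tag word (for example a box that just reads "total") would lose its transcript, so run this only once, on files that still have the tag column.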
@juvebogdan May I ask how you got such high numbers? After 100 epochs, I only got these:
+---------+----------+----------+----------+----------+
| name | mEP | mER | mEF | mEA |
+=========+==========+==========+==========+==========+
| total | 0.504762 | 0.550173 | 0.52649 | 0.550173 |
+---------+----------+----------+----------+----------+
| address | 0.60628 | 0.394035 | 0.47764 | 0.394035 |
+---------+----------+----------+----------+----------+
| company | 0.564706 | 0.571429 | 0.568047 | 0.571429 |
+---------+----------+----------+----------+----------+
| date | 0.877551 | 0.914894 | 0.895833 | 0.914894 |
+---------+----------+----------+----------+----------+
| overall | 0.610822 | 0.509991 | 0.555871 | 0.509991 |
+---------+----------+----------+----------+----------+
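As a quick sanity check when reading these tables: the thread does not define the columns, but the mEF values above are numerically consistent with the usual F1 score, the harmonic mean of mEP (precision) and mER (recall). The sketch below assumes that interpretation and recomputes two rows from the table.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (mEP, mER, reported mEF) taken from the table above.
rows = {
    "total": (0.504762, 0.550173, 0.52649),
    "date": (0.877551, 0.914894, 0.895833),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed={f1(p, r):.6f} reported={reported}")
```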
I suggest trying early stopping.
I think you ended up with an overfitting problem. How many images did you use for the train/test data?
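For reference, early stopping can be sketched generically as below. This is not PICK's trainer code; the `EarlyStopper` class, the `patience` value, and the validation scores are all hypothetical. The idea is to track a validation metric (for example the overall mEF) and stop once it has not improved for `patience` consecutive epochs.

```python
class EarlyStopper:
    """Stop when a 'higher is better' score stalls for `patience` epochs."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score: float) -> bool:
        """Record a validation score; return True when training should stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation scores, only to show the control flow.
val_scores = [0.50, 0.55, 0.56, 0.55, 0.54, 0.53]
stopper = EarlyStopper(patience=3)
for epoch, score in enumerate(val_scores):
    if stopper.step(score):
        print(f"stopping at epoch {epoch}, best score {stopper.best}")
        break
```

In practice you would also save a checkpoint whenever the score improves, so that the best model rather than the last one is used for testing.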
Hello,
I trained your model on SROIE. During training I got the following:
+---------+----------+----------+----------+----------+
| name    | mEP      | mER      | mEF      | mEA      |
+=========+==========+==========+==========+==========+
| company | 0.887363 | 0.904762 | 0.895978 | 0.904762 |
+---------+----------+----------+----------+----------+
| address | 0.947084 | 0.950163 | 0.948621 | 0.950163 |
+---------+----------+----------+----------+----------+
| total   | 0.804009 | 0.897266 | 0.848081 | 0.897266 |
+---------+----------+----------+----------+----------+
| date    | 0.981878 | 0.996656 | 0.989212 | 0.996656 |
+---------+----------+----------+----------+----------+
| overall | 0.900719 | 0.937126 | 0.918562 | 0.937126 |
+---------+----------+----------+----------+----------+
But when I run it on the test set I get pretty bad results. For example, the total field is missed a lot. The output looks like this:
company KAISON FURNISHING SDN BHD,company
address L4-17 (B)\, LEVEL 4,address
address UP2-01\, MELAWATI MALL,address
address 355\, JALAN BANDAR MELAWATI,address
address PUSAT BANDAR MELAWATI,address
address 53100 KUALA LUMPUR.,address
date 29-01-18
address 2\,305.80 SR,other
address 3
total \,33
address 6.00 SR,othe
address 2\,197.00 SR,other
address 7\,838.80,other
address -7\,840.00,other
address 7\,395.09,other
address 7\,838.80,other
This one is even an example from the training set.