sroie results - Githubissues

juvebogdan commented 3 years ago

Hello,

I trained your model on sroie. During training I got following:

+---------+----------+----------+----------+----------+
| name    |      mEP |      mER |      mEF |      mEA |
+=========+==========+==========+==========+==========+
| company | 0.887363 | 0.904762 | 0.895978 | 0.904762 |
+---------+----------+----------+----------+----------+
| address | 0.947084 | 0.950163 | 0.948621 | 0.950163 |
+---------+----------+----------+----------+----------+
| total   | 0.804009 | 0.897266 | 0.848081 | 0.897266 |
+---------+----------+----------+----------+----------+
| date    | 0.981878 | 0.996656 | 0.989212 | 0.996656 |
+---------+----------+----------+----------+----------+
| overall | 0.900719 | 0.937126 | 0.918562 | 0.937126 |
+---------+----------+----------+----------+----------+
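
As a reading aid for the table: the reported mEF values are consistent with the usual F1 definition, i.e. the harmonic mean of mEP and mER. The sketch below only checks that arithmetic on the rows above; the function name `f1` and the `rows` dictionary are illustrative, and whether the evaluation code computes mEF exactly this way is an assumption.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (mEP, mER, reported mEF) copied from the table above.
rows = {
    "company": (0.887363, 0.904762, 0.895978),
    "address": (0.947084, 0.950163, 0.948621),
    "total": (0.804009, 0.897266, 0.848081),
    "date": (0.981878, 0.996656, 0.989212),
    "overall": (0.900719, 0.937126, 0.918562),
}

for name, (p, r, reported) in rows.items():
    # Each difference should be at the level of rounding noise.
    print(f"{name:8s} computed={f1(p, r):.6f} reported={reported:.6f}")
```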

But when I run it on the test set I get pretty bad results. For example, total is missing a lot. The output looks like this:

company KAISON FURNISHING SDN BHD,company
address L4-17 (B)\, LEVEL 4,address
address UP2-01\, MELAWATI MALL,address
address 355\, JALAN BANDAR MELAWATI,address
address PUSAT BANDAR MELAWATI,address
address 53100 KUALA LUMPUR.,address
date 29-01-18
address 2\,305.80 SR,other
address 3
total \,33
address 6.00 SR,othe
address 2\,197.00 SR,other
address 7\,838.80,other
address -7\,840.00,other
address 7\,395.09,other
address 7\,838.80,other

This one is even an example from the training set.

compadrejavo commented 3 years ago

There are two problems there. First, you must remove the categories (the last column) from the tsv input, or PICK will get confused. Second, the SROIE dataset has many transcription errors, and both training and prediction end up very messed up because of them.

jorgerodriguezsj commented 3 years ago

I had the same problem of not removing the last column. I was getting desperate until I finally realized it. I'm glad I wasn't the only one ... ahahahha

juvebogdan commented 3 years ago

Oh. Thank you. I tried removing it. But if I remove it just from the tsv files, I get errors right at the start of training. Do I remove it from the actual tsv files during preprocessing, or somewhere else?

jorgerodriguezsj commented 3 years ago

@juvebogdan Be careful, because you have to remove them only from the files you use for inference. That is, only from those that you pass to the test.py file.
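
A minimal sketch of that step in Python, assuming each line of a boxes-and-transcripts file is comma-separated with the entity tag as its final field. The tag set is taken from the labels that appear in this thread; the function names and folder arguments are hypothetical. It writes cleaned copies to a separate folder, so the tagged files used for training stay untouched.

```python
from pathlib import Path

# Assumption: tags seen in this thread; adjust to your own label set.
KNOWN_TAGS = {"company", "address", "total", "date", "other"}


def strip_tag_column(line: str) -> str:
    """Drop the final comma-separated field if it is a known entity tag."""
    line = line.rstrip("\n")
    body, sep, last = line.rpartition(",")
    # Only strip when the last field really is a tag, so running this twice
    # on the same line does not eat part of the transcript.
    if sep and last.strip() in KNOWN_TAGS:
        return body
    return line


def strip_tags_in_folder(src_dir: str, dst_dir: str) -> int:
    """Write tag-free copies of every .tsv in src_dir into dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for tsv in sorted(src.glob("*.tsv")):
        lines = tsv.read_text(encoding="utf-8").splitlines()
        cleaned = [strip_tag_column(l) for l in lines if l.strip()]
        (dst / tsv.name).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
        count += 1
    return count
```

Because the tag is split off from the right, commas inside the transcript itself are left alone. A transcript that happens to end in `,total` or a similar tag word would still be truncated, so spot-check a few output files.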

juvebogdan commented 3 years ago

I understand. Thank you very much

juvebogdan commented 3 years ago

I think I need to change the keys.txt file as well. Is this required?

jorgerodriguezsj commented 3 years ago

No, it is not necessary. Take a look at the arguments that test.py needs:

Checkpoint
Boxes and transcripts (without the tag column) of the images from which you want to get the information
Path of the folder in which are the images from which you want to get the information.
Path of the folder where you want to save the output results of each image
GPU id to use
Batch size

Therefore you only need the images and the boxes and transcripts (Without the tag column)
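
Since inference needs only those two inputs, it can help to confirm they line up before running test.py. The following is a rough sketch that assumes each image is paired with a `.tsv` file of the same base name; the function name, the folder arguments, and the list of image extensions are illustrative and not taken from the repository.

```python
from pathlib import Path

# Assumption: image extensions to look for; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def unmatched_inputs(images_dir: str, boxes_dir: str):
    """Return (images without a .tsv, .tsv files without an image), by base name."""
    image_stems = {
        p.stem
        for p in Path(images_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    box_stems = {p.stem for p in Path(boxes_dir).glob("*.tsv")}
    return sorted(image_stems - box_stems), sorted(box_stems - image_stems)
```

If both returned lists are empty, every image has a matching boxes-and-transcripts file and vice versa.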

minhhoangbui commented 3 years ago

@juvebogdan May I ask how you got such a high number? After 100 epochs, I got these numbers only

+---------+----------+----------+----------+----------+
| name    |      mEP |      mER |      mEF |      mEA |
+=========+==========+==========+==========+==========+
| total   | 0.504762 | 0.550173 | 0.52649  | 0.550173 |
+---------+----------+----------+----------+----------+
| address | 0.60628  | 0.394035 | 0.47764  | 0.394035 |
+---------+----------+----------+----------+----------+
| company | 0.564706 | 0.571429 | 0.568047 | 0.571429 |
+---------+----------+----------+----------+----------+
| date    | 0.877551 | 0.914894 | 0.895833 | 0.914894 |
+---------+----------+----------+----------+----------+
| overall | 0.610822 | 0.509991 | 0.555871 | 0.509991 |
+---------+----------+----------+----------+----------+

HoKinChung commented 2 years ago

I suppose you should try early stopping.
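
As a generic illustration of the idea (not the repository's own trainer logic), early stopping can be sketched as a small helper that watches a validation score such as the overall mEF and signals a stop once it has not improved for a given number of epochs. The class name and the `patience` and `min_delta` parameters are hypothetical.

```python
class EarlyStopping:
    """Stop when a higher-is-better score has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score: float) -> bool:
        """Record one epoch's validation score; return True if training should stop."""
        if score > self.best + self.min_delta:
            # New best score: remember it and reset the counter.
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Usage with made-up validation scores, one per epoch.
stopper = EarlyStopping(patience=2)
for epoch, val_score in enumerate([0.50, 0.60, 0.59, 0.58, 0.70]):
    if stopper.step(val_score):
        print(f"stopping after epoch {epoch}, best score {stopper.best}")
        break
```

In practice the checkpoint saved at the best score, rather than the last one, is the one to pass on for inference.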

ziodos commented 2 years ago

I think you ended up with an overfitting problem. How many images did you use for the train/test data?

wenwenyu / PICK-pytorch

sroie results #76