Output files are empty when testing

sophiajwchoi commented 4 years ago

Hello, thank you so much for providing the code for a great research paper. After I run python train.py --config config.json, the values of mEP, mER, mEF, and mEA are all 0 for every training epoch. Is this normal?

2020-07-26 20:45:35.357166: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
[2020-07-26 20:45:40,891 - train - INFO] - Model trainable parameters: 68575842
[2020-07-26 20:45:40,892 - train - INFO] - Train datasets: 8 samples Validation datasets: 8 samples Max_epochs: 100 Log_per_step: 10 Validation_per_step: 50
[2020-07-26 20:45:40,892 - train - INFO] - Training start...
[2020-07-26 20:46:07,680 - trainer - INFO] - Train Epoch:[1/100] Step:[2/2] Total Loss: 1121.992798 GL_Loss: 8.390446 CRF_Loss: 1113.602295
[2020-07-26 20:46:09,988 - trainer - INFO] - [Epoch Validation] Epoch:[1/100] Total Loss: 1430.095032 GL_Loss: 0.091890 CRF_Loss: 1420.906006
+---------+-------+-------+-------+-------+
| name    | mEP   | mER   | mEF   | mEA   |
+=========+=======+=======+=======+=======+
| date    |     0 |     0 |     0 |     0 |
+---------+-------+-------+-------+-------+
| name    |     0 |     0 |     0 |     0 |
+---------+-------+-------+-------+-------+
| overall |     0 |     0 |     0 |     0 |
+---------+-------+-------+-------+-------+

Also, I ran

python test.py --checkpoint /content/PICK-pytorch/saved/models/PICK_Default/test2_0726_194721/model_best.pth --boxes_transcripts /content/PICK-pytorch/data/test_data_example/boxes_and_transcripts \
               --images_path /content/PICK-pytorch/data/test_data_example/images/ --output_folder /content/PICK-pytorch/output/test2 \
               --gpu 0 --batch_size 2

Output files are generated, but the files are actually empty. Can you please give me the right direction to test the model? Thank you!

lmpan commented 4 years ago

I have the same issue. I preprocessed the SROIE labels following the data format example and trained the model.

[2020-07-27 12:11:02,676 - trainer - INFO] - Train Epoch:[50/200] Step:[10/39] Total Loss: 0.332347 GL_Loss: 0.127635 CRF_Loss: 0.204712
[2020-07-27 12:14:30,065 - trainer - INFO] - Train Epoch:[50/200] Step:[20/39] Total Loss: 0.234698 GL_Loss: 0.156055 CRF_Loss: 0.078644
[2020-07-27 12:17:52,441 - trainer - INFO] - Train Epoch:[50/200] Step:[30/39] Total Loss: 0.348214 GL_Loss: 0.152596 CRF_Loss: 0.195618
[2020-07-27 12:20:57,600 - trainer - INFO] - [Epoch Validation] Epoch:[50/200] Total Loss: 0.239716 GL_Loss: 0.001431 CRF_Loss: 0.096644

Output files are generated after testing, but files are empty.

When testing, the decoded tags are all 'O', no matter what data is fed into the model. The decoded tags are then passed to bio_tags_to_span(), which returns an empty list called spans. It seems that bio_tags_to_span() ignores all 'O' tags and extracts only key-information spans. How far must the loss be reduced before the model generates meaningful output other than 'O'?
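
To illustrate the behaviour described above, here is a minimal sketch of BIO-tag-to-span decoding. It is a simplified, hypothetical stand-in (bio_tags_to_spans_sketch is my own name), not the function the repository actually calls, and it assumes tags of the form 'B-<label>', 'I-<label>', and 'O'. It only shows why a sequence of all 'O' tags yields an empty spans list, and therefore empty output files.

```python
from typing import List, Tuple

Span = Tuple[str, Tuple[int, int]]  # (label, (start, end)), end inclusive


def bio_tags_to_spans_sketch(tags: List[str]) -> List[Span]:
    """Hypothetical simplified decoder, for illustration only."""
    spans: List[Span] = []
    start = None
    label = None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        # An I- tag with the same label extends the currently open span.
        if prefix == "I" and label is not None and name == label:
            continue
        # Anything else closes the open span, if there is one.
        if label is not None:
            spans.append((label, (start, i - 1)))
            start, label = None, None
        # Only a B- tag opens a new span; 'O' never does.
        # (Simplification: an I- tag with no open span is dropped here.)
        if prefix == "B":
            start, label = i, name
    # Close a span that runs to the end of the sequence.
    if label is not None:
        spans.append((label, (start, len(tags) - 1)))
    return spans


all_o = bio_tags_to_spans_sketch(["O", "O", "O"])
mixed = bio_tags_to_spans_sketch(["B-date", "I-date", "O", "B-total"])
```

With this sketch, all_o is an empty list, while mixed contains one date span and one total span. So empty files point to the model predicting only 'O', not to a failure when writing the output.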

wenwenyu commented 4 years ago

@sophiajwchoi Thanks for your interest in our paper. It is normal for mEP, mER, mEF, and mEA to all be zero at an early stage, because training has not yet converged. From your training log, it looks like you used only the 8 examples I provided. My own experiment with the same 8 examples shows that the validation metrics are zero before epoch 45 and the CRF loss is large. After epoch 45, the metrics gradually become positive. You can try training for more epochs and then test the model. Hope this helps.

wenwenyu commented 4 years ago

@lmpan From the training log you provided, it seems the model has already converged, since the loss has reached an ideal value. Maybe you should check whether the data labels were generated correctly? BTW, are the validation metrics also zero throughout training?

wenwenyu commented 4 years ago

@sophiajwchoi BTW, I noticed that the validation metrics in the training log you provided include a name entity, but the examples you used actually have no name entity. One possible reason is that you didn't modify Entities_list in the utils/entities_list.py file. In that case, you should modify Entities_list to ["company", "address", "date", "total"].
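
A minimal sketch of that change, plus a small sanity check. The Entities_list value is the one suggested above; unknown_entities is a hypothetical helper (not part of the repository), and the assumption that non-entity rows are labeled "other" should be checked against your own data.

```python
# Assumed contents of utils/entities_list.py after the suggested edit.
Entities_list = ["company", "address", "date", "total"]


def unknown_entities(labels, known=Entities_list, ignore=("other",)):
    """Hypothetical helper: return label names that appear in the data
    but are neither in `known` nor in `ignore`.

    `ignore` holds the non-entity label; "other" is an assumption here.
    """
    return sorted(set(labels) - set(known) - set(ignore))


# Labels as they might be read from your annotation files (made-up values).
leftover = unknown_entities(["company", "date", "other", "name"])
```

Here leftover is ["name"], which flags a label in the data that Entities_list does not cover. An empty result means the labels and the entity list agree.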

sophiajwchoi commented 4 years ago

Thank you for replying! The issue has been resolved. Just to confirm, can this model also be trained on other kinds of form documents?

wenwenyu commented 4 years ago

@sophiajwchoi Theoretically, this model is also compatible with other types of documents. All you need to do is gather and label as much data as possible.

wenwenyu / PICK-pytorch
