We have only published the model trained on PubTabNet; you may use the same code to train the model on ICDAR2013 or other datasets.
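For example, a minimal sketch of what the dataset section of the config might look like when pointed at ICDAR2013 (the exact dataset type and annotation format depend on this repo's config; all paths below are placeholders):

```python
# Hypothetical sketch: pointing the existing data section of lgpma_base.py at ICDAR2013.
# The dataset type and annotation format must match what the repo expects;
# every path here is a placeholder.
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        ann_file='data/icdar2013/train_annotations.json',  # placeholder path
        img_prefix='data/icdar2013/train_images/',          # placeholder path
    ),
    val=dict(
        ann_file='data/icdar2013/val_annotations.json',     # placeholder path
        img_prefix='data/icdar2013/val_images/',             # placeholder path
    ),
)
```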
Thank you. One more question: could you tell me how you set up the config for that fine-tuning, especially the lr, momentum, and weight_decay of the optimizer? The paper mentions 25 epochs, but nothing else. I am currently fine-tuning on my own data by keeping the settings in lgpma_base.py as they are and adding the publicly available maskrcnn-lgpma-pub-e12-pub.pth to load_from, but the detection accuracy has dropped drastically compared to before fine-tuning.
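For reference, this is roughly the override I have been trying (mmdetection-style fields; all values are my own guesses rather than anything from the paper):

```python
# My current fine-tuning setup in lgpma_base.py (values are my own guesses,
# not taken from the paper; standard mmdetection-style config fields assumed).
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='step', step=[15, 22])  # guessed schedule for 25 epochs
total_epochs = 25                                # 25 epochs, as mentioned in the paper
load_from = 'maskrcnn-lgpma-pub-e12-pub.pth'     # the published PubTabNet checkpoint
```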
Some of the parameters you mentioned are actually unlikely to have a big impact on the performance of the model. A drop like this is usually caused by a problem with the data labels or the target generation. In addition, if you are evaluating detection metrics, note that the LGPMA training target is the enlarged aligned cell boxes, which cannot be evaluated against the original text bounding boxes. It is better to look at the visualization results directly to judge whether the model has converged, or to evaluate the table structure itself.
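For example, something along these lines can be used to check the predictions by eye (a rough OpenCV sketch; the structure of `boxes` depends on whichever inference script you run, so adapt it to the actual output format):

```python
import cv2

# Rough sketch: draw the predicted aligned cell boxes on the input image to judge
# convergence visually. `boxes` is assumed to be a list of [x1, y1, x2, y2] values
# taken from your inference output; this is not the repo's own visualization tool.
def draw_aligned_boxes(image_path, boxes, out_path='vis.jpg'):
    img = cv2.imread(image_path)
    for x1, y1, x2, y2 in boxes:
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
    cv2.imwrite(out_path, img)
```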
In other words, the parameters do not matter much, and the reason fine-tuning does not improve accuracy is the quantity or quality of the additional data, or too few epochs. Are you saying that the model is not converging because of these factors, and that is why the accuracy looks poor? And that whether the model is converging cannot be judged from the loss displayed during training? Also, the ICDAR2013 training set used in the paper was 98 images, so I trained with about 100 images of additional data for 25 epochs. The loss is down to 2.98 at epoch 25.
Also, for fine-tuning, I set load_from in lgpma_base.py to the publicly available maskrcnn-lgpma-pub-e12-pub.pth. Is that correct, rather than setting resume_from?
The quantity of IC13 images is very small, so the model should converge (or even overfit) easily. You may train for more epochs to see whether it can overfit the training set.
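For example (a sketch assuming the usual mmdetection-style fields), you can raise the epoch count and point evaluation at the training annotations themselves to check whether the model can at least overfit:

```python
# Sketch of an overfitting check (mmdetection-style fields assumed):
# train for many more epochs and evaluate against the training set itself.
total_epochs = 300
evaluation = dict(interval=10)
data = dict(
    val=dict(
        ann_file='data/icdar2013/train_annotations.json',  # placeholder: same file as train
        img_prefix='data/icdar2013/train_images/',
    ),
)
```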
> Also, for fine-tuning, I set load_from in lgpma_base.py to the publicly available maskrcnn-lgpma-pub-e12-pub.pth. Is that correct, rather than setting resume_from?
yes
Thanks for answering my question. Even with the epoch count set to 100, I still don't see any convergence… Maybe the quality of my data is bad… I'll try a few more things.
I read the LGPMA paper. Thanks for the good research. I tried to use the code right away, but I cannot find the fine-tuning code. The paper says the following. Is the fine-tuning code not published? If not, could you please publish it or tell me how you fine-tuned the model? Thank you.
"We also report the result of the model that is then fine-tuned on the training set of ICDAR2013, labeled by †. The results of DeepDeSRT on SciTSR come from [2]. The results are demonstrated in Table 1, from which we can see that the proposed LGP-TabNet vastly surpasses previous advances on these three benchmarks by 4.4%, 3.5%, 2.5%, respectively. "