Open light42 opened 1 year ago
Hi,
Thank you for your feedback and questions. I'll explain my method here.
For your questions, here are my responses:
Does the structure recognition result depend heavily on OCR results? Yes. Since PubTables-1M only provides row and column labels, some post-processing is needed to obtain cell-level results, and in the original table-transformer code that post-processing requires OCR output.
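To make the row/column post-processing concrete, here is a minimal sketch (names and logic are my own illustration, not the repo's actual code): once the detector predicts row boxes and column boxes, a cell grid can be derived by intersecting each (row, column) pair.

```python
# Hypothetical sketch of cell derivation from detected rows and columns.
# Boxes are (x1, y1, x2, y2) tuples in pixel coordinates.

def intersect(row, col):
    """Geometric intersection of two boxes; None if they don't overlap."""
    x1, y1 = max(row[0], col[0]), max(row[1], col[1])
    x2, y2 = min(row[2], col[2]), min(row[3], col[3])
    if x1 >= x2 or y1 >= y2:
        return None
    return (x1, y1, x2, y2)

def cells_from_rows_cols(rows, cols):
    """Build the cell grid in reading order: rows top-to-bottom, cols left-to-right."""
    rows = sorted(rows, key=lambda b: b[1])
    cols = sorted(cols, key=lambda b: b[0])
    return [[intersect(r, c) for c in cols] for r in rows]

# Toy example: a 100x40 table split into 2 rows and 2 columns.
rows = [(0, 0, 100, 20), (0, 20, 100, 40)]
cols = [(0, 0, 50, 40), (50, 0, 100, 40)]
grid = cells_from_rows_cols(rows, cols)  # grid[0][0] == (0, 0, 50, 20)
```

This geometric step alone does not need OCR; OCR comes in when text has to be placed into the resulting cells (and, in table-transformer's post-processing, to refine the structure).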
How many PubTables-1M samples did you use? All of the train split to train the model, the val split to validate it, and the test split to evaluate it. You can see the details here: https://github.com/whn09/table_structure_recognition/blob/main/yolov5/data/custom-detection.yaml
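For readers unfamiliar with YOLOv5's data files, a dataset config in that format generally looks like the sketch below. This is illustrative only (paths and the class list are my assumptions, following the table-transformer class convention); the linked custom-detection.yaml is authoritative.

```yaml
# Illustrative YOLOv5 data config (NOT the actual custom-detection.yaml).
path: ../datasets/pubtables1m   # dataset root (assumed layout)
train: images/train
val: images/val
test: images/test

nc: 6                           # number of classes
names:
  - table
  - table column
  - table row
  - table column header
  - table projected row header
  - table spanning cell
```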
In your opinion, could this method be better than existing state-of-the-art tools (PaddleOCR)? I think the model is better than PaddleOCR, and even better than table-transformer. The approach is a commonly used one: we already get good results with YOLOv5s, and if you want better results you can use YOLOv5m or larger models.
I haven't tested it yet, but I think that if you train YOLO for text detection it will give great results; after that, even EasyOCR/Tesseract could be used for text recognition. A colleague of mine used YOLOv7 to detect text in official documents, and it worked great. You could use it to finally complete the pipeline.
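The last step of such a pipeline, placing recognized words into cells, can be sketched as follows (a simple illustration under my own assumptions, not anyone's actual code): assign each word to the cell whose box contains the word's center point.

```python
# Hypothetical pipeline tail: words come from a text detector + OCR engine
# (e.g. EasyOCR/Tesseract on each crop); cells come from structure recognition.
# Boxes are (x1, y1, x2, y2) tuples.

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def contains(cell, point):
    x, y = point
    return cell[0] <= x <= cell[2] and cell[1] <= y <= cell[3]

def assign_words_to_cells(cells, words):
    """cells: list of boxes; words: list of (box, text).
    Returns one text string per cell, concatenating words that land in it."""
    texts = ["" for _ in cells]
    for box, text in words:
        c = center(box)
        for i, cell in enumerate(cells):
            if contains(cell, c):
                texts[i] = (texts[i] + " " + text).strip()
                break
    return texts

cells = [(0, 0, 50, 20), (50, 0, 100, 20)]
words = [((5, 5, 20, 15), "Name"), ((55, 5, 80, 15), "Age")]
print(assign_words_to_cells(cells, words))  # ['Name', 'Age']
```

Center-point assignment is a common heuristic; an IoU-based match would be more robust when word boxes straddle cell borders.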
You could use the SynthTabNet dataset for training, since it contains a bounding box for each text span in the cells. You might also add a little noise with shabby-pages so the model can handle table images in imperfect conditions.
Before I ask my questions, let me report what I found when testing the model you trained.
Questions:
Overall, I'm actually impressed with your model's training results: even though it used only a small part of PubTables-1M, it's still impressive that it isn't overfitted. I've trained PaddleOCR for table recognition, and somehow it always overfits.