whn09 / table_structure_recognition

Table detection (TD) and table structure recognition (TSR) using YOLOv5/YOLOv8; you can get the same (or even better) results compared with Table Transformer (TATR) using smaller models.

Some questions #1

light42 opened this issue 1 year ago

light42 commented 1 year ago

Before I ask my questions, let me report what I found when I tested the model you've trained.

  1. It can't handle extremes: if a cell is too large or too small, or if a cell contains multiple lines of text, it is hard to detect.
  2. It can't handle merged columns/rows.
  3. It has some capability to recognize empty cells (needs further testing).
  4. It can recognize tables with a large number of cells (needs further testing).
  5. Overall, if the table is well behaved (no merged columns/rows, adequate cell size), it is quite accurate.

Questions:

  1. Does the structure recognition result depend heavily on OCR results?
  2. How many PubTables-1M samples did you use?
  3. In your opinion, could this method be better than existing state-of-the-art tools (e.g. PaddleOCR)?

Overall, I'm actually impressed with the training result of your model; even if it's only a small part of PubTables-1M, it's still impressive that it's not overfitted. I've trained PaddleOCR for table recognition and somehow it always overfitted.

whn09 commented 1 year ago

Hi,

Thank you for your feedback and questions. I will explain my method and try to respond to each question below:

  1. Does the structure recognition result depend heavily on OCR results? Yes. Since PubTables-1M only provides row and column labels, we need some post-processing to get the cell result, and in the original code (table-transformer), the post-processing needs the OCR result (see the sketch after this list).

  2. How many PubTables-1M samples did you use? All of the train data to train the model, the val data to validate the model, and the test data to evaluate the model. You can see the details here: https://github.com/whn09/table_structure_recognition/blob/main/yolov5/data/custom-detection.yaml

  3. In your opinion, could this method be better than existing state-of-the-art tools (PaddleOCR)? I think the model is better than PaddleOCR, and even better than table-transformer. The approach is a commonly used one, and we can get good results using YOLOv5s; if you want better results, you can use YOLOv5m or larger models.
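
As a rough illustration of the post-processing mentioned in answer 1, the sketch below derives cell boxes by intersecting detected row boxes with detected column boxes; OCR word boxes are then matched against these cells to fill in their content, which is why the final result depends on OCR quality. The function names and box format are assumptions for illustration, not the repo's actual code.

```python
# Minimal sketch: build cell boxes from row/column detections.
# Box format assumed: (x1, y1, x2, y2) in pixel coordinates.

def intersect(row_box, col_box):
    """Return the intersection of a row box and a column box, or None if empty."""
    x1 = max(row_box[0], col_box[0])
    y1 = max(row_box[1], col_box[1])
    x2 = min(row_box[2], col_box[2])
    y2 = min(row_box[3], col_box[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)

def rows_cols_to_cells(row_boxes, col_boxes):
    """Build an ordered grid of cells from row and column detections."""
    rows = sorted(row_boxes, key=lambda b: b[1])  # top-to-bottom
    cols = sorted(col_boxes, key=lambda b: b[0])  # left-to-right
    cells = []
    for r, row in enumerate(rows):
        for c, col in enumerate(cols):
            box = intersect(row, col)
            if box is not None:
                cells.append({"row": r, "col": c, "bbox": box})
    return cells

# Example: two rows x two columns -> four cells.
print(rows_cols_to_cells(
    [(10, 10, 300, 50), (10, 50, 300, 90)],
    [(10, 10, 150, 90), (150, 10, 300, 90)],
))
```

Each cell's text can then be assembled by assigning OCR word boxes to the cell they overlap most, which is roughly what the table-transformer post-processing does.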

light42 commented 1 year ago

I haven't tested it yet, but I think that if you train YOLO for text detection it will give great results; after that, even EasyOCR/Tesseract could be used for text recognition. My colleague used YOLOv7 to detect text in official documents, and it worked great. You could use it to finally complete the pipeline.
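
A minimal sketch of that final step, assuming the text/cell boxes have already been detected (e.g. by a YOLO model) and using pytesseract for recognition; the box format and function name are hypothetical, not part of this repo:

```python
import cv2
import pytesseract

def recognize_boxes(image_path, boxes, lang="eng"):
    """Run Tesseract on each detected box crop and return its text.

    boxes: list of (x1, y1, x2, y2) detections, e.g. from a YOLO text detector.
    """
    image = cv2.imread(image_path)
    texts = []
    for (x1, y1, x2, y2) in boxes:
        crop = image[y1:y2, x1:x2]
        # --psm 6 tells Tesseract to treat the crop as a single block of text.
        text = pytesseract.image_to_string(crop, lang=lang, config="--psm 6")
        texts.append(text.strip())
    return texts

# Example with made-up detections:
# print(recognize_boxes("table.png", [(12, 8, 140, 40), (150, 8, 290, 40)]))
```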

You could use the SynthTabNet dataset for training, since it contains a bbox for each piece of text in the cells. Maybe also add a little noise using shabby-pages so that it can handle table images in imperfect conditions.
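
If a full shabby-pages/Augraphy pipeline is more than you need, a lighter-weight option is to add simple scan-like degradations yourself during training; this is just a rough sketch with OpenCV/NumPy, not the shabby-pages tool itself:

```python
import cv2
import numpy as np

def degrade_table_image(image, noise_std=10.0, blur_ksize=3, jpeg_quality=40):
    """Apply mild scan-like degradations: Gaussian noise, blur, JPEG artifacts."""
    noisy = image.astype(np.float32) + np.random.normal(0.0, noise_std, image.shape)
    noisy = np.clip(noisy, 0, 255).astype(np.uint8)
    blurred = cv2.GaussianBlur(noisy, (blur_ksize, blur_ksize), 0)
    # Round-trip through JPEG encoding to introduce compression artifacts.
    _, buf = cv2.imencode(".jpg", blurred, [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

# Example: degraded = degrade_table_image(cv2.imread("clean_table.png"))
```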