infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Question]: table structure recognition (TSR) algorithm #2642

Open Walker-DJ1 opened 1 day ago

Walker-DJ1 commented 1 day ago

Describe your problem

Could you tell me which table structure recognition (TSR) algorithm is used, and where its open-source code is? Great work.

yingfeng commented 1 day ago

The open-source TSR is based on object detection and is trained with YOLO. We haven't made the training script public yet, but you could train TSR yourselves using public datasets such as PubTable and CDLA. We also have a transformer-based TSR, which is only available in the enterprise version and provides better recognition accuracy.
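For illustration only, here is a minimal sketch (not RAGFlow's actual code) of how a detection-based TSR pipeline can be used. It assumes a detector that outputs boxes for classes such as `table row` and `table column` (the class names used by PubTables-1M; whether RAGFlow's model uses exactly this label set is an assumption). `to_yolo_label` shows the standard YOLO label-line format you would produce when converting dataset annotations for training. `build_cell_grid` and `fill_cells` are hypothetical helpers that intersect detected rows and columns into a cell grid and place OCR words into cells by their center point. Spanning cells and headers are ignored to keep it short.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    """Axis-aligned box in pixels: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def to_yolo_label(class_id: int, box: Box, img_w: int, img_h: int) -> str:
    """Format one annotation as a YOLO label line: 'class cx cy w h', normalized to [0, 1].

    The class_id mapping (e.g. 0 = table row, 1 = table column) is a hypothetical choice.
    """
    cx = (box.x0 + box.x1) / 2 / img_w
    cy = (box.y0 + box.y1) / 2 / img_h
    w = (box.x1 - box.x0) / img_w
    h = (box.y1 - box.y0) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def build_cell_grid(rows: List[Box], cols: List[Box]) -> List[List[Box]]:
    """Intersect every detected row with every detected column.

    Rows are sorted top-to-bottom and columns left-to-right, so grid[i][j]
    is the cell in row i, column j. Assumes rows and columns overlap.
    """
    rows = sorted(rows, key=lambda b: b.y0)
    cols = sorted(cols, key=lambda b: b.x0)
    return [
        [Box(max(r.x0, c.x0), max(r.y0, c.y0), min(r.x1, c.x1), min(r.y1, c.y1)) for c in cols]
        for r in rows
    ]


def locate(grid: List[List[Box]], x: float, y: float) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first cell containing point (x, y), or None."""
    for i, row in enumerate(grid):
        for j, cell in enumerate(row):
            if cell.x0 <= x <= cell.x1 and cell.y0 <= y <= cell.y1:
                return i, j
    return None


def fill_cells(grid: List[List[Box]], words: List[Tuple[Box, str]]) -> List[List[str]]:
    """Assign each OCR word to the cell that contains the word's center point."""
    texts = [["" for _ in row] for row in grid]
    for box, text in words:
        pos = locate(grid, (box.x0 + box.x1) / 2, (box.y0 + box.y1) / 2)
        if pos is not None:
            i, j = pos
            texts[i][j] = (texts[i][j] + " " + text).strip()
    return texts


if __name__ == "__main__":
    rows = [Box(0, 0, 200, 20), Box(0, 20, 200, 40)]
    cols = [Box(0, 0, 100, 40), Box(100, 0, 200, 40)]
    words = [
        (Box(10, 5, 40, 15), "Name"), (Box(110, 5, 150, 15), "Age"),
        (Box(10, 25, 50, 35), "Ann"), (Box(120, 25, 135, 35), "31"),
    ]
    print(fill_cells(build_cell_grid(rows, cols), words))  # [['Name', 'Age'], ['Ann', '31']]
    print(to_yolo_label(0, Box(0, 0, 200, 20), 400, 100))  # 0 0.250000 0.100000 0.500000 0.200000
```

Intersecting rows with columns is the simplest way to recover a grid from detections; real tables with merged cells would additionally need the `table spanning cell` detections to merge grid positions.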