infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Question]: table structure recognition (TSR) algorithm #2642

Status: Open. Walker-DJ1 opened this issue 1 month ago.

Walker-DJ1 commented 1 month ago

Describe your problem

Could you tell me which table structure recognition (TSR) algorithm RAGFlow uses, and where its open-source code is? Great work.

yingfeng commented 1 month ago

The open-source TSR is based on object detection and is trained with YOLO. We haven't made the training script public yet, but you could train a TSR model yourself on public datasets such as PubTable and CDLA. We also have a transformer-based TSR, available only in the enterprise version, which can provide better recognition accuracy.
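If you train your own detector as suggested, the annotations from a public dataset first have to be converted into YOLO's label format: one text line per object, `class_id x_center y_center width height`, with all coordinates normalized by the image size. The sketch below assumes your annotations are already pixel-space `(x0, y0, x1, y1)` boxes; the class names and ids in `CLASS_IDS` are hypothetical, since RAGFlow's actual TSR label set and training script are not public.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label set for table-structure objects (NOT RAGFlow's real one).
CLASS_IDS = {"table": 0, "table row": 1, "table column": 2, "table spanning cell": 3}


@dataclass
class Box:
    label: str
    x0: float  # left edge in pixels
    y0: float  # top edge in pixels
    x1: float  # right edge in pixels
    y1: float  # bottom edge in pixels


def to_yolo_line(box: Box, img_w: int, img_h: int) -> str:
    """Convert one pixel-space box to a YOLO label line:
    'class_id x_center y_center width height', normalized to [0, 1]."""
    cx = (box.x0 + box.x1) / 2 / img_w
    cy = (box.y0 + box.y1) / 2 / img_h
    w = (box.x1 - box.x0) / img_w
    h = (box.y1 - box.y0) / img_h
    return f"{CLASS_IDS[box.label]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def to_yolo_file(boxes: List[Box], img_w: int, img_h: int) -> str:
    """Build the contents of one image's .txt label file."""
    return "\n".join(to_yolo_line(b, img_w, img_h) for b in boxes)


if __name__ == "__main__":
    boxes = [Box("table row", 0, 0, 200, 50), Box("table column", 100, 0, 200, 100)]
    print(to_yolo_file(boxes, img_w=200, img_h=100))
    # 1 0.500000 0.250000 1.000000 0.500000
    # 2 0.750000 0.500000 0.500000 1.000000
```

Each image then gets a `.txt` file with these lines, which is the layout YOLO-style trainers generally expect alongside the images.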
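An object-detection TSR outputs boxes, not a table, so a post-processing step is needed to turn detections into rows and columns. One common approach, sketched below as an assumption rather than RAGFlow's actual code, is to intersect every detected row box with every detected column box to form a cell grid, then place OCR'd words into the cell containing each word's center. The names `build_grid`, `locate`, and `fill_table` are hypothetical, and spanning cells are ignored for brevity.

```python
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def build_grid(rows: List[BBox], cols: List[BBox]) -> List[List[BBox]]:
    """Intersect each row box with each column box. Rows are sorted
    top-to-bottom and columns left-to-right, so grid[i][j] is row i, col j."""
    rows = sorted(rows, key=lambda b: b[1])
    cols = sorted(cols, key=lambda b: b[0])
    return [
        [(max(r[0], c[0]), max(r[1], c[1]), min(r[2], c[2]), min(r[3], c[3])) for c in cols]
        for r in rows
    ]


def locate(grid: List[List[BBox]], x: float, y: float) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first cell containing point (x, y), or None."""
    for i, row in enumerate(grid):
        for j, (x0, y0, x1, y1) in enumerate(row):
            if x0 <= x <= x1 and y0 <= y <= y1:
                return i, j
    return None


def fill_table(grid: List[List[BBox]], words: List[Tuple[str, BBox]]) -> List[List[str]]:
    """Assign each OCR word to the cell containing its center point."""
    table = [["" for _ in row] for row in grid]
    for text, (x0, y0, x1, y1) in words:
        pos = locate(grid, (x0 + x1) / 2, (y0 + y1) / 2)
        if pos is not None:  # words outside every cell are dropped
            i, j = pos
            table[i][j] = (table[i][j] + " " + text).strip()
    return table


if __name__ == "__main__":
    rows = [(0, 50, 200, 100), (0, 0, 200, 50)]  # deliberately unsorted
    cols = [(0, 0, 100, 100), (100, 0, 200, 100)]
    words = [("Name", (10, 10, 60, 30)), ("Age", (110, 10, 150, 30)),
             ("Alice", (10, 60, 60, 80)), ("30", (110, 60, 130, 80))]
    print(fill_table(build_grid(rows, cols), words))  # [['Name', 'Age'], ['Alice', '30']]
```

Real tables also contain merged cells, which a detector can report as a separate class and which would then override the plain row-by-column grid.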