the fine-tuning of language in the content section

poloclub / unitable

UniTable: Towards a Unified Table Foundation Model

https://arxiv.org/abs/2403.04822

MIT License

381 stars, 28 forks

the fine-tuning of language in the content section #21

Open num3num opened 4 months ago

num3num commented 4 months ago

UniTable is a powerful recognition tool, but I want to train its table content recognition to support other languages. Do you have any suggestions?

ShengYun-Peng commented 4 months ago

I would suggest fine-tuning the OCR branch on the target language; UniTable should then work out of the box.
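As a rough illustration of one preparatory step for this kind of fine-tuning (not taken from the UniTable repository): if the content vocabulary does not cover the target language, the missing characters need to be found and given token ids first. The sketch below assumes the vocabulary is a plain token-to-id dict and that cell texts are available as strings; `find_missing_chars` and `extend_vocab` are hypothetical helper names, and the model's embedding and output layers would still have to be resized separately.

```python
from collections import Counter


def find_missing_chars(cell_texts, vocab, min_count=1):
    """Return non-space characters seen in cell_texts but absent from vocab.

    Characters are ordered from most to least frequent.
    `vocab` is assumed to be a token -> id mapping (hypothetical format).
    """
    counts = Counter(
        ch for text in cell_texts for ch in text if not ch.isspace()
    )
    return [
        ch
        for ch, n in counts.most_common()
        if ch not in vocab and n >= min_count
    ]


def extend_vocab(vocab, new_tokens):
    """Return a copy of vocab with new_tokens appended at fresh ids."""
    extended = dict(vocab)
    next_id = max(extended.values(), default=-1) + 1
    for tok in new_tokens:
        if tok not in extended:
            extended[tok] = next_id
            next_id += 1
    return extended


if __name__ == "__main__":
    # Toy data: an ASCII-only vocabulary and cells containing Chinese text.
    vocab = {"<pad>": 0, "a": 1, "b": 2}
    cells = ["ab 表", "表格 a"]
    missing = find_missing_chars(cells, vocab)
    print(missing)
    print(extend_vocab(vocab, missing))
```

Appending new ids after the existing ones keeps the pretrained token ids stable, so only the newly added embedding rows start from scratch.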

num3num commented 4 months ago

In the bbox recognition step, a single bbox may contain a large amount of text or gaps, which can lead to content loss or misalignment. Do you have any suggestions for this situation? Which model or tuning method was used to pre-train or fine-tune unitable_large_bbox.pt?