Hi @yellowjs0304, thanks for your interest!
Yes, "content.pt" is used for OCR, i.e., table cell content recognition. I believe any OCR model that works for multi-line text recognition can be employed in UniTable's cell content branch. The beauty of UniTable lies in unifying all three tasks into one self-supervised training and inference pipeline, rather than combining existing methods from non-table domains.
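Here is a rough sketch of the "any OCR model" point. It assumes the structure and bbox branches have already produced HTML structure tokens and one pixel bounding box per cell, listed in reading order. Under that assumption, the cell content branch could be replaced by any callable that maps a cropped cell image to text. The names `fill_cell_contents` and `fill_html`, the `<td></td>` placeholder token, and the `(x0, y0, x1, y1)` box convention are all hypothetical illustrations, not UniTable's actual API.

```python
import html
from typing import Callable, List, Sequence, Tuple

# Hypothetical conventions for this sketch (not UniTable's real interfaces):
BBox = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels, end-exclusive
Image = List[List[int]]            # grayscale image as rows of pixel values
CellRecognizer = Callable[[Image], str]  # any OCR that reads one (multi-line) cell


def crop(image: Image, box: BBox) -> Image:
    """Cut the cell region out of the full table image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def fill_cell_contents(image: Image, cell_boxes: Sequence[BBox],
                       recognize: CellRecognizer) -> List[str]:
    """Run the pluggable recognizer on every predicted cell, in order."""
    return [recognize(crop(image, box)) for box in cell_boxes]


def fill_html(structure_tokens: Sequence[str], contents: Sequence[str]) -> str:
    """Insert recognized text into empty <td></td> placeholders, in order."""
    placeholders = sum(1 for tok in structure_tokens if tok == "<td></td>")
    if placeholders != len(contents):
        raise ValueError(f"{placeholders} cells but {len(contents)} contents")
    texts = iter(contents)
    out = []
    for tok in structure_tokens:
        if tok == "<td></td>":
            # Escape so recognized text cannot break the HTML structure.
            out.append(f"<td>{html.escape(next(texts))}</td>")
        else:
            out.append(tok)
    return "".join(out)


if __name__ == "__main__":
    image = [[r * 10 + c for c in range(4)] for r in range(4)]
    boxes = [(0, 0, 2, 2), (2, 0, 4, 1)]
    # Dummy recognizer standing in for a real OCR engine: reports crop size.
    dummy_ocr = lambda img: f"{len(img)}x{len(img[0])}"
    contents = fill_cell_contents(image, boxes, dummy_ocr)
    print(fill_html(["<tr>", "<td></td>", "<td></td>", "</tr>"], contents))
```

In this sketch only the recognizer callable would change when swapping in a different OCR engine. The cropping and HTML assembly stay the same.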
Got it, thank you.
@ShengYun-Peng Hi, thank you for sharing your amazing work.
While running your pipeline, I came up with some questions. Does your model work with other languages, such as Japanese, Chinese, or Korean?
I think the "content.pt" model plays the role of OCR. Is that right?
If I use another OCR engine or a PDF extractor like PDFParser, can it be applied to your pipeline?