opendatalab / PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction
https://pdf-extract-kit.readthedocs.io/zh-cn/latest/index.html
GNU Affero General Public License v3.0

Integration with GOT-OCR 2.0 #129

Open · whisper-bye opened 1 month ago

whisper-bye commented 1 month ago

https://arxiv.org/abs/2409.01704
https://github.com/Ucas-HaoranWei/GOT-OCR2.0

wangbinDL commented 3 weeks ago

Thank you for the recommendation. GOT-OCR 2.0 is indeed one of the latest and most advanced models for document extraction, showing superior performance compared to previous models. We will conduct a thorough evaluation of its effectiveness on diverse document types in the near future. Based on the results, including speed and accuracy, we will consider integrating it into our Kit.