johnning2333 / M2Doc

How to get ocr_anno files? #5

Closed RicoJYang closed 2 months ago

RicoJYang commented 2 months ago

In the ocr_anno_convert.py file, is ocr_anno_path the output of an OCR program, and is save_anno_path just an empty, newly created folder? Could the author share which OCR software was used? Thank you very much.

johnning2333 commented 2 months ago

That file is just a conversion script: it converts the DocLayNet OCR annotations (provided with the dataset) into our training format, so no separate OCR software is needed. Please make sure you download the DocLayNet dataset first.
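
For readers following along, here is a minimal sketch of what a directory-to-directory conversion of this kind might look like. It is not the repository's ocr_anno_convert.py. The function names convert_record and convert_dir are hypothetical, and the "cells", "text" and "bbox" keys, as well as the output layout, are assumptions made purely for illustration; check the downloaded DocLayNet files and the actual script for the real schema.

```python
import json
from pathlib import Path


def convert_record(record: dict) -> dict:
    """Hypothetical conversion: keep only the text and box of each cell.

    The "cells", "text" and "bbox" keys are assumptions for illustration,
    not a confirmed DocLayNet or M2Doc schema.
    """
    cells = record.get("cells", [])
    return {
        "texts": [cell.get("text", "") for cell in cells],
        "bboxes": [cell.get("bbox", []) for cell in cells],
    }


def convert_dir(ocr_anno_path: str, save_anno_path: str) -> int:
    """Convert every *.json file in ocr_anno_path and return the file count."""
    src = Path(ocr_anno_path)
    dst = Path(save_anno_path)
    # The output folder may be empty or may not exist yet.
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for src_file in sorted(src.glob("*.json")):
        with src_file.open(encoding="utf-8") as f:
            record = json.load(f)
        # Write the converted annotation under the same file name.
        with (dst / src_file.name).open("w", encoding="utf-8") as f:
            json.dump(convert_record(record), f, ensure_ascii=False)
        count += 1
    return count
```

Under these assumptions, ocr_anno_path would point at the folder of OCR annotation files shipped with the dataset, and save_anno_path at the folder that should receive the converted files.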