aimlnerd opened this issue 1 year ago
I am following your nice tutorial to apply LiLT to Excel files that have been converted to images. The text is in Dutch.
In https://github.com/jpWang/LiLT/issues/28, the author of LiLT mentions that the model was pretrained on "segment-level box" coordinates.
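For reference, a "segment-level box" means that every word in a segment (for example a text line or a table cell) shares one box covering the whole segment, instead of having its own tight word box. A minimal sketch in Python, assuming the segment id of each word is already known; `segment_level_boxes` is a hypothetical helper, not part of LiLT or transformers:

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def segment_level_boxes(word_boxes: Sequence[Box], segment_ids: Sequence[int]) -> List[Box]:
    """Hypothetical helper: replace each word box with the union box of its segment."""
    if len(word_boxes) != len(segment_ids):
        raise ValueError("word_boxes and segment_ids must have the same length")
    union: Dict[int, Box] = {}
    for (x0, y0, x1, y1), seg in zip(word_boxes, segment_ids):
        if seg in union:
            ux0, uy0, ux1, uy1 = union[seg]
            # Grow the segment box so it also covers this word.
            union[seg] = (min(ux0, x0), min(uy0, y0), max(ux1, x1), max(uy1, y1))
        else:
            union[seg] = (x0, y0, x1, y1)
    # Every word in the same segment gets the identical segment box.
    return [union[seg] for seg in segment_ids]


# Two words on one line (segment 0) and one word on another line (segment 1).
word_boxes = [(10, 10, 50, 20), (55, 12, 90, 22), (10, 40, 60, 50)]
print(segment_level_boxes(word_boxes, [0, 0, 1]))
# [(10, 10, 90, 22), (10, 10, 90, 22), (10, 40, 60, 50)]
```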
During inference, your notebook applies OCR:
https://github.com/philschmid/document-ai-transformers/blob/main/training/lilt_funsd.ipynb
# change apply_ocr to True to use the ocr text for inference
processor.feature_extractor.apply_ocr = True
Question
With processor.feature_extractor.apply_ocr = True, are the boxes passed to LiLT at word (token) level or at "segment-level box" level?
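If the built-in OCR turns out to return word-level boxes, one possible workaround is to run Tesseract directly and give each word the box of its Tesseract line. The sketch below is only a sketch under several assumptions: the input is the dict returned by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`; a Tesseract line (grouped by `block_num`, `par_num`, `line_num`) is a reasonable stand-in for a LiLT "segment"; and boxes are scaled to the 0-1000 range that LiLT expects. `line_level_words_and_boxes` and `normalize_box` are hypothetical helper names, and whether this grouping matches LiLT's pretraining segmentation is unverified.

```python
from typing import Dict, List, Sequence, Tuple


def normalize_box(box: Tuple[int, int, int, int], width: int, height: int) -> List[int]:
    """Hypothetical helper: scale a pixel box (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def line_level_words_and_boxes(data: Dict[str, Sequence], width: int, height: int):
    """Hypothetical helper: turn pytesseract image_to_data(DICT) output into
    (words, boxes) where every word carries the box of its Tesseract line."""
    words: List[str] = []
    keys: List[Tuple[int, int, int]] = []
    line_union: Dict[Tuple[int, int, int], Tuple[int, int, int, int]] = {}
    for i, raw in enumerate(data["text"]):
        text = str(raw).strip()
        if not text:
            continue  # page/block/paragraph/line rows have empty text
        x0, y0 = data["left"][i], data["top"][i]
        x1, y1 = x0 + data["width"][i], y0 + data["height"][i]
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        if key in line_union:
            ux0, uy0, ux1, uy1 = line_union[key]
            line_union[key] = (min(ux0, x0), min(uy0, y0), max(ux1, x1), max(uy1, y1))
        else:
            line_union[key] = (x0, y0, x1, y1)
        words.append(text)
        keys.append(key)
    boxes = [normalize_box(line_union[k], width, height) for k in keys]
    return words, boxes


# Hand-made stand-in for Tesseract output on a 1000x500 image (not real OCR data).
sample = {
    "text": ["", "Factuur", "nummer", "Totaal"],
    "left": [0, 100, 220, 100],
    "top": [0, 50, 52, 120],
    "width": [1000, 110, 90, 95],
    "height": [500, 20, 20, 22],
    "block_num": [1, 1, 1, 1],
    "par_num": [0, 1, 1, 2],
    "line_num": [0, 1, 1, 1],
}
words, boxes = line_level_words_and_boxes(sample, width=1000, height=500)
print(words, boxes)
```

For spreadsheet screenshots, grouping by table cell instead of by Tesseract line might match the notion of a "segment" more closely; the grouping key is the part to adapt.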