Which is the correct bbox OCR level for LiLT: block level or word level?

I am using your nice tutorial to apply LiLT to Excel files converted to images; the text is in Dutch.

In https://github.com/jpWang/LiLT/issues/28, the author of LiLT mentions that the model is pretrained using "segment-level box" coordinates.

During inference, your code applies OCR:

```python
# change apply_ocr to True to use the ocr text for inference
processor.feature_extractor.apply_ocr = True
```
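As far as I can tell from the transformers source, when apply_ocr is True the LayoutLMv2/LayoutLMv3 feature extractor (assuming that is what the tutorial's processor wraps) calls pytesseract.image_to_data, drops empty entries, and returns one box per word, normalized to a 0-1000 scale, i.e. word level rather than segment level. The sketch below is a simplified re-implementation of that step, not the library's code; it works on the dict returned by image_to_data so it runs without Tesseract installed. The helper names `normalize_box` and `word_level_boxes` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def normalize_box(box: Sequence[int], width: int, height: int) -> List[int]:
    """Scale pixel coordinates (x0, y0, x1, y1) to the 0-1000 range used by LayoutLM-style models."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def word_level_boxes(
    data: Dict[str, list], width: int, height: int
) -> Tuple[List[str], List[List[int]]]:
    """Return one word and one box per recognized word from pytesseract.image_to_data output."""
    words: List[str] = []
    boxes: List[List[int]] = []
    for text, left, top, w, h in zip(
        data["text"], data["left"], data["top"], data["width"], data["height"]
    ):
        # Page/block/paragraph/line rows (and blank detections) have empty text; skip them.
        if not str(text).strip():
            continue
        words.append(text)
        # Tesseract reports left/top/width/height; convert to corner coordinates first.
        boxes.append(normalize_box((left, top, left + w, top + h), width, height))
    return words, boxes
```

With a real image, `data` would come from `pytesseract.image_to_data(image, lang="nld", output_type=pytesseract.Output.DICT)`, where "nld" is Tesseract's Dutch language code. Each word keeps its own box here, which is exactly what differs from segment-level boxes.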

Questions

Which kind of OCR is applied when processor.feature_extractor.apply_ocr = True: word-level boxes or "segment-level box"?
How can I ensure the same "segment-level box" OCR is applied for both fine-tuning and inference?
Any pointers on implementing the correct OCR level using pytesseract?
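One possible approach to the last two questions, sketched under assumptions: group Tesseract's words into segments using the block_num, par_num and line_num fields that pytesseract.image_to_data returns, give every word the box enclosing its whole segment, and run this same function for both fine-tuning and inference. Whether a Tesseract line or a Tesseract block best matches what LiLT calls a "segment" is an assumption, and `segment_level_boxes` is a hypothetical helper. It operates on the image_to_data dict so it can be tested without Tesseract.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def segment_level_boxes(
    data: Dict[str, list], width: int, height: int, granularity: str = "line"
) -> Tuple[List[str], List[List[int]]]:
    """Give every word the normalized (0-1000) box of the segment it belongs to.

    granularity="line" groups by (block_num, par_num, line_num);
    granularity="block" groups by block_num. Both are assumed notions of "segment".
    """
    key_fields = {"line": ("block_num", "par_num", "line_num"), "block": ("block_num",)}
    if granularity not in key_fields:
        raise ValueError(f"unknown granularity: {granularity!r}")

    words: List[str] = []
    seg_keys: List[tuple] = []
    word_boxes: List[Box] = []
    for i, text in enumerate(data["text"]):
        # Skip non-word rows (page/block/paragraph/line) and blank detections.
        if not str(text).strip():
            continue
        left, top = data["left"][i], data["top"][i]
        words.append(text)
        word_boxes.append((left, top, left + data["width"][i], top + data["height"][i]))
        seg_keys.append(tuple(data[f][i] for f in key_fields[granularity]))

    # Union of all word boxes that share a segment key.
    union: Dict[tuple, Box] = {}
    for key, (x0, y0, x1, y1) in zip(seg_keys, word_boxes):
        if key in union:
            ux0, uy0, ux1, uy1 = union[key]
            union[key] = (min(ux0, x0), min(uy0, y0), max(ux1, x1), max(uy1, y1))
        else:
            union[key] = (x0, y0, x1, y1)

    # Every word gets its segment's box, scaled to 0-1000.
    boxes: List[List[int]] = []
    for key in seg_keys:
        x0, y0, x1, y1 = union[key]
        boxes.append([
            int(1000 * x0 / width),
            int(1000 * y0 / height),
            int(1000 * x1 / width),
            int(1000 * y1 / height),
        ])
    return words, boxes
```

With a real image this would be called as `segment_level_boxes(pytesseract.image_to_data(image, lang="nld", output_type=pytesseract.Output.DICT), *image.size)`. The resulting words and boxes can then be passed to the processor with apply_ocr = False (the LayoutLMv3 processor accepts words and boxes directly when OCR is disabled), so fine-tuning and inference see the same box granularity.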
