MohamedLahmeri01 opened 3 weeks ago
Hi,
There's an extensive thread at https://github.com/huggingface/transformers/issues/19329 as well as https://github.com/microsoft/unilm/issues/627
@NielsRogge, could you check my attempt? It's based on your notebook: https://www.kaggle.com/code/cherryblade29/forarabic-and-english
Hi, I got a good CER (0.1) while fine-tuning a small model on Indonesian, and I found that results were better when I did not change processor.tokenizer to an Indonesian tokenizer. Anyway, I have my own Indonesian dataset of cropped text images from form documents; it consists of person names, places and dates of birth, etc. Do you think it's a good idea to randomly combine those cropped text images into longer (horizontal) images for my training dataset?
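For reference, CER (character error rate) is the character-level edit distance between the predicted text and the ground truth, divided by the length of the ground truth. Below is a minimal pure-Python sketch; the function names are illustrative only, and libraries such as jiwer provide the same metric ready-made.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn one string into the other."""
    # prev[j] holds the distance between the current reference prefix
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,                            # reference char left unmatched
                curr[j - 1] + 1,                        # hypothesis char left unmatched
                prev[j - 1] + (ref_char != hyp_char),   # substitution, free if equal
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A CER of 0.1 therefore means roughly one character edit per ten ground-truth characters. Over a whole dataset, a common convention is to divide the total number of edits by the total number of reference characters rather than averaging per-line scores.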
Model description
Hello, I've gone through the existing issues and other resources, but none of them explain how to fine-tune TrOCR on a specific language, e.g. how to pick the encoder, the decoder, the model, etc. @NielsRogge, could you write simple instructions or a guide on this topic?
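As a rough illustration of what "picking an encoder and decoder" can mean, here is a minimal sketch using the `transformers` `VisionEncoderDecoderModel.from_encoder_decoder_pretrained` API. Assumptions: the checkpoint names are examples only (a ViT image encoder plus the multilingual `xlm-roberta-base` as decoder; a decoder pretrained on your target language may be a better fit), and `configure_special_tokens`, `mask_padding` and `build_model_and_processor` are hypothetical helper names. Note that the cross-attention layers connecting encoder and decoder start randomly initialized, so this route needs more training data than fine-tuning an existing TrOCR checkpoint.

```python
# Example checkpoints (assumptions, not recommendations).
ENCODER_ID = "google/vit-base-patch16-224-in21k"  # image encoder
DECODER_ID = "xlm-roberta-base"  # text decoder; swap in one that covers your language


def configure_special_tokens(model, tokenizer):
    """Copy the decoder tokenizer's special token ids into the model config.

    During training, the model shifts the labels right using
    decoder_start_token_id and pad_token_id to build the decoder inputs.
    """
    model.config.decoder_start_token_id = tokenizer.cls_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    model.config.eos_token_id = tokenizer.sep_token_id
    model.config.vocab_size = model.config.decoder.vocab_size
    return model


def mask_padding(label_ids, pad_token_id):
    """Replace padding ids with -100 so the cross-entropy loss ignores them."""
    return [-100 if t == pad_token_id else t for t in label_ids]


def build_model_and_processor(encoder_id=ENCODER_ID, decoder_id=DECODER_ID):
    # Imported here so the helpers above work without transformers installed.
    from transformers import (
        AutoTokenizer,
        TrOCRProcessor,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    image_processor = ViTImageProcessor.from_pretrained(encoder_id)
    tokenizer = AutoTokenizer.from_pretrained(decoder_id)
    processor = TrOCRProcessor(image_processor=image_processor, tokenizer=tokenizer)
    # Loads the decoder as a causal LM with newly added cross-attention layers.
    model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(encoder_id, decoder_id)
    configure_special_tokens(model, tokenizer)
    return model, processor
```

Depending on your `transformers` version, generation settings (beam size, max length, and so on) may also need to be set on `model.generation_config` before calling `generate()`.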
Open source status
Provide useful links for the implementation
No response