faizan1041 closed this 8 months ago
I fixed the issue myself. The problem is that the box files come out reversed for RTL languages, so you need to reverse them again to match LTR order. Here is the script I wrote for this: https://github.com/faizan1041/tesstrain_helpers/blob/main/reverse_box_files.py
Anyone facing the same issue can generate the box files and then run the script above, which rewrites each box file's contents in reversed order.
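For readers who want to see the idea without opening the link, here is a minimal sketch of the reversal step. It is not the linked script. It assumes each box file has one entry per row and that every text line ends with a row starting with a tab character (an end-of-line marker); check your own box files before relying on that. The function names and the directory in the usage note are hypothetical.

```python
from pathlib import Path


def reverse_box_lines(box_text: str) -> str:
    """Reverse the entries of each text line in a box file.

    Assumption: one entry per row, and each text line ends with a row
    that starts with a tab character (the end-of-line marker).
    """
    out, group = [], []
    for row in box_text.splitlines():
        if row.startswith("\t"):
            # End-of-line marker: emit the buffered entries reversed,
            # then the marker itself, and start a new group.
            out.extend(reversed(group))
            out.append(row)
            group = []
        else:
            group.append(row)
    # Entries after the last marker (or a file with no marker at all).
    out.extend(reversed(group))
    return "\n".join(out) + "\n" if out else ""


def reverse_box_files(directory: str) -> None:
    """Rewrite every *.box file under `directory` in place."""
    for path in Path(directory).rglob("*.box"):
        text = path.read_text(encoding="utf-8")
        path.write_text(reverse_box_lines(text), encoding="utf-8")
```

You would call it as `reverse_box_files("data/ara-ground-truth")`, where the path is only a placeholder for wherever your ground-truth files live. Because the files are overwritten in place, and running it twice restores the original order, keep a backup and run it exactly once.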
Hi @Shreeshrii and others,
I'm trying to train on an Arabic dataset, and the results get worse after training:
I also tried different start models, such as ara.traineddata from the best and fast repos, but no luck.

![Screenshot from 2023-10-11 14-59-38](https://github.com/tesseract-ocr/tesstrain/assets/8593538/b7f02055-de6d-4320-8f42-f9eb8c393d80)
In the image's gt.txt I have:
محمد عبد الله ريشم خان
This is what is in the box file:
Result from the original model: محمد عبد اللە ۔ ر یشم خان
Result from the model trained for 1000 steps: يا لي ن دع ر
Note that when I train an English model on an English dataset, I see improvements within 1000 steps or even fewer. Can you advise on what the issue is?