Open IrtazaIjaz opened 8 months ago
How is this related to Python and pytesseract? By the way, GitHub lets you format code sections as code to improve readability: select the corresponding lines and click the <> button.
Also, it seems you are trying to run the training on a hosted platform (Kaggle?). Run it on your local computer instead (Linux/WSL or Mac). And do not report problems with your own data yet: first make sure that training with the example data works (i.e., that you have installed and set up the training environment correctly).
Hi @zdenop,
I'm running it on Jupyter Notebook. I started with a single page that contained 10 lines only.
Hi @stefan6419846,
I'm working in a Jupyter notebook for Python and writing the code in it. I have also made the code more readable, as you suggested.
Thanks
Hi All,
I'm having trouble running the fine-tuning with this repository. Below is the code that I run in my Jupyter notebook:
Step 6: I replaced the /content/tesstrain/data/irt/list.train file with my own file, which contains the following text:
/content/tesstrain/data/irt-ground-truth/page_10_line_1.png نقش فریادی ہے کس کی شوخیٔ تحریر کا
/content/tesstrain/data/irt-ground-truth/page_10_line_2.png کاغذی ہے پیرہن ہر پیکر تصویر کا
/content/tesstrain/data/irt-ground-truth/page_10_line_3.png کاو کاو سخت جانی ہائے تنہائی نہ پوچھ
/content/tesstrain/data/irt-ground-truth/page_10_line_4.png صبح کرنا شام کا لانا ہے جوئے شیر کا
/content/tesstrain/data/irt-ground-truth/page_10_line_5.png جذبۂ بے اختیار شوق دیکھا چاہیے
/content/tesstrain/data/irt-ground-truth/page_10_line_6.png سینۂ شمشیر سے باہر ہے دم شمشیر کا
/content/tesstrain/data/irt-ground-truth/page_10_line_7.png آگہی دام شنیدن جس قدر چاہے بچھائے
/content/tesstrain/data/irt-ground-truth/page_10_line_8.png مدعا عنقا ہے اپنے عالم تقریر کا
/content/tesstrain/data/irt-ground-truth/page_10_line_9.png نبسکہ ہوں غالبؔ اسیری میں بھی آتش زیر پا
/content/tesstrain/data/irt-ground-truth/page_10_line_10.png موئے آتش دیدہ ہے حلقہ مری زنجیر کا
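For context, tesstrain reads ground truth as pairs of files in data/MODEL_NAME-ground-truth: a line image (.png or .tif) and a single-line transcription with the same base name and the extension .gt.txt. A listing in the "image path, space, text" form shown above could be converted into such pairs with a small script. The following is only a sketch: the function names `split_listing_line` and `write_gt_files` are hypothetical, and it assumes that image paths contain no spaces and that each transcription fits on one line.

```python
import re
from pathlib import Path

# Hypothetical helper: assumes "<image path without spaces> <transcription>".
LISTING_LINE = re.compile(r"(\S+\.(?:png|tif))\s+(.+)")


def split_listing_line(line):
    """Split one listing line into (image path, transcription text)."""
    match = LISTING_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an 'image text' line: {line!r}")
    return Path(match.group(1)), match.group(2).strip()


def write_gt_files(listing_lines, out_dir=None):
    """Write one <image stem>.gt.txt file per listing line.

    Files go next to the image unless out_dir is given.
    Returns the list of transcription paths that were written.
    """
    written = []
    for line in listing_lines:
        if not line.strip():
            continue  # skip blank lines
        image, text = split_listing_line(line)
        target_dir = Path(out_dir) if out_dir is not None else image.parent
        gt_path = target_dir / (image.stem + ".gt.txt")
        # tesstrain expects UTF-8, single-line transcriptions
        gt_path.write_text(text + "\n", encoding="utf-8")
        written.append(gt_path)
    return written
```

Called with the lines of the listing, `write_gt_files` would produce page_10_line_1.gt.txt, page_10_line_2.gt.txt and so on beside the images, after which list.train can be left for the Makefile to generate.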
Step 8 outcome:
You are using make version: 4.3
lstmtraining \
  --debug_interval 0 \
  --traineddata data/irt/irt.traineddata \
  --old_traineddata /content/tesstrain/usr/share/tessdata/urd.traineddata \
  --continue_from data/urd/irt.lstm \
  --learning_rate 0.0001 \
  --model_output data/irt/checkpoints/irt \
  --train_listfile data/irt/list.train \
  --eval_listfile data/irt/list.eval \
  --max_iterations 10000 \
  --target_error_rate 0.01
Loaded file data/urd/irt.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 129 to 129!
Num (Extended) outputs,weights in Series:
  1,48,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys64:64, 20736
  Lfx96:96, 61824
  Lrx96:96, 74112
  Lfx384:384, 738816
  Fc129:129, 49665
Total weights = 945313
Previous null char=2 mapped to 128
Continuing from data/urd/irt.lstm
Deserialize header failed: /content/tesstrain/data/irt-ground-truth/page_10_line_1.png نقش فریادی ہے کس کی شوخیٔ تحریر کا
Deserialize header failed: /content/tesstrain/data/irt-ground-truth/page_10_line_2.png کاغذی ہے پیرہن ہر پیکر تصویر کا
Deserialize header failed: /content/tesstrain/data/irt-ground-truth/page_10_line_5.png جذبۂ بے اختیار شوق دیکھا چاہیے
Load of page 0 failed!
Load of images failed!!
make: *** [Makefile:327: data/irt/checkpoints/irt_checkpoint] Segmentation fault (core dumped)
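The "Deserialize header failed" messages are consistent with lstmtraining treating every line of the file passed to --train_listfile as the path of a serialized .lstmf training file. A line that holds an image path followed by text does not name such a file, so loading it fails. A quick pre-flight check along the following lines might catch this before training starts. This is a sketch, not part of tesstrain: `check_list_file` is a hypothetical helper, and it assumes relative entries are resolved against the current working directory (normally the tesstrain checkout).

```python
from pathlib import Path


def check_list_file(list_path):
    """Return (line number, problem, entry) for every suspicious entry.

    An entry is considered fine only if it ends in .lstmf and the file exists.
    Relative entries are resolved against the current working directory.
    """
    problems = []
    lines = Path(list_path).read_text(encoding="utf-8").splitlines()
    for number, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        if not entry.endswith(".lstmf"):
            problems.append((number, "not a .lstmf path", entry))
        elif not Path(entry).is_file():
            problems.append((number, "file not found", entry))
    return problems
```

For example, `check_list_file("data/irt/list.train")` on the hand-written file from Step 6 would flag every line as "not a .lstmf path", whereas a list generated by the Makefile should come back empty.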
Please help me figure out how to proceed. I'm stuck.
Thank you