h / pytesseract

Python-tesseract is an optical character recognition (OCR) tool for Python
https://github.com/h/pytesseract
GNU General Public License v3.0
72 stars, 2 forks

Finetuning Model #5

Open: donjuanpond opened 2 months ago

donjuanpond commented 2 months ago

Hey! Is there any way to fine-tune Tesseract models using PyTesseract? The current Tesseract docs on fine-tuning seem complicated and hard to follow, so it would be very nice if PyTesseract provided some training/fine-tuning scripts.

SouravaBehera commented 3 weeks ago

Hey @donjuanpond, did you find a way to fine-tune for a specific language?

donjuanpond commented 3 weeks ago

I did not. I eventually gave up on PyTesseract and moved to TrOCR. You can try this tutorial: https://www.youtube.com/watch?v=KE4xEzFGSU8