seeebek / EliteOCR

OCR tool for market screenshots in Elite: Dangerous

python-tesseract vs pytesseract? #30

Closed by demonbane 9 years ago

demonbane commented 9 years ago

While I was working on getting this running on Mac I remembered reading about pytesseract, but it looks like it's not the same thing as python-tesseract. Just wondering if you looked at both and selected python-tesseract for a particular reason, or if it was just the first thing you found.

I was interested in seeing if it would be feasible to port EliteOCR over to pytesseract because it seems to be better maintained overall, but I want to make sure I'm not wasting my time because it's lacking a feature that you need or anything like that.

seeebek commented 9 years ago

I think it was what I found first, and it worked right away, so I took it.

Do not put too much work into it. I'm switching completely to the MLP from OpenCV. Basically I will drop Tesseract, SciPy, scikit-learn and BeautifulSoup4 altogether. I have already done the first tests and the increase in performance is great. Additionally it will reduce complexity and file size for the standalone files. I just need to work on some confidence testing.

Expect the first test version tomorrow in the dev branch. (I will not release it as a new version until I can test it sufficiently.)
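
For readers wondering what "confidence testing" on a classifier's output could look like, here is a minimal sketch. It is an assumption for illustration only and not the approach EliteOCR uses: the function names, the softmax normalisation and the 0.8 threshold are all hypothetical. It turns the raw per-class scores of a network into probabilities and flags a prediction as unreliable when the top probability falls below the threshold.

```python
import math


def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_confidence(scores, labels, threshold=0.8):
    """Return (label, confidence, is_reliable) for one recognised character.

    `threshold` is a hypothetical cut-off chosen for this sketch only.
    """
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best], probs[best] >= threshold
```

A result with `is_reliable` set to `False` could then be highlighted for manual review instead of being accepted silently.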

CapCap commented 9 years ago

way to go @seeebek !!!

seeebek commented 9 years ago

I have a working example of the new engine in the dev branch. It is still very new and some functionality of EliteOCR is missing for now (e.g. only English is supported, no Levenshtein corrections). As promised, I dropped the mentioned dependencies. I'm still working on dropping more, once the OCR engine is accurate and reliable enough.
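
The "Levenshtein corrections" mentioned above refer to snapping an OCR result to the nearest known name by edit distance. The following is a minimal sketch of that general technique, not EliteOCR's actual code: the function names, the `max_distance` cut-off and the example names are assumptions made for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def correct(word: str, known_names: Sequence[str], max_distance: int = 2) -> str:
    """Return the closest known name, or `word` unchanged if none is close enough."""
    if not known_names:
        return word
    best = min(known_names, key=lambda name: levenshtein(word.upper(), name.upper()))
    if levenshtein(word.upper(), best.upper()) <= max_distance:
        return best
    return word
```

For example, `correct("GOLO", ["Gold", "Silver"])` returns `"Gold"`, because a single substitution separates the two strings once case is ignored.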

demonbane commented 9 years ago

I just tested this locally and wow is it fast! I'm really excited to play with it more!

It looks like you didn't rebase the dev branch after the recent changes in master, so I'll send a follow-up PR that includes all of those changes on the dev branch. (With all of the changes from master pulled in, it works "out of the box" on my Mac, which is great news!)

demonbane commented 9 years ago

I'm having trouble getting the rebase done in a PR, so I'll just give you the steps to fix from your end:

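# switch to the dev branch and replay its commits on top of master
git checkout dev
git rebase master
# you'll get an error here; resolve it by removing these two files
git rm ocrmethods.py nn_training.py
git rebase --continue
# if dev was already pushed, the rewritten history may need a force push
# (for example: git push --force-with-lease)
git push

seeebek commented 9 years ago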

EliteOCR is officially Tesseract-free as of version 0.6.