openpaperwork / pyocr

A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab
https://gitlab.gnome.org/World/OpenPaperwork/pyocr

Unnecessary file IO #19

Closed ventsyv closed 8 years ago

ventsyv commented 10 years ago

As currently implemented, the image_to_string functions have the engine write its output to a file, which is then read back in and returned to the user. Both Cuneiform and Tesseract now support sending their output to stdout, which eliminates the two extra file I/O operations. I'll attempt to implement this; hopefully it will speed things up a bit.
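
A rough sketch of the stdout approach, for illustration only. It assumes a Tesseract build that accepts the literal output base `stdout`; the helper names `tesseract_stdout_cmd` and `run_engine_to_stdout` are made up for this example and are not part of pyocr.

```python
import subprocess


def tesseract_stdout_cmd(image_path, lang="eng"):
    """Build a tesseract command line that prints the text instead of writing a file.

    Assumption: the installed tesseract accepts "stdout" as its output base.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_engine_to_stdout(cmd):
    """Run an OCR command and return what it printed on stdout (hypothetical helper).

    No temporary output file is created, so the write-then-read round trip
    described above goes away.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the engine exits with an error
    )
    return proc.stdout.decode("utf-8", errors="replace")
```

With tesseract installed, `run_engine_to_stdout(tesseract_stdout_cmd("page.png"))` would return the recognized text directly; `page.png` is a placeholder file name.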

jflesch commented 10 years ago

You're right, it would be much cleaner. However, can you first check in which versions of Cuneiform and Tesseract these options became available, please? If the versions in Debian stable don't support them, I think it would be best not to integrate this work into the 'master' branch immediately (we can keep it in a separate branch until then).

Thanks in advance,

ventsyv commented 10 years ago

My initial plan was to check the version number: if the version does not support stdout, just keep doing what you are doing now. I think I now have a better idea. I'll let you know once I've implemented it; maybe you can run it on your side and verify it's working before merging the branch into master?
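
The version-check fallback mentioned first could look roughly like this. It is a sketch only: the thread does not say which engine version first supports stdout, so `min_version` is a value the caller must supply, and `parse_version` and `choose_output_mode` are hypothetical names, not pyocr functions.

```python
import re


def parse_version(banner):
    """Extract (major, minor, patch) from an engine's version banner.

    Returns None if no version number is found. A missing patch component
    is treated as 0.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        return None
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def choose_output_mode(banner, min_version):
    """Pick "stdout" when the engine is new enough, else fall back to "file".

    min_version is an assumption supplied by the caller, e.g. (3, 3, 0);
    the real threshold has to be checked against each engine's changelog.
    """
    version = parse_version(banner)
    if version is not None and version >= min_version:
        return "stdout"
    return "file"
```

An unparseable banner falls back to the existing file-based path, so older or unusual installs keep working as they do now.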

jflesch commented 10 years ago

I will verify it, don't worry :)

jflesch commented 9 years ago

#30 should fix this issue as well for Tesseract.

jflesch commented 8 years ago

#30 has been implemented. It fixes this issue.