yxm4109 / python-tesseract

Automatically exported from code.google.com/p/python-tesseract

api.GetHocrText() returns malformed XML #26

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Control characters are inserted into the document, and XML parsers cannot 
handle it without first trying to strip them out. This problem was reportedly 
fixed in the main tesseract SVN a few days ago, and I think producing an update 
linked with SVN will fix it.
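
Until a fixed build is available, the "strip them out" workaround mentioned above can be sketched roughly as follows. This is only an illustration, not part of python-tesseract: the helper names strip_control_chars and parse_hocr are hypothetical, the sample string is a made-up stand-in for what api.GetHocrText() might return, and it assumes the offending characters are the C0 control characters that XML 1.0 forbids (everything below 0x20 except tab, line feed and carriage return).

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 does not allow in a document
# (tab \x09, line feed \x0a and carriage return \x0d are legal and kept).
_XML_ILLEGAL_CONTROLS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_chars(text):
    """Return text with the forbidden C0 control characters removed."""
    return _XML_ILLEGAL_CONTROLS.sub("", text)


def parse_hocr(hocr_text):
    """Hypothetical helper: sanitize hOCR markup, then parse it as XML."""
    return ET.fromstring(strip_control_chars(hocr_text))


# Made-up sample standing in for the string returned by api.GetHocrText();
# the form feed (\x0c) and unit separator (\x1f) are the kind of characters
# that make a strict XML parser reject the document.
sample = '<div class="ocr_page"><span class="ocr_line">Hello\x0c world\x1f</span></div>'

root = parse_hocr(sample)
line_text = root.find("span").text
```

Real hOCR output may carry a DOCTYPE, namespaces or entities that need separate handling, so treat this as a stopgap rather than a substitute for the upstream fix.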

Using Python 2.7.3 under Windows 7 X64.

P.S. Are there any instructions for building from SVN with VS 2008? I see the 
binary under downloads but there's no information as to how it was generated. 
Is it just libtesseract et al. wrapped with SWIG?

Original issue reported on code.google.com by stephen....@gmail.com on 9 Aug 2012 at 2:27

GoogleCodeExporter commented 9 years ago
If you are trying to use the 64-bit version of Python, then the answer is no. 
I am still working on how to compile tesseract-ocr as a 64-bit Windows build.

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 3:48

GoogleCodeExporter commented 9 years ago
No; this is 32-bit python, and I have no interest in compiling/distributing 
anything exclusive to 64-bit machines. Apart from the occasional memory 
corruption from Tesseract and this issue, the package is working very well.

Original comment by stephen....@gmail.com on 10 Aug 2012 at 12:39

GoogleCodeExporter commented 9 years ago
Since the current release of tesseract is relatively old, a build from
SVN might not always be compatible with python-tesseract. Anyhow, I
will look into it and get back to you ASAP.

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 2:55

GoogleCodeExporter commented 9 years ago
Below is a build against tesseract-ocr svn737:
http://python-tesseract.googlecode.com/files/python-tesseract-0.7.5.win32-py2.7.exe

If it works, please buy me a coffee.

If not, please contact me.

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 7:30

GoogleCodeExporter commented 9 years ago
Well done; that seems to have fixed it. I'm more than happy to help feed your 
coffee addiction. Do you accept PayPal?

Also, if you would be willing to pass on any instructions for getting the SWIG 
portion to build properly under VS2008 (once Tesseract itself is built) I'd be 
happy to update my own copies on my development machine. Thanks again for the 
quick fix!

Steve

Original comment by stephen....@gmail.com on 10 Aug 2012 at 7:52

GoogleCodeExporter commented 9 years ago
Try the following procedure and let me know whether it works for you:

svn checkout http://python-tesseract.googlecode.com/svn/trunk/ python-tesseract
cd python-tesseract
python setup.py build
python setup.py install

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 10:48

GoogleCodeExporter commented 9 years ago
https://www.paypal.com/cgi-bin/webscr?cmd=_cart&business=VD2Y4PZSK7T86&lc=HK&item_name=To%20support%20the%20development%20of%20python%2dtesseract&amount=5%2e00&currency_code=USKD&button_subtype=products&add=1&bn=PP%2dShopCartBF%3abtn_cart_LG%2egif%3aNonHosted

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 10:51

GoogleCodeExporter commented 9 years ago
Worked like a charm. Sending you a couple cups of coffee shortly. Thanks!
Steve

Original comment by stephen....@gmail.com on 13 Aug 2012 at 12:32

GoogleCodeExporter commented 9 years ago
Thank you for your coffees

Original comment by FreeT...@gmail.com on 13 Aug 2012 at 5:31

GoogleCodeExporter commented 9 years ago

Original comment by FreeT...@gmail.com on 20 Aug 2012 at 8:47