zer09 / tesseractdotnet

Automatically exported from code.google.com/p/tesseractdotnet

Return Unicode string #8

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?
Expected to receive a correctly decoded Unicode string; instead, the returned string contains the raw UTF-8 bytes, so non-ASCII characters come out garbled.

What version of the product are you using? On what operating system?
r42, Win7 64-bit

Please provide any additional information below.

Modify the TesseractProcessor::Process(TessBaseAPI* api, Pix* pix) method in 
TesseractRecognizer.cpp as follows, so that the char* text is decoded as UTF-8:

Old:
String* result = new String(text);

New:
String* result = new String(text, 0, strlen(text), Encoding::UTF8);
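
To illustrate why the decoding step matters, here is a minimal sketch in standard C++ (not the Managed C++ used in TesseractRecognizer.cpp). The helpers `decode_utf8` and `widen_bytes` are hypothetical names made up for this example and are not part of the project. `widen_bytes` only approximates what happens when each byte is treated as one character of a single-byte code page, and `decode_utf8` assumes well-formed input and does no validation.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: decode a NUL-terminated, well-formed UTF-8 buffer
// into UTF-16 code units (the representation a .NET String uses).
std::u16string decode_utf8(const char* text) {
    std::u16string out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(text);
    const unsigned char* end = p + std::strlen(text);
    while (p < end) {
        char32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (*p < 0x80)      { cp = *p;        extra = 0; }
        else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }
        else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }
        else                { cp = *p & 0x07; extra = 3; }
        ++p;
        // Each continuation byte contributes its low 6 bits.
        for (; extra > 0 && p < end; --extra, ++p)
            cp = (cp << 6) | (*p & 0x3F);
        if (cp >= 0x10000) {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Hypothetical helper: treat every byte as its own character, which is
// roughly the effect of decoding UTF-8 bytes with a single-byte code page.
std::u16string widen_bytes(const char* text) {
    std::u16string out;
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(text); *p; ++p)
        out.push_back(static_cast<char16_t>(*p));
    return out;
}
```

For example, "Š" (U+0160) is the two bytes 0xC5 0xA0 in UTF-8. `decode_utf8("\xC5\xA0")` yields a single code unit, while `widen_bytes("\xC5\xA0")` yields two unrelated characters. This is the kind of garbling the one-argument String constructor produces.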

Original issue reported on code.google.com by nguyen...@gmail.com on 3 Jul 2011 at 2:13

GoogleCodeExporter commented 9 years ago
Had the same issue with Croatian (ŠĐŽČĆ) UTF-8 characters and noticed that the 
char* is not decoded correctly; I came to a similar solution.
I think this change should be committed to the SVN repository.

Original comment by Darko.Pr...@gmail.com on 26 Sep 2011 at 10:27