minni / pytesseracttrainer

Automatically exported from code.google.com/p/pytesseracttrainer
GNU General Public License v3.0

Unable to use BarahaIME for Indic #7

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Tried the sample kan-01.tif (UTF-8) with its box file.
2. Unable to view the text for editing; see the attached screenshot.
3. Does it not support BarahaIME (www.baraha.com)?

What is the expected output? What do you see instead?
I should be able to view the Kannada script (UTF-8) for editing.

What version of the product are you using? On what operating system?
Latest py version 1.02 on WinXP with SP3.

Please provide any additional information below.
I am sure that py version 1.02 is suitable for editing Indic (Indian
language) text in UTF-8. I was unable to try with WindowsIME, since I could
not locate the WindowsIME tool in WinXP. Help or a solution is requested.

Original issue reported on code.google.com by withbles...@gmail.com on 7 Sep 2010 at 3:32

Attachments:

GoogleCodeExporter commented 8 years ago
A sample replace.py which supports UTF-8 is also attached.

Original comment by withbles...@gmail.com on 3 Oct 2010 at 3:09

Attachments:

GoogleCodeExporter commented 8 years ago
Hello Zdenko,

What does it take to add a new language script to the pytesseract trainer?
I don't see it as a utf-8 issue, as utf-8 is fully supported at both places 
(trainer as well as tesseract-ocr). Why are the Kannada fonts not being
displayed? If you could provide some pointers, I would be able to fix it.

Thanks,
Senthil

Original comment by orsenthil@gmail.com on 5 Oct 2010 at 2:51

GoogleCodeExporter commented 8 years ago
I am not sure where exactly the problem is (GTK? Python? Windows?
pytesseracttrainer?). According to one post on the tesseract forum
(http://groups.google.com/group/tesseract-ocr/msg/2846c4309d864c68?hl=en),
pytesseracttrainer works with Japanese, so I expect that the problem is not in
pytesseracttrainer ;-)

But I have no way to test Kannada or WindowsIME... So if somebody can
test it (identify the problem) and improve the code, I would be glad.

My intention is to create a font selector + "configuration system"; maybe it
will help to some extent. For now you can change the font manually in the
script (line 53): BASE_FONT = 'Serif'. As far as I tested, choosing something
other than 'Sans', 'Serif', or 'monospace' (e.g. 'Arial') caused an error
message.
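
A minimal sketch of what such a "configuration system" could look like, so that the font does not have to be edited in the script itself. This is only an illustration under stated assumptions: it uses the Python 3 standard library, and the `[display]` section, the `base_font` key, and the `load_base_font` helper are hypothetical names, not part of pytesseracttrainer.

```python
import configparser

# The value the script currently hard-codes as BASE_FONT.
DEFAULT_FONT = 'Serif'


def load_base_font(config_text, default=DEFAULT_FONT):
    """Return the font named in INI-style text, or the default.

    The [display] section and the base_font key are hypothetical
    names chosen for this sketch.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback covers both a missing section and a missing key.
    font = parser.get('display', 'base_font', fallback=default).strip()
    # An empty value also falls back to the default.
    return font or default


if __name__ == '__main__':
    sample = "[display]\nbase_font = unifont 12\n"
    print(load_base_font(sample))
```

The returned string could then be assigned to BASE_FONT at startup. Whether the chosen font actually renders is still up to GTK and the fonts installed on the system.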

Original comment by zde...@gmail.com on 5 Oct 2010 at 6:42

GoogleCodeExporter commented 8 years ago
Regarding WindowsIME: kindly give the website from which I can download
WindowsIME.

Regarding BASE_FONT: I tested with the Kannada fonts Kedage, Mallige, and
Tunga. They display the script clearly, but I am unable to type. I also tested
with the BRH Kannada font: it does not display Kannada script, and I am unable
to type if Kannada is selected, whereas if English is selected I can type
easily using BarahaIME. Does this prove that BarahaIME does not fully support
the py program?

An interesting point: I tested in Ubuntu 10.04, where it works fine with the
itrans keyboard. I could not understand why WinXP gives trouble.

-sriranga(78yrsold)

Original comment by withbles...@gmail.com on 5 Oct 2010 at 7:38

GoogleCodeExporter commented 8 years ago
I even tested with the Google transliteration IME
(http://www.google.com/transliterate/); it does not work with the py program
on WinXP either. This is brought to your kind notice.

Original comment by withbles...@gmail.com on 30 Oct 2010 at 12:42

GoogleCodeExporter commented 8 years ago
You should use an appropriate font: change line 53 (BASE_FONT = 'Serif') in
pyTesseractTrainer-1.02.py to a font that supports your script.

E.g. when I changed it to BASE_FONT = 'unifont 12' I got the attached result,
which looks reasonable to me (of course unifont must be installed on your
system).
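
To decide which font is "appropriate", it can help to see which scripts a box file actually contains. The following is a rough sketch, not part of pytesseracttrainer: it assumes the glyph is the first whitespace-separated field on each box line, and it uses the first word of each character's Unicode name as an approximate script label. The `scripts_in_box_text` helper is a hypothetical name.

```python
import unicodedata


def scripts_in_box_text(box_text):
    """Collect the first word of each glyph character's Unicode name.

    For example 'KANNADA' or 'LATIN'. This is a heuristic label,
    not an official Unicode script property.
    """
    scripts = set()
    for line in box_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: the glyph is the first field of the box line.
        for ch in fields[0]:
            name = unicodedata.name(ch, '')
            if name:
                scripts.add(name.split()[0])
    return scripts


if __name__ == '__main__':
    # \u0c95 is KANNADA LETTER KA; the coordinates are made up.
    sample = "\u0c95 10 20 30 40 0\na 50 20 60 40 0\n"
    print(sorted(scripts_in_box_text(sample)))
```

If the result includes a script that the configured font does not cover, the glyphs will likely show as empty boxes, which matches the symptom in the original report.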

Original comment by zde...@gmail.com on 4 Jan 2011 at 10:25

Attachments:

GoogleCodeExporter commented 8 years ago

Original comment by zde...@gmail.com on 4 Jan 2011 at 10:35

GoogleCodeExporter commented 8 years ago
ZDE@ 
I am extremely thankful to you for the solution. As you suggested, I replaced
Serif with the Cheluvi font. It displayed the Kannada script, but I am unable
to type in the box. cheluvi-n-ttf is also attached for research purposes and
for the benefit of other users.

Original comment by withbles...@gmail.com on 4 Jan 2011 at 12:15

Attachments:

GoogleCodeExporter commented 8 years ago
Issue 8 has been merged into this issue.

Original comment by zde...@gmail.com on 5 Jan 2011 at 7:51

GoogleCodeExporter commented 8 years ago
Can it be used on the Ubuntu 11.01 OS?

Original comment by mamata2...@gmail.com on 24 Apr 2013 at 5:23