justaddcoffee / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Tesseract does not recognize spacing for Bengali language #1468

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create traineddata for the Bengali language
2. Run OCR to generate output
3. The output does not contain any spaces

What is the expected output? What do you see instead?
There should be word spacing and sentence spacing. I also made English 
traineddata, and it detects spacing, but for Bengali it does not work. I am 
using the 3.02.02 Windows installer.

Please use labels and text to provide additional information.

These are a few Bengali characters:
আ মা দে র দে শে র না ম বা লা দে শ

Some text in the input image file may look like this:
আমাদের দেশের নাম বালাদেশ

But the generated output looks like this:
আমাদেরদেশেরনামবালাদেশ
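
To check that the characters themselves are recognized and only the spaces are lost, the two strings above can be compared with whitespace stripped. This is a minimal sketch, not part of Tesseract; the helper names `differs_only_by_spaces` and `missing_space_count` are made up for illustration.

```python
# Strings copied from the report above.
expected = "আমাদের দেশের নাম বালাদেশ"
actual = "আমাদেরদেশেরনামবালাদেশ"


def differs_only_by_spaces(expected: str, actual: str) -> bool:
    """Return True if both strings match once all whitespace is removed."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops every whitespace character.
    return "".join(expected.split()) == "".join(actual.split())


def missing_space_count(expected: str, actual: str) -> int:
    """Return how many more whitespace characters expected has than actual."""
    def count_spaces(text: str) -> int:
        return sum(ch.isspace() for ch in text)

    return count_spaces(expected) - count_spaces(actual)


if __name__ == "__main__":
    print("Only spacing differs:", differs_only_by_spaces(expected, actual))
    print("Missing spaces:", missing_space_count(expected, actual))
```

If the first check holds for real OCR output, the problem is limited to word segmentation rather than character recognition.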

I would appreciate any help, as this may not be a common problem. I do find 
many solutions online.

Original issue reported on code.google.com by m.tawfi...@gmail.com on 30 Apr 2015 at 4:00

GoogleCodeExporter commented 9 years ago
Sorry, I meant I do not find many solutions online.

Original comment by m.tawfi...@gmail.com on 6 May 2015 at 12:13