Open GoogleCodeExporter opened 9 years ago
Hi! I am also ready to collaborate on making Tesseract compatible with the
Persian (fa) language. Also, please note that besides these characters, ۴ ۵
۶ (4 5 6 in Persian digits) are a bit different from their Arabic counterparts,
but Arabic-Indic digits are also acceptable in Persian script.
Original comment by ebra...@byagowi.com
on 16 Oct 2012 at 11:34
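For anyone preparing training text, here is a minimal sketch of how the two digit sets relate in Unicode. Arabic-Indic digits occupy U+0660 to U+0669 and the Persian (Extended Arabic-Indic) digits occupy U+06F0 to U+06F9. The function name `to_persian_digits` is hypothetical and not part of Tesseract; whether training text should be normalized this way at all is an assumption.

```python
# Map Arabic-Indic digits (U+0660..U+0669) to their Persian
# counterparts (U+06F0..U+06F9). Only 4, 5 and 6 look different
# in most fonts, but all ten have separate code points.
ARABIC_TO_PERSIAN_DIGITS = {0x0660 + i: 0x06F0 + i for i in range(10)}


def to_persian_digits(text: str) -> str:
    """Return text with Arabic-Indic digits replaced by Persian digits."""
    return text.translate(ARABIC_TO_PERSIAN_DIGITS)


if __name__ == "__main__":
    sample = "\u0664 \u0665 \u0666"  # Arabic-Indic 4 5 6
    print(to_persian_digits(sample))
```

Since Arabic-Indic digits are also acceptable in Persian script, a training set could just as well keep both forms instead of normalizing.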
Hi guys,
Could you also throw this one in, in addition to the Persian letters: ڭ ?
Is anything new happening with Arabic OCR?
Original comment by j...@christianmissiontrips.org
on 1 Nov 2012 at 9:25
We have started building Persian training data at
https://github.com/reza1615/PersianOcr
Original comment by reza.mos...@gmail.com
on 5 Nov 2012 at 7:42
Here are the latest results of Reza's work with Tesseract 3.02.2, which I put
on GitHub:
https://github.com/reza1615/PersianOcr/tree/master/Sample%20Test%20of%20Latest%20Version
It looks promising, but we think there are some hidden hints and secrets to
training Tesseract on Arabic script. I believe Google's documentation is very
poor on the points we must consider for training, and this is not cool for an
open-source project. For example, it would be very helpful if you published the
Arabic source files that you used for training Tesseract with the cube method.
Original comment by ebra...@byagowi.com
on 5 Nov 2012 at 8:37
Hi,
If you are going to add Farsi for us, then in addition to the letter
differences between Farsi and Arabic that my friends mentioned in the comments
above, I want to add another consideration:
in Farsi we don't have the letter: ي
instead we have: ی
Thanks
Original comment by abidiash...@gmail.com
on 21 Dec 2012 at 8:28
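The two letters above are distinct Unicode code points: Arabic yeh is U+064A and Farsi yeh is U+06CC. A minimal sketch of cleaning training text accordingly follows. The helper names are hypothetical, and it assumes the text is Persian only, since the replacement would be wrong for Arabic-language text.

```python
ARABIC_YEH = "\u064A"  # ي  ARABIC LETTER YEH
FARSI_YEH = "\u06CC"   # ی  ARABIC LETTER FARSI YEH


def count_arabic_yeh(text: str) -> int:
    """Count occurrences of the Arabic yeh, which Persian text should not contain."""
    return text.count(ARABIC_YEH)


def normalize_yeh(text: str) -> str:
    """Replace every Arabic yeh with the Farsi yeh."""
    return text.replace(ARABIC_YEH, FARSI_YEH)


if __name__ == "__main__":
    word = "\u0639\u0644\u064A"  # a word typed with the Arabic yeh
    print(count_arabic_yeh(word), normalize_yeh(word))
```

Running `count_arabic_yeh` over a corpus first shows how much of it was typed with an Arabic keyboard layout before anything is rewritten.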
I'm ready to collaborate on this project too.
The total number of Persian speakers is more than 110 million.
Also, there isn't any other OCR for it.
Original comment by intelsat...@gmail.com
on 6 Jan 2013 at 1:01
Reza, I want to train Tesseract. Can you tell me the steps?
Original comment by amir...@gmail.com
on 25 Sep 2013 at 10:20
Hello guys,
I'm trying to train Tesseract for Kurdish, which is good for Persian too.
Kurdish has some additional letters, but the way of writing is the same as
Arabic or Farsi. The problem I'm getting is that the final OCR result comes out
left to right instead of right to left, which means that you can't read the
text even though the letters are correct. I use qt-box-editor to edit the box
files, then Serak Tesseract Trainer V0.4 to train the OCR, and finally I put
the traineddata file in the Tesseract directory. Everything goes well except
for the missing Arabic mechanism of writing from right to left.
Does anybody know this problem?
You can see the traineddata file I generated in the attachment.
Thanks a lot
Original comment by karo0...@gmail.com
on 18 Oct 2013 at 7:27
Attachments:
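If the output really does contain the correct letters but in visual (left-to-right) order, one possible stopgap is to reorder each line after OCR. The sketch below is only that: it assumes every line is right-to-left text with embedded runs of digits or Latin letters, it is a crude simplification of the Unicode bidirectional algorithm, and it does not address the cause in the training data. The function names are hypothetical.

```python
import re

# Runs of Latin letters and digits (ASCII, Arabic-Indic, Persian) read
# left to right even inside right-to-left text, so they must keep their order.
LTR_RUN = re.compile(
    r"[0-9A-Za-z\u0660-\u0669\u06F0-\u06F9]+"
    r"(?:[.,:/-][0-9A-Za-z\u0660-\u0669\u06F0-\u06F9]+)*"
)


def visual_to_logical(line: str) -> str:
    """Turn one visually ordered RTL line into logical (reading) order."""
    reversed_line = line[::-1]
    # Reversing the whole line also flipped the LTR runs; flip them back.
    return LTR_RUN.sub(lambda m: m.group(0)[::-1], reversed_line)


def fix_text(text: str) -> str:
    """Apply the reordering line by line, keeping line breaks in place."""
    return "\n".join(visual_to_logical(line) for line in text.split("\n"))


if __name__ == "__main__":
    ocr_output = "123 \u0645\u0627\u0644\u0633"  # letters in visual order
    print(fix_text(ocr_output))
```

Lines that mix scripts in more complicated ways (brackets, punctuation at line edges) would need a real bidi implementation instead.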
Hello,
It seems that to train Arabic and Farsi with good precision you need to train
Tesseract's cube engine. Do you know how the cube engine can be trained? The
main programmer of the cube engine is Ahmad Abdulkader, who now works at
Facebook!
Original comment by vahid.ke...@gmail.com
on 5 Jul 2014 at 11:34
@Vahid: I tried to use the cube engine, but it doesn't have any help or manual,
so I couldn't train it properly. I sent many emails to the OCR developers, but
nobody answered!
Original comment by reza.mos...@gmail.com
on 6 Jul 2014 at 8:19
Hello,
Other people (besides its programmers) have tried to make some sense of this
cube engine, and they have written up their results here:
https://code.google.com/p/tesseract-ocr-extradocs/
I couldn't make sense of any of it myself, though.
Original comment by abidiash...@gmail.com
on 6 Jul 2014 at 8:43
Hello again,
I read through the link you gave. Unfortunately, they only introduce it and
don't explain how to build it, so it isn't usable, and in many places they
don't even give a precise introduction.
One of the links says that this method was removed from the core of the program
because it is not open source.
Original comment by reza.mos...@gmail.com
on 6 Jul 2014 at 9:19
Ability to train the tesseract recognizer (but not cube) on several
Arabic-based languages will be added to 3.04, and this problem may receive real
attention for 3.05.
Original comment by theraysm...@gmail.com
on 4 Nov 2014 at 7:03
Does anybody know when Tesseract 3.04 is coming? I cloned Reza's project and
ran the training, then I put per.traineddata in tessdata, but it didn't work.
Could anybody send me a tested copy of per.traineddata?
Thanks in advance.
Original comment by e.velib...@gmail.com
on 3 Feb 2015 at 2:44
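When a freshly trained file "doesn't work", one thing worth ruling out is a mismatch between the file name and the language code: `tesseract ... -l per` looks for a file named `per.traineddata` in the tessdata directory. Below is a minimal sketch of that check followed by the standard command-line call. The helper names and the example paths are hypothetical, and it assumes the `tesseract` executable is on the PATH and already uses the tessdata directory passed in.

```python
import os
import subprocess


def traineddata_path(tessdata_dir: str, lang: str = "per") -> str:
    """Path where Tesseract expects the data file for a language code."""
    return os.path.join(tessdata_dir, lang + ".traineddata")


def build_command(image: str, output_base: str, lang: str = "per") -> list:
    """Standard CLI form: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", image, output_base, "-l", lang]


def run_ocr(image: str, output_base: str, tessdata_dir: str, lang: str = "per") -> None:
    """Fail early if the data file is missing, then run Tesseract."""
    path = traineddata_path(tessdata_dir, lang)
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    subprocess.run(build_command(image, output_base, lang), check=True)


if __name__ == "__main__":
    # Hypothetical paths; adjust to the local installation.
    print(build_command("page.png", "page_out"))
    print(traineddata_path("/usr/local/share/tessdata"))
```

This only confirms that the file is where Tesseract will look for it. It says nothing about whether the training itself produced usable data.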
Original issue reported on code.google.com by
reza.mos...@gmail.com
on 16 Oct 2012 at 11:10