Closed GoogleCodeExporter closed 9 years ago
If the bounding boxes are close enough in the English and Arabic runs, you can
try hocr-merge. It takes two or more hOCR files for the same page and merges
them into one, keeping the words with the highest confidence. It is in
misc/xhocr in the repository:
https://bitbucket.org/jwilk/marasca-wbl
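To make the idea concrete, here is a minimal sketch of confidence-based merging. It is not the actual hocr-merge code. It assumes each word is an `ocrx_word` span whose `title` carries `bbox` and `x_wconf`, as in Tesseract's hOCR output. The names `parse_hocr`, `overlap_ratio`, `merge_words` and the 0.5 overlap threshold are made up for illustration.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")
CONF_RE = re.compile(r"x_wconf (\d+(?:\.\d+)?)")


class WordCollector(HTMLParser):
    """Collect bbox, confidence and text for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None outside a word span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            title = attrs.get("title") or ""
            bbox = BBOX_RE.search(title)
            conf = CONF_RE.search(title)
            if bbox:
                self._current = {
                    "bbox": tuple(int(v) for v in bbox.groups()),
                    # Assumption: a word without x_wconf counts as confidence 0.
                    "conf": float(conf.group(1)) if conf else 0.0,
                    "text": "",
                }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def parse_hocr(markup):
    """Return the list of words found in one hOCR document."""
    collector = WordCollector()
    collector.feed(markup)
    collector.close()
    return collector.words


def overlap_ratio(a, b):
    """Intersection area divided by the area of the smaller box."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    smaller = min(area_a, area_b)
    return (width * height) / smaller if smaller else 0.0


def merge_words(runs, min_overlap=0.5):
    """Merge word lists; where boxes overlap, keep the higher confidence."""
    merged = []
    for words in runs:
        for word in words:
            for index, kept in enumerate(merged):
                if overlap_ratio(word["bbox"], kept["bbox"]) >= min_overlap:
                    if word["conf"] > kept["conf"]:
                        merged[index] = word
                    break
            else:
                # No overlapping word seen so far: keep this one as new.
                merged.append(word)
    return merged
```

Calling `merge_words([parse_hocr(eng_markup), parse_hocr(ara_markup)])` would return one word per overlapping position. This only works when both runs segment the page into roughly the same boxes, which is the "close enough" condition mentioned above.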
Original comment by jsb...@mimuw.edu.pl
on 5 May 2013 at 7:03
I just tried Urdu OCR in VietOCR myself, and am happy to confirm that VietOCR
does very fine Urdu OCR when the language is set to Arabic.
However, do use a single-language setting, for Arabic as well as for English.
If ara+eng is set, both languages come out as junk, whereas both languages
come out with 95% accuracy when a single language is set.
I saw the same results as reported in this issue for the three images.
At the same time, I had earlier tried Hin+Eng (Hindi) and got a nearly perfect
result for both languages. It could be something in the LTR and RTL text flow
that is causing the problem, but I am not sure.
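For anyone scripting this outside VietOCR, here is a rough sketch of the single-language workaround using the Tesseract command line directly. It assumes a `tesseract` binary on the PATH with the `ara` and `eng` data installed. The helpers `hocr_command` and `run_single_language_passes` are hypothetical names, and the extension of the output file depends on the Tesseract version.

```python
import subprocess
from pathlib import Path


def hocr_command(image, language):
    """Build a Tesseract command for one single-language hOCR pass."""
    image = Path(image)
    # Output base such as "page-ara"; Tesseract appends the extension itself.
    output_base = image.with_name(f"{image.stem}-{language}")
    # The trailing "hocr" is the Tesseract config that selects hOCR output.
    return ["tesseract", str(image), str(output_base), "-l", language, "hocr"]


def run_single_language_passes(image, languages=("ara", "eng")):
    """Run one OCR pass per language instead of a combined ara+eng pass."""
    for language in languages:
        subprocess.run(hocr_command(image, language), check=True)
```

The per-language hOCR files produced this way are the kind of input a merging tool such as hocr-merge expects.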
Thanks.
--
Rawat
Original comment by vsrawat
on 10 Nov 2013 at 12:50
I tried reproducing this with the latest code in SVN, but couldn't. I did
however stumble across another bug with ara+eng, which I reported as issue
1220. New training data is coming sometime soon, though, which is good, so with
a bit of luck that might fix it. Until then, jsbien's hocr-merge recommendation
does sound interesting.
Original comment by nick.wh...@durham.ac.uk
on 27 May 2014 at 8:39
Fixed by change 2f197cd6537b
Original comment by theraysm...@gmail.com
on 7 Oct 2014 at 4:01
Original issue reported on code.google.com by
saade_jo...@hotmail.com
on 3 May 2013 at 11:32