Open GoogleCodeExporter opened 9 years ago
I made a typo in the example; the actual file does have the correct
line:
1 d 2 c t 0
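For readers unfamiliar with the layout of such a line, here is a minimal Python sketch of how it can be read. It assumes the whitespace-separated "v1" unicharambigs layout (source count, source unichars, target count, target unichars, type flag). The names `AmbigRule` and `parse_ambig_line` are hypothetical and are not part of Tesseract; this is not how Tesseract itself parses the file.

```python
from typing import List, NamedTuple


class AmbigRule(NamedTuple):
    """One unicharambigs rule (hypothetical helper type, not a Tesseract API)."""
    source: List[str]   # unichars as recognised
    target: List[str]   # unichars to substitute
    mandatory: bool     # True for type 1, False for type 0 (optional)


def parse_ambig_line(line: str) -> AmbigRule:
    """Parse a line such as '1 d 2 c t 0' under the assumed v1 layout."""
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    n_src = int(fields[0])
    source = fields[1:1 + n_src]
    # Need all source unichars plus at least the target count field.
    if len(source) != n_src or len(fields) < 2 + n_src:
        raise ValueError("source count does not match the fields present")
    n_tgt = int(fields[1 + n_src])
    target = fields[2 + n_src:2 + n_src + n_tgt]
    rest = fields[2 + n_src + n_tgt:]
    # Exactly one field (the type flag) must remain after the target unichars.
    if len(target) != n_tgt or len(rest) != 1 or rest[0] not in ("0", "1"):
        raise ValueError("target count or type flag is malformed")
    return AmbigRule(source, target, rest[0] == "1")


rule = parse_ambig_line("1 d 2 c t 0")
print(rule)
```

Read this way, the line above says that a single "d" may optionally (type 0) be replaced by the two unichars "c" and "t". Note that splitting on whitespace cannot represent a rule involving the space character, so this is only an illustration of the field order.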
Original comment by baraba...@gmail.com
on 31 Aug 2011 at 6:33
bump o.o
Original comment by kopanda0...@gmail.com
on 23 Jun 2012 at 1:45
Issue 719 has been merged into this issue.
Original comment by zde...@gmail.com
on 21 Jul 2012 at 3:49
I have the same problem. In addition, I am sure that I have included the
unicharambigs correctly, because replacements work when set to mandatory (type
1), but that's not the desired solution, of course.
Original comment by martin.s...@illusion-factory.de
on 23 Dec 2012 at 9:24
I am still having the same issues as before. Tesseract should compare the
output to the dawg files to get rid of extra spacing in the middle of words and
put a space when two words run together. It should look for optional
substitutions as well. The dawg files do not have the desired effect; please fix.
Original comment by mattt...@gmail.com
on 3 Jan 2013 at 7:54
I am also having the same issue.
Has anyone fixed this issue?
Original comment by dharmend...@gmail.com
on 5 Mar 2013 at 11:08
Same here. I added a couple of rules to eng.unicharambigs and made sure it's
combined correctly. It still works only if I force the substitution by setting
the last column to "1".
Original comment by remon.sh...@gmail.com
on 22 Oct 2013 at 11:34
Ideally, the other optional toggles function best only when there are
supporting files such as the dictionary, frequently used words, etc.
I had to use the 1 because I didn't have these files prepared.
Original comment by boydtw...@gmail.com
on 22 Oct 2013 at 9:01
There seem to be two issues at hand here.
First there's the issue of "type 0" (optional) ambigs, which seem to be ignored.
But as has been pointed out, these are likely working as intended, and are
simply not being selected because they're deemed unlikely.
Second, there currently appears to be a bug involving multi-char ambigs.
I'll leave out the messier details, but the gist is that such rules will
silently fail parsing and therefore be ignored at runtime.
I've created a patch that should take care of this and make both types of
multi-char ambig rule parse and load successfully.
If you want to verify this bug and its patch, try running tesseract with
a config file that includes "ambigs_debug_level 3".
You should see which lines load and which don't, the latter with a message
along the lines of "Illegal unichar ...".
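As a rough illustration of that check, the Python sketch below writes a one-line config file containing "ambigs_debug_level 3" and builds a tesseract command line that passes it as the trailing config argument. The helper name `build_debug_command`, the config file name and the image/output names are hypothetical, and it assumes your tesseract build accepts a config file path as the last argument; adjust if yours only looks in tessdata/configs.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def build_debug_command(image: str, output_base: str, lang: str = "eng",
                        config_dir: Optional[str] = None) -> List[str]:
    """Write an ambigs debug config and return the tesseract argv (hypothetical helper)."""
    directory = Path(config_dir) if config_dir else Path(tempfile.mkdtemp())
    config_path = directory / "ambigs_debug"  # file name is arbitrary
    # The only setting taken from the comment above.
    config_path.write_text("ambigs_debug_level 3\n")
    # Assumed CLI shape: tesseract <image> <outputbase> -l <lang> <configfile>
    return ["tesseract", image, output_base, "-l", lang, str(config_path)]


cmd = build_debug_command("page.png", "page_out")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)`; the debug output should then show which unicharambigs lines were loaded and which were rejected.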
Original comment by clements...@gmail.com
on 4 Jan 2014 at 3:16
Is this patch included in the latest source in git?
Original comment by shreeshrii
on 16 Oct 2014 at 2:48
Original issue reported on code.google.com by
baraba...@gmail.com
on 30 Aug 2011 at 9:05