chapmeister closed this issue 13 years ago
This is a bug in tesseract and not in vobsub2srt. Which version of tesseract are you using? Did you try tesseract 3.0?
Apparently I was using 2.04-2, which is the latest version my Ubuntu 10.04 could find. I tried to install 3.00 manually and the install seemed to work, but it kept reporting itself as v2.04-2. I then uninstalled v2, reinstalled v3, and reinstalled vobsub2srt, but now nothing works and I get:
Unable to load unicharset file /usr/share/tesseract-ocr/tessdata/eng.unicharset
Being fairly non-technical, I guess I'm now out of luck. Pity. :-/
Do you have the language files installed? Maybe in a different directory?
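One way to check is to search a few likely directories for the English language files. Below is a minimal sketch in Python; the helper name `find_language_files` and the list of directories are my own assumptions for illustration, not anything vobsub2srt or tesseract ships. `TESSDATA_PREFIX` is the environment variable tesseract can use to locate its data, and it is only consulted here if it is set.

```python
import os
from pathlib import Path


def find_language_files(lang, roots):
    """Return sorted paths of files named '<lang>.<anything>' under the given roots.

    Hypothetical helper: it only looks at file names, it does not validate
    that the files are usable by tesseract.
    """
    found = set()
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            # Skip directories that do not exist on this machine.
            continue
        for path in root.rglob(f"{lang}.*"):
            if path.is_file():
                found.add(path)
    return sorted(found)


if __name__ == "__main__":
    # Assumed search locations; adjust for your system.
    roots = ["/usr/share/tesseract-ocr", "/usr/local/share"]
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        roots.append(prefix)
    for path in find_language_files("eng", roots):
        print(path)
```

If this prints nothing, the language files are probably not installed at all. If it prints paths outside /usr/share/tesseract-ocr/tessdata, they are installed somewhere the program is not looking.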
Whilst running the program I got:
vobsub2srt: unicharset.cpp:76: const UNICHAR_ID UNICHARSET::unichar_to_id(const char*, int) const: Assertion `ids.contains(unichar_repr, length)' failed.
Aborted
This was on line 1,313 of a 1,620-line .srt file. There didn't appear to be anything different about the next image to be OCR'ed.