ruediger / VobSub2SRT

Converts VobSub subtitles (.idx/.sub format) into .srt subtitles.
GNU General Public License v3.0

TESSERACT_DATA_PATH and tesseract 3.03_rc1 #45

Closed: netfab closed this issue 9 years ago

netfab commented 9 years ago

Hi,

My system is Gentoo. With the ebuild from packaging/, vobsub2srt works fine with tesseract 3.02. But if I upgrade tesseract to 3.03_rc1, vobsub2srt fails like this:

$ vobsub2srt 01
Error opening data file /usr/share/tesseract-ocr/tessdata/fra.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'fra'
Tesseract couldn't load any languages!
Failed to initialize tesseract (OCR).

The only way I found to make it work with tesseract 3.03_rc1 is to rebuild vobsub2srt and force TESSERACT_DATA_PATH in the configure phase, as follows in the ebuild:

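src_configure() {
	# Hard-code the data path handed to tesseract; on this system
	# the tessdata directory is /usr/share/tessdata.
	local mycmakeargs=(
		-DTESSERACT_DATA_PATH="/usr/share"
	)
	cmake-utils_src_configure
}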

So my question is: why do I need to force the TESSERACT_DATA_PATH value with tesseract 3.03_rc1 but not with 3.02? Is this expected?

Thanks.

ruediger commented 9 years ago

What's the location of the tessdata directory? /usr/share/tessdata? Is that set by Gentoo, or is it a change in tesseract?

Right now VobSub2SRT does not try to detect the correct TESSERACT_DATA_PATH but simply assumes /usr/share/tesseract-ocr/tessdata. That was never optimal and should be detected in the CMake script.
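A minimal sketch of the kind of detection described above, written as a POSIX shell function rather than CMake (the actual CMake script is not shown in this thread). It assumes tesseract wants the parent of the tessdata directory, as the TESSDATA_PREFIX error message suggests. The function name find_tessdata_parent, the default language, and the default candidate directories are all hypothetical.

```shell
# Hypothetical helper: print the parent directory of the first candidate
# whose tessdata/ subdirectory contains <lang>.traineddata.
# Usage: find_tessdata_parent <lang> [candidate-parent-dir...]
find_tessdata_parent() {
  local lang dir
  lang="${1:-eng}"                # assumed default language
  if [ "$#" -gt 0 ]; then shift; fi
  # Assumed common locations, used only when no candidates are passed.
  if [ "$#" -eq 0 ]; then
    set -- /usr/share /usr/share/tesseract-ocr /usr/local/share
  fi
  for dir in "$@"; do
    if [ -f "$dir/tessdata/$lang.traineddata" ]; then
      printf '%s\n' "$dir"        # first match wins
      return 0
    fi
  done
  return 1                        # nothing found
}
```

An ebuild could then pass something like -DTESSERACT_DATA_PATH="$(find_tessdata_parent fra)" to CMake; on a system laid out like netfab's, that would be expected to yield /usr/share. Because the first match wins, the order of the candidates decides which installation is picked when several exist.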

netfab commented 9 years ago

> What's the location of the tessdata directory? /usr/share/tessdata?

Yes, in both versions. It is not currently set by Gentoo.

But it seems that the handling of the tessdata directory changed between 3.02 and 3.03; please see:
https://code.google.com/p/tesseract-ocr/issues/detail?id=938
https://code.google.com/p/tesseract-ocr/source/detail?r=e66d43390782f056b9be6e4aee4bf35c214a2f2d

ruediger commented 9 years ago

Does it work if you compile it without specifying TESSERACT_DATA_PATH and instead run it with:

$ vobsub2srt --tesseract-data "" 01

netfab commented 9 years ago

No, same error, except that the path is different:

Error opening data file ./tessdata/fra.traineddata
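A rough model of the lookup, inferred only from the paths reported in this thread and not taken from tesseract's source: with 3.03, tesseract seems to append tessdata/ to whatever data path it receives, and an empty path behaves like the current directory. That would account both for ./tessdata/fra.traineddata here and for -DTESSERACT_DATA_PATH="/usr/share" finding /usr/share/tessdata. The function name traineddata_path is hypothetical.

```shell
# Hypothetical model (an assumption, not tesseract's real code) of how
# 3.03 appears to build the path of a language data file.
# Usage: traineddata_path <datapath> <lang>
traineddata_path() {
  local datapath lang
  datapath="$1"
  lang="$2"
  # Assumed: an empty data path falls back to the current directory.
  if [ -z "$datapath" ]; then
    datapath="."
  fi
  # Strip one trailing slash, then append tessdata/<lang>.traineddata.
  printf '%s/tessdata/%s.traineddata\n' "${datapath%/}" "$lang"
}
```

Under this model, traineddata_path "" fra prints ./tessdata/fra.traineddata and traineddata_path /usr/share fra prints /usr/share/tessdata/fra.traineddata.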

ruediger commented 9 years ago

Sigh. I thought "" would set it to the default (tesseract_->datadir?). I guess I'll have to look into it. Since it's an rc, you have a workaround, and I'm currently short on time, this will have to wait a bit (unless you come up with a backward-compatible patch :))