mwestphal closed this issue 11 years ago.
I guess this is more of a Tesseract problem than a VobSub2SRT one, since VobSub2SRT just automates the different tools. I suggest you also file a bug with the Tesseract project to see what they say.
The characters look like traditional Chinese, but you are using `chi_sim`, which is for simplified Chinese characters. Try `chi_tra` instead (you may need to install it first, e.g., the `tesseract-ocr-chi-tra` package on Debian/Ubuntu).
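To check whether the `chi_tra` data is actually installed, you can ask Tesseract for its language list with `tesseract --list-langs` (present in Tesseract 3.02 and later, as far as I know). Here is a minimal Python sketch. It assumes the output is a header line such as "List of available languages (3):" followed by one language code per line. Depending on the Tesseract version the list may go to stderr rather than stdout, so the sketch reads both streams and keeps only lines that look like language codes.

```python
import re
import shutil
import subprocess

# Language codes look like "chi_sim", "eng", or "script/Latin" (4.x);
# lines with spaces (headers, warnings) are ignored.
_LANG_CODE = re.compile(r"[\w/+.-]+")


def parse_list_langs(output: str) -> set:
    """Parse `tesseract --list-langs` output into a set of language codes."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        if line and _LANG_CODE.fullmatch(line):
            langs.add(line)
    return langs


def installed_tesseract_langs() -> set:
    """Return the installed Tesseract languages (empty set if tesseract is not on PATH)."""
    if shutil.which("tesseract") is None:
        return set()
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    # Some versions print the list on stderr, others on stdout.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_tesseract_langs()
    for code in ("chi_sim", "chi_tra"):
        print(code, "installed" if code in langs else "missing")
```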
@ruediger Thanks for the suggestion, but it is simplified Chinese. I tried `chi_tra` and got even more garbage.
I will open an issue with the Tesseract project.
Anyway, the easy workaround is to hardcode the VobSub subtitles into the video, but that's not the point.
Oh, my Chinese is a bit too rusty, I guess.
Please post a link to the Tesseract bug report. If the Tesseract devs ask for a sample, you can also dump the subtitle images with the `--dump-images` flag.
It looks like I can't provide the information they need, nor run the tests they suggest. If anyone from VobSub2SRT wants to take the lead on this issue, please do.
As I said, you can extract the subtitle images with `--dump-images`. Pick one and run the tests with the `tesseract` command-line program.
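A minimal sketch of that test, which runs the classic Tesseract CLI form `tesseract imagename outputbase -l lang` on each dumped image once with `chi_sim` and once with `chi_tra` so the two results can be compared side by side. The `*.pgm` glob is a hypothetical placeholder: check what file names `--dump-images` actually writes and adjust `pattern` accordingly. Tesseract appends `.txt` to the output base itself.

```python
import glob
import subprocess
from pathlib import Path


def build_ocr_commands(image: str, langs=("chi_sim", "chi_tra")) -> list:
    """Return one tesseract command per language for a single image.

    The output base is the image path without its extension plus the
    language code, e.g. "sub-0001.chi_sim" (tesseract adds ".txt").
    """
    stem = str(Path(image).with_suffix(""))
    return [["tesseract", image, f"{stem}.{lang}", "-l", lang] for lang in langs]


def run_ocr(pattern: str = "*.pgm") -> None:
    """OCR every matching image with each language and print the results.

    HYPOTHETICAL default pattern: adjust it to the dumped image names.
    """
    for image in sorted(glob.glob(pattern)):
        for cmd in build_ocr_commands(image):
            subprocess.run(cmd, check=True)
            text = Path(cmd[2] + ".txt").read_text(encoding="utf-8").strip()
            print(f"{image} [{cmd[-1]}]: {text}")


if __name__ == "__main__":
    run_ocr()
```

Comparing the two outputs for the same image makes it easier to show the Tesseract developers whether the problem is the language data or the recognition itself.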
Hello, I've been trying to use vobsub2srt to convert Chinese subtitles to SRT, using the following command: `vobsub2srt --lang zh --tesseract-lang chi_sim subtitles`
However, the conversion does not work well: many characters are not recognized correctly, even though the font used in the VobSub is perfectly readable.
Here is a screenshot of the VobSub: http://img546.imageshack.us/img546/4601/u1xu.jpg
Here is the converted subtitle: http://img571.imageshack.us/img571/5273/iyx2.jpg
You can easily see that some characters have been simplified. Here are the .sub/.idx files: http://www.2shared.com/file/ZvL3xukf/subtitles.html http://www.2shared.com/file/1u7L35fD/subtitles.html
Is this normal? Is there a workaround?
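To give the Tesseract developers something more concrete than screenshots, one option is to measure a character error rate (CER): the Levenshtein edit distance between a hand-typed reference line and the OCR output, divided by the reference length. This is a generic metric, not something VobSub2SRT or Tesseract provides; the sketch below is one way to compute it. Python strings are indexed by code point, so each Chinese character counts as one unit.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length, ignoring whitespace."""
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `char_error_rate("我们走吧", "我们是吧")` is 0.25 (one wrong character out of four). Computing it for a few lines with both `chi_sim` and `chi_tra` gives a number per language rather than an impression.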