kako-nawao / ffconv

Process media files to match profiles using ffmpeg

Detect character encoding #6

Closed kako-nawao closed 9 years ago

kako-nawao commented 9 years ago

Subtitle extraction fails when the subtitles are not UTF-8 encoded. We need a way to determine the actual encoding of the stream and pass that value to ffmpeg's sub_charenc option. The stream data does not seem to show anything useful, though.

As a last resort, a brute-force attack could work, attempting several encodings until one works.
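A minimal sketch of that brute-force idea in Python, not the actual ffconv code. The names `CANDIDATE_ENCODINGS`, `build_command`, `run_ffmpeg` and `extract_subtitles` are hypothetical, the candidate list is only illustrative, and the sketch assumes ffmpeg signals a failed extraction with a non-zero exit status, which may not hold for every version.

```python
import subprocess

# Candidate encodings, tried in order. Illustrative assumption, not a vetted list.
CANDIDATE_ENCODINGS = ("utf-8", "iso-8859-1")


def build_command(source, target, encoding, stream_index=0):
    """Build an ffmpeg command that extracts one subtitle stream."""
    return [
        "ffmpeg", "-y",
        "-sub_charenc", encoding,  # input option, so it must precede -i
        "-i", source,
        "-map", "0:s:{}".format(stream_index),
        target,
    ]


def run_ffmpeg(cmd):
    """Run the command and report success as 'exit status is 0' (assumption)."""
    result = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def extract_subtitles(source, target, encodings=CANDIDATE_ENCODINGS, runner=run_ffmpeg):
    """Try each encoding in turn; return the first that works, or None."""
    for encoding in encodings:
        if runner(build_command(source, target, encoding)):
            return encoding
    return None
```

Calling `extract_subtitles("movie.mkv", "movie.srt")` would return the encoding that succeeded, or `None` if every candidate failed. The `runner` parameter exists only so the retry loop can be exercised without invoking ffmpeg.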

kako-nawao commented 9 years ago

Added a brute-force attack with the most common encodings (iso-8859-1), and it works. The unit tests still need updating before this can be considered closed.

kako-nawao commented 9 years ago

We could make lots of improvements, but the current handling is enough for now, and rather quick: since we retry with the next encoding as soon as an error message is printed (we process stdout+stderr line by line), little time is wasted.
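The early-abort behaviour described here could look roughly like the following Python sketch. It is an illustration under assumptions, not the ffconv implementation: `run_until_error` and `ERROR_MARKER` are hypothetical names, and the marker string is a guess at what an encoding failure looks like in the output, so it should be adjusted to the messages your ffmpeg build actually prints.

```python
import subprocess
import sys

# Assumed substring that marks a failed attempt; real ffmpeg messages may differ.
ERROR_MARKER = "Invalid UTF-8"


def run_until_error(cmd, error_marker=ERROR_MARKER):
    """Run cmd, reading merged stdout+stderr line by line.

    Kills the process and returns False as soon as a line contains
    error_marker. Otherwise returns True only if the exit status is 0.
    """
    with subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the stdout pipe
        text=True,
        errors="replace",          # undecodable output must not crash the reader
    ) as process:
        for line in process.stdout:
            if error_marker in line:
                process.kill()     # stop immediately instead of waiting for exit
                return False
        return process.wait() == 0


if __name__ == "__main__":
    # Demo with a stand-in child process, so ffmpeg is not required.
    demo = [sys.executable, "-c", "print('all good')"]
    print(run_until_error(demo))
```

A retry loop can call this once per candidate encoding and move on to the next one whenever it returns `False`, so a failing attempt costs only the time until its first error line.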