ukdtom / SRT2UTF-8.bundle

Plex Agent, that'll convert sidecar subtitle files into UTF-8, if not

Wrongly converted diacritics for SR/HR subs #30

Closed vBm closed 8 years ago

vBm commented 8 years ago

The majority of subtitles downloaded from podnapisi.net are in ANSI format, with special characters that represent Serbian/Croatian diacritics. When your plugin converts them to UTF-8, they get heavily broken because the source encoding isn't properly detected.

| ANSI | Windows-1250 | UTF-8 |
|------|--------------|-------|
| ð | đ | đ |
| æ | ć | ć |
| è | č | č |

In other words, a direct ANSI -> UTF-8 conversion produces letters that are in entirely different Unicode blocks.

The correct path would be ANSI -> W-1250 -> UTF-8; that way we get the desired letters. (Doing so manually via Notepad++ gives proper results.)
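A minimal Python sketch of the difference, assuming the "ANSI" files are really Windows-1250 and that the broken output comes from the bytes being read as Windows-1252 instead. `to_utf8` is a hypothetical helper for illustration only; it is not part of the bundle or of chared.

```python
# The same three bytes, as they might appear in an "ANSI" .srt file.
raw = b"\xf0 \xe6 \xe8"


def to_utf8(data, source_encoding):
    """Decode bytes using source_encoding, then re-encode them as UTF-8.

    Hypothetical helper: the real plugin may detect and convert differently.
    """
    return data.decode(source_encoding).encode("utf-8")


# Assumed wrong guess: Western European (Windows-1252).
wrong = to_utf8(raw, "cp1252")   # UTF-8 bytes for: ð æ è

# Intended interpretation: Central European (Windows-1250).
right = to_utf8(raw, "cp1250")   # UTF-8 bytes for: đ ć č
```

Both calls produce valid UTF-8, which is why the broken result is not flagged as an error anywhere: only the choice of source code page decides which letters come out.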

So I'm not sure whether the problem is in your bundle or whether it's an upstream problem in chared.

ukdtom commented 8 years ago

Sadly clueless here, since I know lots of your countrymen are using this?

But I'll need the following from you....

A downloaded untouched srt, a broken one and a fixed one. Since I don't know your language, all I can do is compare ;-)

vBm commented 8 years ago

Unfortunately, I can't seem to reproduce the issue at hand. Maybe this was an upstream problem (podnapisi.net) where the .srt files were mixed up or so.

In the past 3 weeks I've downloaded more than 500 subs and none of 'em were broken, so I'd say it's pretty safe to close this ticket as "works for me" :D Sorry for the incomplete initial report; if I had samples it would be a different story.