spaam / svtplay-dl

Small command-line program to download videos from some streaming sites.
https://svtplay-dl.se
MIT License

SVTPlay - Strange chars in subtitles #1072

Closed by Sopor 5 years ago

Sopor commented 5 years ago

It seems that Swedish characters sometimes come out wrong in the subtitles. It would be nice if svtplay-dl could check for this and convert them to the right characters.

svtplay-dl.exe -S --force-subtitle https://www.svtplay.se/video/22014874/bonusfamiljen/bonusfamiljen-sasong-3-avsnitt-10
1
00:00:05,120 --> 00:00:10,040
Vad säger ni om biff Rydberg
i morgon?

2
00:00:10,200 --> 00:00:16,280
-Ja, det vore gott.
-Jag menar till familjemiddagen.

3
00:00:17,640 --> 00:00:23,440
Den har jag förträngt. Vi har AW
på jobbet, men det ställer jag in.

4
00:00:23,600 --> 00:00:30,160
Ja, det vore konstigt om våra
föräldrar kommer och du inte är här.

Sopor commented 5 years ago

It seems to affect all new subtitles on SVT, and all kinds of characters.

Sopor commented 5 years ago

It seems that some subtitles are in wsrt and others in vtt. Could this be the reason for this issue?

Groggan commented 5 years ago

Sopor, open the file in Notepad, go to File > Save as, and in the Save as dialog change Encoding from UTF-8 to ANSI. Click Save, then click Yes in the Confirm Save As dialog that pops up to overwrite the existing file.

Now the subtitles should look OK. The downloaded files are encoded in UTF-8 and, as you have noticed, some work while others don't. Saving the .srt files with ANSI encoding solves this issue for both software and hardware players.

Sopor commented 5 years ago

@Groggan Sorry, but that doesn't work for me. I have already tried that and more. Until this is fixed I use youtube-dl to download the subtitles and convert them to srt with Subtitle Edit :)

martinlindhe commented 5 years ago

@Groggan the downloaded subtitle discussed in this bug report is mojibake, not actual UTF-8.

martinlindhe commented 5 years ago

To clarify, this is a real issue that recently started occurring with some subtitles from svtplay.se. It happens on macOS and Linux as well, and it is the result of incorrectly handled text encoding.
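For readers unfamiliar with the term, here is a minimal sketch of how mojibake of this kind typically arises and can sometimes be reversed. It assumes the common case where UTF-8 bytes were wrongly decoded as Latin-1; nothing in this thread confirms that this is the exact mistake here. `make_mojibake` and `repair_mojibake` are hypothetical helper names, not part of svtplay-dl.

```python
def make_mojibake(text: str) -> str:
    """Simulate the bug: encode as UTF-8, then wrongly decode as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(text: str) -> str:
    """Undo the wrong decode; return the input unchanged if it cannot be undone."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text that is already correct (or damaged in some other way)
        # does not round-trip, so leave it alone.
        return text


garbled = make_mojibake("Vad säger ni om biff Rydberg")
print(garbled)                   # the "strange chars" version
print(repair_mojibake(garbled))  # the original text again
```

This also shows why re-saving the file with another encoding in an editor may not help: the wrong characters are already stored in the file, so changing the encoding only changes how those wrong characters are written out.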

myowncloudservice commented 5 years ago

I downloaded this video again from svtplay.se and now the characters are correct, so I think the issue was at www.svtplay.se and not in svtplay-dl.

https://www.svtplay.se/video/20244542/hemma-igen/hemma-igen-sasong-2-inga-fler-hemligheter?start=auto

martinlindhe commented 5 years ago

@myowncloudservice the video you link to was published before the errors started.

Here are some videos that do show garbled srt:

May 6: https://www.svtplay.se/video/22112450/vetenskapens-varld/vetenskapens-varld-farliga-forare?start=auto&tab=2019
May 13: https://www.svtplay.se/video/22200145/vetenskapens-varld/vetenskapens-varld-eld-i-berget?start=auto&tab=2019

zmirren commented 5 years ago

I just downloaded the latest version of svtplay-dl, and now I experience the same issue as mentioned above with Swedish characters in subtitles. Earlier, the subtitles were encoded correctly (hardcoded to the video), but now (for some videos) a separate .srt file with broken encoding is created.

I hope this issue is taken seriously and will be fixed soon.

Sopor commented 5 years ago

You can use youtube-dl as a workaround until it is fixed.

Here is an example:

youtube-dl.exe --write-sub -o "%(title)s-%(id)s.%(ext)s" --sub-lang sv --ignore-config --skip-download "https://www.svtplay.se/video/22438846/raggarjavlar"

If you want srt instead of vtt, use Subtitle Edit or an online service to convert it.

zmirren commented 5 years ago

Great, thanks! That seems to do the trick for now.

Out of curiosity, what is the purpose of converting the vtt file afterwards? I simply renamed the file to .srt, and that works flawlessly, at least when playing it in VLC (I haven't tried other media players yet...).

martinlindhe commented 5 years ago

VTT and SRT are different but similar subtitle text formats. SRT is more universally supported, but both work with VLC, as you discovered.
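To illustrate how close the two formats are, here is a rough sketch of a VTT-to-SRT conversion. It only handles the basics: it drops the `WEBVTT` header and any block without a timing line, numbers the cues, and switches the millisecond separator from `.` to `,`. It does not strip inline tags or styling, and `vtt_to_srt` is a hypothetical helper, not something svtplay-dl, youtube-dl or Subtitle Edit provides.

```python
import re

# A WebVTT timing line, e.g. "00:00:05.120 --> 00:00:10.040 line:90%".
# The hours field is optional in WebVTT; cue settings after the end time are ignored.
TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})"
)


def _srt_time(clock: str, millis: str) -> str:
    """Build an SRT timestamp: hours always present, comma before milliseconds."""
    if clock.count(":") == 1:
        clock = "00:" + clock
    return f"{clock},{millis}"


def vtt_to_srt(vtt: str) -> str:
    """Convert simple WebVTT text to SRT text."""
    blocks = re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                start = _srt_time(match.group(1), match.group(2))
                end = _srt_time(match.group(3), match.group(4))
                text = "\n".join(lines[i + 1:])
                cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
                break  # blocks without a timing line (header, notes) are skipped
    return "\n\n".join(cues) + "\n"
```

Players that are less forgiving than VLC may trip over the header or the `.` separator in a renamed file, which is one reason to convert rather than rename.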

edu-zamora commented 5 years ago

This issue seems to be fixed in 2.2.

spaam commented 5 years ago

yep this is fixed in 2.2. thanks for the report <3