spaam / svtplay-dl

Small command-line program to download videos from some streaming sites.
https://svtplay-dl.se
MIT License

TV4 - Use non segmented subtitles? #1133

Open Sopor opened 5 years ago

Sopor commented 5 years ago

It seems that there is another way to get non-segmented subtitles from TV4. It was @SYSophie who wrote about it :)

Sample manifest of segmented subtitle: https://lbs-usp-hls-vod.cmore.se/vod/41624/tme3g3tbnki(11990130_ISMUSP).ism/tme3g3tbnki(11990130_ISMUSP)-textstream_swe=3000.m3u8

Sample segment: https://lbs-usp-hls-vod.cmore.se/vod/41624/tme3g3tbnki(11990130_ISMUSP).ism/tme3g3tbnki(11990130_ISMUSP)-textstream_swe=3000-1.webvtt

Sample of complete webvtt (NOT segmented) https://lbs-usp-hls-vod.cmore.se/vod/41624/tme3g3tbnki(11990130_ISMUSP).ism/tme3g3tbnki(11990130_ISMUSP)-textstream_swe=3000.webvtt

I don't know if you want to use this instead of the segmented ones?

spaam commented 5 years ago

Where is that file? I can't find it :)

SYSophie commented 5 years ago

@Sopor this has been mentioned before, i.e. in #1050. These subs seem to be stitched together from the segmented ones, so they still need some post-processing: merging two or more occurrences of the same text and then merging the time cues. As far as I know this is available for all subtitles, but the URL for them is not exposed in any API or manifest.
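The post-processing described above could look roughly like the following. This is a minimal sketch, not svtplay-dl's actual code: the `(start, end, text)` tuple representation, the `merge_repeated_cues` name and the `gap` parameter are all assumptions made for illustration, and it only merges a cue into the one directly before it.

```python
from typing import List, Tuple

# Hypothetical cue representation: (start seconds, end seconds, text).
Cue = Tuple[float, float, str]


def merge_repeated_cues(cues: List[Cue], gap: float = 0.0) -> List[Cue]:
    """Merge consecutive cues that repeat the same text and touch or overlap.

    `gap` is the largest pause (in seconds) still treated as "touching".
    """
    merged: List[Cue] = []
    for start, end, text in sorted(cues, key=lambda c: (c[0], c[1])):
        if merged:
            prev_start, prev_end, prev_text = merged[-1]
            if text == prev_text and start <= prev_end + gap:
                # Same text continuing across a segment boundary:
                # extend the previous cue instead of adding a duplicate.
                merged[-1] = (prev_start, max(prev_end, end), prev_text)
                continue
        merged.append((start, end, text))
    return merged
```

A cue split at a segment boundary, such as `(10.0, 12.0, "A")` followed by `(12.0, 14.0, "A")`, would come out as one cue covering the whole span, while the same text reappearing much later stays a separate cue.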

Sopor commented 5 years ago

I used the switches -S -g to get the URL, then removed the -1 from the subtitle URL, and I can download a webvtt that is not segmented.

svtplay-dl.exe -S -g https://www.tv4play.se/program/f%C3%A5ngarna-p%C3%A5-fortet/11973281
https://lbs-usp-hls-vod.cmore.se/vod/3369c/hdhuj4pjido(11973281_ISMUSP).ism/hdhuj4pjido(11973281_ISMUSP)-textstream_swe=3000-1.webvtt
https://lbs-usp-hls-vod.cmore.se/vod/3369c/hdhuj4pjido(11973281_ISMUSP).ism/hdhuj4pjido(11973281_ISMUSP)-video=5708743.m3u8

I can then download the subtitle

wget "https://lbs-usp-hls-vod.cmore.se/vod/3369c/hdhuj4pjido(11973281_ISMUSP).ism/hdhuj4pjido(11973281_ISMUSP)-textstream_swe=3000.webvtt"

2019-09-06 23:36:09 (3.59 MB/s) - ‘hdhuj4pjido(11973281_ISMUSP)-textstream_swe=3000.webvtt’ saved [101830]
WEBVTT
X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000

00:00:10.920 --> 00:00:12.000
Välkomna till Fort Boyard.
En tigerorm sa till mig en gång:
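
The manual step above (dropping the `-1` segment index from the URL printed by `-S -g`) can be sketched in a few lines. The helper name `unsegmented_subtitle_url` is hypothetical, and the pattern is only inferred from the sample URLs in this thread; it is not a documented URL scheme and does not handle query strings.

```python
import re

# Matches a trailing "-<segment number>.webvtt", e.g. "-1.webvtt".
_SEGMENT_SUFFIX = re.compile(r"-\d+(\.webvtt)$")


def unsegmented_subtitle_url(segment_url: str) -> str:
    """Guess the non-segmented WebVTT URL from a segment URL.

    Returns the input unchanged if it has no segment index.
    """
    return _SEGMENT_SUFFIX.sub(r"\1", segment_url)
```

Because nothing in the manifest advertises the resulting URL, any caller should treat it as a guess and check that the download actually succeeds.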

SYSophie commented 5 years ago

@Sopor well, the URLs to these non-segmented WebVTT files are still not listed in any API or manifest, the way the WebVTT files for the segmented subtitles are, just as I previously stated.

spaam commented 5 years ago

Where did you find that URL? I don't really like to use some magic to find a URL that might stop working.

Sopor commented 5 years ago

That URL has been working at least since the beginning of this year (#1150), so it is not something new. I don't know why it is there or how someone found it, but it works.

Let the segmented code stay and add another switch for segmented subtitles? If the non-segmented subtitles break, we can easily go back to the segmented ones by using that switch.
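One way to picture that suggestion is a preference flag with an automatic fallback. This is only a sketch under assumptions: `choose_subtitle`, `prefer_unsegmented` and the injected `fetch` callable (returning the response text, or `None` on failure) are hypothetical names and do not correspond to any existing svtplay-dl option or function.

```python
import re
from typing import Callable, Optional, Tuple


def choose_subtitle(
    first_segment_url: str,
    fetch: Callable[[str], Optional[str]],
    prefer_unsegmented: bool = True,
) -> Tuple[str, Optional[str]]:
    """Try the guessed non-segmented URL first, else signal a fallback.

    Returns ("unsegmented", body) on success, or ("segmented", None) to
    tell the caller to use the existing segmented download path.
    """
    if prefer_unsegmented:
        guessed = re.sub(r"-\d+(\.webvtt)$", r"\1", first_segment_url)
        if guessed != first_segment_url:
            body = fetch(guessed)
            # Only trust the response if it looks like a WebVTT file.
            if body is not None and body.lstrip("\ufeff").startswith("WEBVTT"):
                return "unsegmented", body
    return "segmented", None
```

With this shape the segmented code path stays untouched, and a broken or missing non-segmented file simply falls through to it.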

SYSophie commented 5 years ago

@spaam it was @dksx who mentioned this first in #1050; it seems to have been a lucky guess that he tried to get the VTT without the segment part. @Sopor brought this up after I mentioned this discovery in some other repositories like Retrospect. They had been struggling to implement the segmented WebVTT, so I gave them this and they were able to implement it.

I want to mention that I'm also concerned about how long this is going to be available, and whether it is in fact available for all videos. However, this gives some flexibility in case the segmented subs are struggling. It also makes fewer connections and hopefully yields a faster download of the subtitle (it still needs to be post-processed, since it carries overlapping cues that need to be merged into one single cue).

But sure, I don't like to rely on some magic solution either. At least consider it as a fallback: if the segmented WebVTT were struggling, then maybe this is something that could work, or perhaps not, for that matter.