yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader
https://discord.gg/H5MNcFW63r
The Unlicense

All downloads from SBS broken #6543

Closed bnw42 closed 1 year ago

bnw42 commented 1 year ago

Region

Australia

Provide a description that is worded well enough to be understood

All attempts to download videos from the SBS Ondemand website are now failing with similar errors. A bug report has been posted to the youtube-dl site and the conclusion there was that SBS has "made updates to their streaming platform video obfuscation."

Downloading with yt-dlp does work using the master manifest obtained using a browser addon video stream detector whilst playing the video in a browser.

Note SBS is geoblocked to Australia.

Provide verbose output that clearly demonstrates the problem

Complete Verbose Output

>yt-dlp -v http://www.sbs.com.au/ondemand/video/2174483011555
[debug] Command-line config: ['-v', 'http://www.sbs.com.au/ondemand/video/2174483011555']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out cp1252 (No VT), error utf-8 (No VT), screen cp1252 (No VT)
[debug] yt-dlp version stable@2023.03.04 [392389b7d] (win_exe)
[debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-7-6.1.7601-SP1 (OpenSSL 1.1.1k  25 Mar 2021)
[debug] exe versions: ffmpeg git-2020-06-10-9dfb19b, ffprobe git-2020-06-10-9dfb19b
[debug] Optional libraries: Cryptodome-3.17, brotli-1.0.9, certifi-2022.12.07, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1786 extractors
WARNING: [ThePlatform] Failed to download m3u8 information: HTTP Error 403: Forbidden
ERROR: [ThePlatform] lhLGIWudF7pj: No video formats found!; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
Traceback (most recent call last):
  File "yt_dlp\YoutubeDL.py", line 1518, in wrapper
  File "yt_dlp\YoutubeDL.py", line 1615, in __extract_info
  File "yt_dlp\YoutubeDL.py", line 1727, in process_ie_result
  File "yt_dlp\YoutubeDL.py", line 1674, in process_ie_result
  File "yt_dlp\YoutubeDL.py", line 2615, in process_video_result
  File "yt_dlp\YoutubeDL.py", line 1046, in raise_no_formats
yt_dlp.utils.ExtractorError: [ThePlatform] lhLGIWudF7pj: No video formats found!; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version us

pukkandan commented 1 year ago

A bug report has been posted to the youtube-dl site

link?

bnw42 commented 1 year ago

https://github.com/ytdl-org/youtube-dl/issues/31841

gamer191 commented 1 year ago

I can reproduce. Am I misinterpreting something, or is it strange that youtubedl_smuggle is in the url it tries to extract?

yt-dlp https://www.sbs.com.au/ondemand/watch/1823195203548 --verbose
[debug] Command-line config: ['https://www.sbs.com.au/ondemand/watch/1823195203548', '--verbose']
[debug] User config "C:\Users\jaybu\AppData\Roaming\yt-dlp\config.txt": ['--ffmpeg-location', 'C:\\Users\\jaybu\\ffmpeg\\bin', '-P', 'C:\\Users\\jaybu\\youtube.dl']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.03.04 [392389b7d] (win_exe)
[debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-10-10.0.22621-SP0 (OpenSSL 1.1.1k  25 Mar 2021)
[debug] exe versions: ffmpeg N-106498-g854615adf2-20220405 (setts), ffprobe N-106498-g854615adf2-20220405, phantomjs 2.1.1
[debug] Optional libraries: Cryptodome-3.17, brotli-1.0.9, certifi-2022.12.07, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1786 extractors
[SBS] Extracting URL: https://www.sbs.com.au/ondemand/watch/1823195203548
[SBS] 1823195203548: Downloading JSON metadata
[ThePlatform] Extracting URL: http://link.theplatform.com/s/Bgtm9B/uOlBz65O39ey?feed=Video%20-%20Single&mbr=true&manifest=m3u&ord=4446663&policy=11929623&dfptag=sz%3D530x298%26iu%3D%2F4117%2Fvideo.entertainment.sbs.com.au%2Fsec30htmlweb%26ciu_szs%26impl%3Ds%26gdfp_req%3D1%26env%3Dvp%26output%3Dxml_vast2%26unviewed_position_start%3D1%26url%3Dwww.sbs.com.au%26description_url%3DSBS%26cust_params%3Dtype%253Dpreroll%26ad_rule%3D0%26cmsid%3D531%26nofb%3D1%26url%3Dhttp%253A%252F%252Fwww.sbs.com.au%252Fondemand%252Fvideo%252Fsingle%252F1823195203548%26description_url%3DSBS%26correlator%3D--ORD--%26vid%3D1823195203548&dfpmidtag=sz%3D530x298%26iu%3D%2F4117%2Fvideo.entertainment.sbs.com.au%2Fsec30midrollhtmlweb%26ciu_szs%26impl%3Ds%26gdfp_req%3D1%26env%3Dvp%26output%3Dxml_vast2%26unviewed_position_start%3D1%26url%3Dwww.sbs.com.au%26description_url%3DSBS%26cust_params%3Dtype%253Dmidroll%26ad_rule%3D0%26cmsid%3D531%26nofb%3D1%26url%3Dhttp%253A%252F%252Fwww.sbs.com.au%252Fondemand%252Fvideo%252Fsingle%252F1823195203548%26description_url%3DSBS%26correlator%3D--ORD--%26vid%3D1823195203548#__youtubedl_smuggle=%7B%22force_smil_url%22%3A+true%7D
[ThePlatform] uOlBz65O39ey: Downloading SMIL data
[ThePlatform] uOlBz65O39ey: Downloading MPD manifest
WARNING: [ThePlatform] Failed to download MPD manifest: HTTP Error 403: Forbidden
[ThePlatform] uOlBz65O39ey: Downloading JSON metadata
ERROR: [ThePlatform] uOlBz65O39ey: No video formats found!; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
Traceback (most recent call last):
  File "yt_dlp\YoutubeDL.py", line 1518, in wrapper
  File "yt_dlp\YoutubeDL.py", line 1615, in __extract_info
  File "yt_dlp\YoutubeDL.py", line 1727, in process_ie_result
  File "yt_dlp\YoutubeDL.py", line 1674, in process_ie_result
  File "yt_dlp\YoutubeDL.py", line 2615, in process_video_result
  File "yt_dlp\YoutubeDL.py", line 1046, in raise_no_formats
yt_dlp.utils.ExtractorError: [ThePlatform] uOlBz65O39ey: No video formats found!; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U

gamer191 commented 1 year ago

Workaround: use the android HLS

diff --git a/yt_dlp/extractor/sbs.py b/yt_dlp/extractor/sbs.py
index 45320339d..eb4b918f5 100644
--- a/yt_dlp/extractor/sbs.py
+++ b/yt_dlp/extractor/sbs.py
@@ -91,12 +91,11 @@ def _real_extract(self, url):
             raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)

         urls = player_params['releaseUrls']
-        theplatform_url = (urls.get('progressive') or urls.get('html')
+        theplatform_url = (urls.get('progressive') or urls.get('htmlandroid')
                            or urls.get('standard') or player_params['relatedItemsURL'])

         return {
             '_type': 'url_transparent',
-            'ie_key': 'ThePlatform',
             'id': video_id,
             'url': smuggle_url(self._proto_relative_url(theplatform_url), {'force_smil_url': True}),
             'is_live': player_params.get('streamType') == 'live',

I don't expect this to be committed, since the Android HLS could be lower quality (I haven't verified this). As such, I didn't put any effort into this code. If my workaround is deemed acceptable, feel free to ping me (on Discord), and I will happily make a proper PR

bnw42 commented 1 year ago

gamer191 - I don't know where "smuggle" came from but "youtubedl" appears to have come from your user configuration file, viz:

[debug] User config "C:\Users\jaybu\AppData\Roaming\yt-dlp\config.txt": ['--ffmpeg-location', 'C:\Users\jaybu\ffmpeg\bin', '-P', 'C:\Users\jaybu\youtube.dl']

I can't check ATM (at work behind a heavy duty corp firewall), but when I played something last night in a browser (after logging in) and then using a video stream detector, I was able to get the master manifest which yt-dlp was able to happily download. From memory it was via akamai. So the issue isn't actually downloading SBS videos, it is in converting the SBS url for the video into the actual URL, hence the comment on the youtube-dl forum, as in the OP.

Whether related or not, since about 10-14 days ago there has been a mismatch between the reported filesize, data & bit rates for SBS videos and what actually downloads. Since SBS has evidently downgraded the quality of their videos quite a bit by reducing the bit/data rates (as clear via watching them in a browser), this doesn't appear to be a yt-dlp bug so I've not posted an issue. I'm just noting it here as evidence of further changes at SBS and something that may or may not be related to the current issue.

dirkf commented 1 year ago

Check the penultimate line of the patch code above to see where the contraband fragment comes from.

This mechanism is used to pass options with a URL from a user-facing extractor (eg SBS) to a video host extractor (eg ThePlatform) through a url or url_transparent extractor result.
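
To make the mechanism concrete, here is a minimal stdlib-only sketch of the idea. It is not yt-dlp's actual code: the function names `smuggle` and `unsmuggle` are illustrative, the media URL in the demo is a shortened, hypothetical stand-in, and the fragment format is inferred from the `#__youtubedl_smuggle=...` suffix in the verbose log above.

```python
import json
from urllib.parse import parse_qs, urlencode

SMUGGLE_KEY = "__youtubedl_smuggle"  # fragment key seen in the verbose log


def smuggle(url, data):
    """Append `data` to `url` as a URL-encoded JSON fragment (illustrative only)."""
    return url + "#" + urlencode({SMUGGLE_KEY: json.dumps(data)})


def unsmuggle(url, default=None):
    """Split a smuggled URL back into (plain_url, data)."""
    if "#" + SMUGGLE_KEY not in url:
        return url, default
    plain, _, fragment = url.rpartition("#")
    return plain, json.loads(parse_qs(fragment)[SMUGGLE_KEY][0])


if __name__ == "__main__":
    # Hypothetical, shortened media URL; the real one carries a long query string.
    smuggled = smuggle(
        "http://link.theplatform.com/s/Bgtm9B/uOlBz65O39ey?mbr=true",
        {"force_smil_url": True},
    )
    print(smuggled)
    print(unsmuggle(smuggled))
```

The user-facing extractor would call something like `smuggle` before returning its result, and the host extractor would call the counterpart first thing to recover both the clean URL and the options.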

dirkf commented 1 year ago

For devs out of region, note that the API used by the extractor isn't geo-blocked, and so the various media links tested in the patch code can easily be inspected.

player_params['relatedItemsURL']

This wouldn't be a useful substitute to judge from the one I checked: just a JSON playlist of similar shows.

gamer191 commented 1 year ago

player_params['relatedItemsURL']

My patch doesn't touch that line, only the line above it. EDIT: it was added in https://github.com/yt-dlp/yt-dlp/commit/3c283a381e4f7a69bf57c3ea85aab3c85ce0e309. By the way, my patch is a workaround. I doubt that using the Android HLS is an acceptable fix. What do you think?

ohnotnow commented 1 year ago

I downloaded an episode of a daily show using the android work-around and compared it to previous 'normal' ones and it's about 1/3rd smaller in filesize. Whether that's due to the android 'version' or the SBS video downgrade noted above I can't say though. I can't say I noticed it being hugely lower quality when I watched it.

dirkf commented 1 year ago

From the yt-dl thread, a new API should be used.

The htmlandroid links include a sz query parameter that implies the size would be ~ 550x300.

my patch doesn't touch that line, only the line above it. EDIT: it was added in 3c283a3

Sure, I guess the suspect alternative has never been reached.

dwids commented 1 year ago

Thanks for the "Workaround: use the android HLS" patch (above). I am relatively new to Python (Windows 10 and Ubuntu 22 via WSL). Is there a guide as to how I can implement this patch? I can use Git on both of my platforms. Thanks

dwids commented 1 year ago

Thanks for the "Workaround: use the android HLS" patch (above). I am relatively new to Python (Windows 10 and Ubuntu 22 via WSL). Is there a guide as to how I can implement this patch? I can use Git on both of my platforms. Thanks

OK, I worked it out. Followed the official steps to regenerate the .exe from the source tree (after changing the sbs.py code). Worked! Thanks for the patch.

bnw42 commented 1 year ago

And for those of us not up to python coding and compiling? :)

ringofyre commented 1 year ago

Confirming the issue -

yt-dlp -v https://www.sbs.com.au/ondemand/watch/2170807363789
[debug] Command-line config: ['-v', 'https://www.sbs.com.au/ondemand/watch/2170807363789']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.03.04 [392389b7d]
[debug] Python 3.9.2 (CPython x86_64 64bit) - Linux-6.2.5-x64v3-xanmod1-x86_64-with-glibc2.31 (OpenSSL 1.1.1n 15 Mar 2022, glibc 2.31)
[debug] exe versions: ffmpeg 4.3.5-0, ffprobe 4.3.5-0, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.9.7, brotli-1.0.9, certifi-2022.09.24, mutagen-1.45.1, pyxattr-0.7.2, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1786 extractors
[SBS] Extracting URL: https://www.sbs.com.au/ondemand/watch/2170807363789
[SBS] 2170807363789: Downloading JSON metadata
[ThePlatform] Extracting URL: http://link.theplatform.com/s/Bgtm9B/gQ6Y3PTQpRro?feed=Video%20-%20Single&mbr=true&manifest=m3u&ord=6908369&policy=11929623&dfptag=sz%3D530x298%26iu%3D%2F4117%2Fvideo.factual.sbs.com.au%2Fsec30htmlweb%26ciu_szs%26impl%3Ds%26gdfp_req%3D1%26env%3Dvp%26output%3Dxml_vast2%26unviewed_position_start%3D1%26url%3Dwww.sbs.com.au%26description_url%3DSBS%26cust_params%3Dtype%253Dpreroll%26ad_rule%3D0%26cmsid%3D531%26nofb%3D1%26url%3Dhttp%253A%252F%252Fwww.sbs.com.au%252Fondemand%252Fvideo%252Fsingle%252F2170807363789%26description_url%3DSBS%26correlator%3D--ORD--%26vid%3D2170807363789&dfpmidtag=sz%3D530x298%26iu%3D%2F4117%2Fvideo.factual.sbs.com.au%2Fsec30midrollhtmlweb%26ciu_szs%26impl%3Ds%26gdfp_req%3D1%26env%3Dvp%26output%3Dxml_vast2%26unviewed_position_start%3D1%26url%3Dwww.sbs.com.au%26description_url%3DSBS%26cust_params%3Dtype%253Dmidroll%26ad_rule%3D0%26cmsid%3D531%26nofb%3D1%26url%3Dhttp%253A%252F%252Fwww.sbs.com.au%252Fondemand%252Fvideo%252Fsingle%252F2170807363789%26description_url%3DSBS%26correlator%3D--ORD--%26vid%3D2170807363789#__youtubedl_smuggle=%7B%22force_smil_url%22%3A+true%7D
[ThePlatform] gQ6Y3PTQpRro: Downloading SMIL data
[ThePlatform] gQ6Y3PTQpRro: Downloading m3u8 information
WARNING: [ThePlatform] Failed to download m3u8 information: HTTP Error 403: Forbidden
[ThePlatform] gQ6Y3PTQpRro: Downloading JSON metadata
ERROR: [ThePlatform] gQ6Y3PTQpRro: No video formats found!; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/yt_dlp/YoutubeDL.py", line 1518, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/yt_dlp/YoutubeDL.py", line 1615, in __extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/lib/python3/dist-packages/yt_dlp/YoutubeDL.py", line 1727, in process_ie_result
    return self.process_ie_result(
  File "/usr/lib/python3/dist-packages/yt_dlp/YoutubeDL.py", line 1674, in process_ie_result
    ie_result = self.process_video_result(ie_result, download=download)
  File "/usr/lib/python3/dist-packages/yt_dlp/YoutubeDL.py", line 2615, in process_video_result
    self.raise_no_formats(info_dict)
  File "/usr/lib/python3/dist-packages/yt_dlp/YoutubeDL.py", line 1046, in raise_no_formats
    raise ExtractorError(msg, video_id=info['id'], ie=info['extractor'],
yt_dlp.utils.ExtractorError: [ThePlatform] gQ6Y3PTQpRro: No video formats found!; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U

gamer191 commented 1 year ago

I'm marking this as patch-available, since Dirkf has PRed https://github.com/ytdl-org/youtube-dl/pull/31880 in youtube-dl. I hope I am using the patch-available label correctly

And for those of us not up to python coding and compiling? :)

Lucky for all of you, me and Dirkf discovered that running yt-dlp "https://www.sbs.com.au/api/v3/video_smil?context=tv&id=VIDEOID" works (replace VIDEOID with the number at the end of the sbs url)

Ok, I worked it out. Followed the official steps to regenerate the .exe from the source tree (after changing the sbs.py code) Worked! Thanks for the patch

By the way, you didn't need to, you can just open command prompt inside the repo and run yt-dlp.cmd URL (the .cmd isn't strictly necessary, I don't think)

bnw42 commented 1 year ago

I tried the tailored url option and it repeatedly failed with several different shows:

yt-dlp -v https://www.sbs.com.au/api/v3/video_smil?context=tv&id=2176228931740
[debug] Command-line config: ['-v', 'https://www.sbs.com.au/api/v3/video_smil?context=tv']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8 (No VT), error utf-8 (No VT), screen utf-8 (No VT)
[debug] yt-dlp version stable@2023.03.04 [392389b7d] (win_exe)
[debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-7-6.1.7601-SP1 (OpenSSL 1.1.1k 25 Mar 2021)
[debug] exe versions: ffmpeg git-2020-06-10-9dfb19b, ffprobe git-2020-06-10-9dfb19b
[debug] Optional libraries: Cryptodome-3.17, brotli-1.0.9, certifi-2022.12.07, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1786 extractors
[generic] Extracting URL: https://www.sbs.com.au/api/v3/video_smil?context=tv
[generic] video_smil?context=tv: Downloading webpage
ERROR: [generic] None: Unable to download webpage: HTTP Error 400: Bad Request (caused by <HTTPError 400: 'Bad Request'>); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
  File "yt_dlp\extractor\common.py", line 694, in extract
  File "yt_dlp\extractor\generic.py", line 2371, in _real_extract
  File "yt_dlp\extractor\common.py", line 839, in _request_webpage
  File "yt_dlp\extractor\common.py", line 821, in _request_webpage
  File "yt_dlp\YoutubeDL.py", line 3742, in urlopen
  File "urllib\request.py", line 531, in open
  File "urllib\request.py", line 640, in http_response
  File "urllib\request.py", line 569, in error
  File "urllib\request.py", line 502, in _call_chain
  File "urllib\request.py", line 649, in http_error_default
urllib.error.HTTPError: HTTP Error 400: Bad Request
'id' is not recognized as an internal or external command, operable program or batch file.

EDIT: Using yt-dlp https://www.sbs.com.au/api/v3/video_smil?id=xxxxxxxx does work. Where xxxxxxxx is the ID.

dirkf commented 1 year ago

That URL actually links to a SMIL manifest playable in many media players. Try putting it in your FF or Chrome-alike URL bar.

gamer191 commented 1 year ago

I tried the tailored url option and it repeatedly failed with several different shows:

Fixed! Sorry about that

pukkandan commented 1 year ago

I'm marking this as patch-available, since Dirkf has PRed ytdl-org/youtube-dl#31880 in youtube-dl. I hope I am using the patch-available label correctly

Yes, and solved-upstream once it's merged signifying I can close the issue during the periodic upstream merge

dwids commented 1 year ago

Thanks for the video_smil?id tip. Some powershell semi-pseudo-code :-) to take an SBS url ($url) and use this tip:

$url -match "/(\d+)$" > $null    # get the ID of the item
$id = $Matches[1]
$newurl = "https://www.sbs.com.au/api/v3/video_smil?id=" + $id   # build new URL
$jdata = yt-dlp  --no-warnings -j  $newurl  | ConvertFrom-Json  #get some info on the item 
# title NOT in that json, so build our own filename and ext:
$newName =  "SBS_" + $id + "_" + $jdata.format_id + "_" + $jdata.resolution  + "." + $jdata.ext  
# and finally ...
yt-dlp --no-warnings --quiet --progress $newurl -o $newName

Quick and dirty, I'm sure it can be cleaned up.
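
For anyone who prefers Python to PowerShell, the ID-to-API-URL step could be sketched as below. This is only an illustration under the assumption that the video ID is the run of digits at the very end of the page URL, as in the examples in this thread; the helper name `smil_url_for` is made up here, and the endpoint is the one discussed above.

```python
import re

SMIL_API = "https://www.sbs.com.au/api/v3/video_smil?id={video_id}"


def smil_url_for(page_url):
    """Return the video_smil API URL for an SBS On Demand page URL."""
    # The ID is assumed to be the trailing digits of the URL path.
    match = re.search(r"/(\d+)/?$", page_url)
    if match is None:
        raise ValueError(f"no numeric video ID at the end of {page_url!r}")
    return SMIL_API.format(video_id=match.group(1))


if __name__ == "__main__":
    print(smil_url_for("https://www.sbs.com.au/ondemand/watch/2170807363789"))
```

The resulting URL can then be passed to yt-dlp as a single argument.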

vidiot720 commented 1 year ago

I tried the tailored url option and it repeatedly failed with several different shows:

I note your fixed comment, but for benefit of other readers not sure what the fix was, it looks like URL must be in quotes, else it's split at the ampersand and you get the Bad Request 'id' is not recognized as an internal or external command, operable program or batch file message.
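
The split can be illustrated with a short Python sketch. It only mimics what the shell does to the unquoted line (it does not invoke cmd.exe), and it shows two ways around the problem when scripting: passing an argument list so no shell is involved, or, on POSIX shells only, letting `shlex.quote` add the quotes. On Windows cmd the URL needs double quotes instead.

```python
import shlex

url = "https://www.sbs.com.au/api/v3/video_smil?context=tv&id=2176228931740"

# Unquoted, the shell treats '&' as a command separator, so the line is run
# as two commands. Splitting at the first '&' reproduces what each one got.
first_command, second_command = f"yt-dlp -v {url}".split("&", 1)

# Passing an argument list (for example to subprocess.run) bypasses the shell
# entirely, so the URL reaches yt-dlp intact.
argv = ["yt-dlp", "-v", url]

# If a POSIX shell command line is really needed, shlex.quote() adds quotes.
posix_command_line = "yt-dlp -v " + shlex.quote(url)

if __name__ == "__main__":
    print(first_command)
    print(second_command)
    print(posix_command_line)
```

The first half matches the truncated URL in the failing log, and the second half is what the shell then tried to run as a separate command.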

bnw42 commented 1 year ago

Vidiot720, just leave out "context=tv&" and it works fine. That is, use: https://www.sbs.com.au/api/v3/video_smil?id=xxxxxxxx where xxxxxxxx is the ID.

I tried it on several shows and it worked fine.

ringofyre commented 1 year ago

Confirming the id swapperoo works - just change the filename or -o

vidiot720 commented 1 year ago

I tried it on several shows and it worked fine.

Thanks. I thought from the earlier discussion on the context=tv parameter on the youtube-dl issue that it helped with login bypass, and leaves out the ads. I assume you're still getting the highest quality and no ads in the resulting file.

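I haven't tested this method directly as I'm running dirkf's patch to sbs.py to test with yt-dlp. It's working OK; there's an open question about metadata handling for episodes, and a call to self._sort_formats(formats) that's deprecated on yt-dlp.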

dirkf commented 1 year ago

an open question about metadata handling for episodes, and

Ask away - oh, I see you did.

a call to self._sort_formats(formats) that's deprecated on yt-dlp.

But necessary in yt-dl ATM. Feel free to delete it for yt-dlp.

vidiot720 commented 1 year ago

Feel free to delete it for yt-dlp.

Thanks, understood; I was really just flagging this for downstream (here).

bnw42 commented 1 year ago

Leaving out context=tv I still get the highest available quality by default and no adverts. There was a discussion elsewhere last year relating to Australia's other non-commercial streaming TV service, which was introducing login requirements. According to the PR material they released, their software would query the viewer's platform to determine which browser and OS was in use. If it could not get an answer (which would be the case for TVs or youtube-dl and its forks) then the requirement for login creds would be bypassed. Since leaving out context=tv works, I assume something similar may be the case here as well. SBS does require you to log in to watch in a browser.

bnw42 commented 1 year ago

Just another SBS quirk - on an SBS forum some have noted that using "https://www.sbs.com.au/api/v3/video_smil?id=xxxxxxxx" the downloaded subtitles are incomplete and/or the timings are noticeably off.

gamer191 commented 1 year ago

I note your fixed comment, but for benefit of other readers not sure what the fix was, it looks like URL must be in quotes, else it's split at the ampersand and you get the Bad Request 'id' is not recognized as an internal or external command, operable program or batch file message.

Yes. URLs containing an ampersand (&) must always be quoted.

just leave out "context=tv&" and it works fine

I'll look into it more later, but I don't think that's necessarily an improvement https://github.com/ytdl-org/youtube-dl/issues/31841#issuecomment-1474717030

Thanks. I thought from the earlier discussion on the context=tv parameter on the youtube-dl issue that it helped with login bypass

It's useful for bypassing login on the api, not on the actual url that the api gives us (which we've found the pattern for now)

and leaves out the ads

That's correct, although I don't know how necessary it is or whether ads are automatically skipped (see answer two above)

Feel free to delete it for yt-dlp.

Correct me if I'm wrong, but I don't think a PR porting a yt-dl commit would be helpful

Since leaving out context=tv works, I assume something similar may be the case here as well.

Interesting, I'll look into this. Thanks for the info!

Just another SBS quirk - on an SBS forum some have noted that using "https://www.sbs.com.au/api/v3/video_smil?id=xxxxxxxx" the downloaded subtitles are incomplete and/or the timings are noticeably off.

Can you please send a link to that forum, if it's public?

bnw42 commented 1 year ago

Here's the forum link. It's public to view, but you need to register to post. It's a forum devoted to downloading from SBS.

https://forums.whirlpool.net.au/thread/3q6p5669?p=-1#bottom

vidiot720 commented 1 year ago

the downloaded subtitles are incomplete and/or the timings are noticeably off.

I wouldn't be surprised, given the SMIL contains the video in segments, with ads in between. I've never tried to get the subtitles; there's usually a message like: WARNING: [SBS] Ignoring subtitle tracks found in the SMIL manifest; and you don't get them either in the downloaded video file, or separately. That's still true with the latest patch in testing.

Although the 'new' API reports higher TBRs than the old, it's still downloading 1.6 Mb/s streams at highest quality for videos posted before the old API stopped working for newer videos. The 1.6 Mb/s was a fixed total bit-rate, whereas the new encodings appear to be VBR averaging from ~960 - ~1290 Kb/s, so perceived quality may be just as good, if adapted to content complexity. EDIT: a fair comparison could be made from looking at Ep 5 of The Walk-in (av. ~920 Kb/s) vs. earlier episodes, or similarly for S 2 Ep 6 of Bloodlands (~1025 Kb/s) vs. earlier eps at 1.6Mb/s as these both straddle the changeover to VBR.
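
As a rough worked example of what those figures mean for file size, the sketch below treats each quoted number as an average total bitrate. The 45-minute runtime is a hypothetical value chosen only for illustration, and real files will differ with container overhead.

```python
def size_mb(bitrate_kbps, minutes):
    """Approximate stream size in megabytes for an average total bitrate."""
    return bitrate_kbps * 1000 * minutes * 60 / 8 / 1_000_000


def reduction(old_kbps, new_kbps):
    """Fractional size reduction when moving from old_kbps to new_kbps."""
    return 1 - new_kbps / old_kbps


if __name__ == "__main__":
    minutes = 45  # hypothetical episode length
    for label, kbps in [("old fixed rate", 1600), ("Bloodlands S2 Ep 6", 1025),
                        ("The Walk-In Ep 5", 920)]:
        print(f"{label}: ~{size_mb(kbps, minutes):.0f} MB")
    print(f"1600 -> 1025 Kb/s: {reduction(1600, 1025):.0%} smaller")
    print(f"1600 -> 920 Kb/s: {reduction(1600, 920):.0%} smaller")
```

On these assumptions the newer encodings would come out somewhere around a third to two fifths smaller than the old 1.6 Mb/s files of the same length.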

vidiot720 commented 1 year ago

Correct me if I'm wrong, but I don't think a PR porting a yt-dl commit would be helpful

I wasn't sure about the process pukkandan referred to as "merging upstream", above. If yt-dlp takes the patch as-is from yt-dl, a further patch is needed to remove the deprecated call to _sort_formats.

dwids commented 1 year ago

FWIW if you run this regex on the links of type "https://www.sbs.com.au/ondemand/tv-series/the-abyss-rise-and-fall-of-the-nazis/season-1/the-abyss-rise-and-fall-of-the-nazis-s1-ep1/2166627395813" ...

^.*?//.*?/.*?/.*?/(.*?)/

...that first Group should be the Title. You can then do a Replace("-", " ") to tidy it up. Still new to Regex but this seems to work for me in PowerShell

$url -match "^.*?//.*?/.*?/.*?/(.*?)/"  > $null   #don't show True or False trick
$title = $Matches[1].Replace("-", " ")
# the abyss rise and fall of the nazis

Can then use this in building filename for use in -o in yt-dlp. Swap the -match regex with...

/.*/(.*?)/\d+$

...and it will give the abyss rise and fall of the nazis s1 ep1
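
A Python take on the same idea, splitting the URL path into segments instead of using a regex, might look like the sketch below. It assumes the six-segment tv-series layout of the example URL above; the helper name `names_from_url` is hypothetical, and other URL shapes (such as /ondemand/watch/ID) carry no title and are rejected.

```python
from urllib.parse import urlsplit


def names_from_url(page_url):
    """Return (series_title, episode_title, video_id) from a tv-series URL."""
    segments = [s for s in urlsplit(page_url).path.split("/") if s]
    # Assumed layout: ondemand / tv-series / <series> / <season> / <episode> / <id>
    if len(segments) != 6 or not segments[-1].isdigit():
        raise ValueError(f"unexpected URL layout: {page_url!r}")
    series, episode, video_id = segments[2], segments[4], segments[5]
    return series.replace("-", " "), episode.replace("-", " "), video_id


if __name__ == "__main__":
    print(names_from_url(
        "https://www.sbs.com.au/ondemand/tv-series/"
        "the-abyss-rise-and-fall-of-the-nazis/season-1/"
        "the-abyss-rise-and-fall-of-the-nazis-s1-ep1/2166627395813"
    ))
```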

dwids commented 1 year ago

Hey just noted why some subtitles may not be being downloaded..or at least the 'manual' way I do it.

I normally find them under the json 'key' .subtitles.en but with The Abyss - Rise and Fall of the Nazis ep 5 that key is .subtitles.eng So looking for .en - which is what it is for e01 thru 04 - gives nothing.
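
One way to cope with the key changing between episodes is to try both codes when reading the JSON that `yt-dlp -j` prints. The sketch below assumes the usual info-dict shape, where `subtitles` maps a language code to a list of track entries; the helper name and the sample data are made up for illustration.

```python
def english_subtitles(info, codes=("en", "eng")):
    """Return the first non-empty subtitle track list filed under an English code."""
    subtitles = info.get("subtitles") or {}
    for code in codes:
        tracks = subtitles.get(code)
        if tracks:
            return tracks
    return []


if __name__ == "__main__":
    # Hypothetical info dicts standing in for parsed `yt-dlp -j` output.
    ep4 = {"subtitles": {"en": [{"ext": "srt", "url": "https://example.invalid/ep4.srt"}]}}
    ep5 = {"subtitles": {"eng": [{"ext": "srt", "url": "https://example.invalid/ep5.srt"}]}}
    print(english_subtitles(ep4))
    print(english_subtitles(ep5))
```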

gamer191 commented 1 year ago

Just another SBS quirk - on an SBS forum some have noted that using "https://www.sbs.com.au/api/v3/video_smil?id=xxxxxxxx" the downloaded subtitles are incomplete and/or the timings are noticeably off.

No one mentioned anything about the timings as far as I can see. As for the subtitle issue, use --sub-format dfxp. I will check now whether that's required for Dirkf's pr, and if yes I will post a message about it.

WARNING: [SBS] Ignoring subtitle tracks found in the SMIL manifest; and you don't get them either in the downloaded video file, or separately. That's still true with the latest patch in testing.

Even in the https://www.sbs.com.au/api/v3/video_smil?context=tv&id=xxxxxxxx workaround?

Although the 'new' API reports higher TBRs than the old, it's still downloading 1.6 Mb/s streams at highest quality for videos posted before the old API stopped working for newer videos. The 1.6 Mb/s was a fixed total bit-rate, whereas the new encodings appear to be VBR averaging from ~960 - ~1290 Kb/s, so perceived quality may be just as good, if adapted to content complexity. EDIT: a fair comparison could be made from looking at Ep 5 of The Walk-in (av. ~920 Kb/s) vs. earlier episodes, or similarly for S 2 Ep 6 of Bloodlands (~1025 Kb/s) vs. earlier eps at 1.6Mb/s as these both straddle the changeover to VBR.

Is the quality lower than on the website?

I wasn't sure on the process pukkandan referred to as "merging upstream", above. If yt-dlp takes the patch as-is from yt-dl, a further patch needed to remove the deprecated call to _sort_formats.

Potentially. My reply wasn't directed at you though

bnw42 commented 1 year ago

Re the timings, Oblong wrote on the SBS forum: "I've noticed another issue: subtitles incomplete for "The Abyss". Ep 8 is ok. Ep 9 subtitles stop at 00:01:14. They are complete when viewing the episode with a browser. Ep 10 also stops short. For Ep 9, video_smil.ENG.srt is downloaded, 1.3 kB. When looking in the file the timings just stop, with no error message..."

dirkf commented 1 year ago

I wouldn't be surprised, given the SMIL contains the video in segments, with ads in between. I've never tried to get the subtitles; there's usually a message like: WARNING: [SBS] Ignoring subtitle tracks found in the SMIL manifest; and you don't get them either in the downloaded video file, or separately. That's still true with the latest patch in testing.

What's happening here is that yt-dl doesn't (yet) have the methods that get subtitles while extracting manifests.

In yt-dlp, try changing this line

-        formats = self._extract_smil_formats(smil_url, video_id, fatal=False) or []
+        formats, subtitles = self._extract_smil_formats_and_subtitles(smil_url, video_id, fatal=False) or ([], {})

and adding 'subtitles': subtitles, to the return value.

vidiot720 commented 1 year ago

In yt-dlp, try changing this line...

Thanks, dirkf; can confirm with these changes and --sub-format "srt" --write-subs added, the subs are downloaded and playback in sync OK for VLC for the whole program; tested with The Walk-In Ep 5; Trying bnw42's example, only receive 1.31 KiB in subs file; suspect this is an issue at SBS's end, rather than an issue with this workaround via SMIL.

vidiot720 commented 1 year ago

Even in the https://www.sbs.com.au/api/v3/video_smil?context=tv&id=xxxxxxxx workaround?

Haven't tried this workaround; having applied the upstream patch successfully, I've been testing that.

Is the quality lower than on the website?

Sorry, can't speak to that; I can't use the website, hence being a yt-dlp user.

Potentially. My reply wasn't directed at you though

Still not clear on what it all meant; not to worry, will leave this question on adapting(?) dirkf's PR to yt-dlp to the experts. Just flagging additional changes for yt-dlp downstream as dirkf advised to:

gamer191 commented 1 year ago

"I've noticed another issue: subtitles incomplete for "The Abyss". Trying bnw42's example, only receive 1.31 KiB in subs file

use --sub-format dfxp

Still not clear on what it all meant

Nothing, I was just telling people not to open a PR porting the extractor from youtube-dl

Just flagging additional changes for yt-dlp downstream as dirkf advised to:

yeah, based on that I guess the merge checklist is:

  1. remove the deprecated _sort_formats call
  2. capture and return subtitles
  3. potentially deprioritise srt subtitle formats, due to https://github.com/yt-dlp/yt-dlp/issues/6543#issuecomment-1475902476
  4. post-merge, I'm probably going to write a PR to make use of the new AUS_TV_PARENTAL_GUIDELINES function in other Aussie website extractors. @dirkf Do you want me to make that PR in the youtube-dl repo, since it's probably pretty trivial to add Python 2 support for? Otherwise I'll just make it in yt-dlp, since that's where I make all my PRs

vidiot720 commented 1 year ago

use --sub-format dfxp

OK, this appears to be an issue at SBS's end, where the presence in a subtitle of ¾ interrupts the SRT generation, but does not affect the dfxp. SBS don't have a great track record handling special characters and mojibake issues are frequent, so it doesn't surprise me.

Sorry, it was not clear earlier that using dfxp was being suggested as the work-around for this issue in particular, say as opposed to a known issue with yt-dl/yt-dlp's handling of sub formats. --convert-subs doesn't seem to be in the readme for either project.

vidiot720 commented 1 year ago

I guess the merge checklist

Not to forget the metadata issues, at least to put back the previous behaviour for setting title.

gamer191 commented 1 year ago

SBS don't have a great track record handling special characters

Which subtitle format is more reliable? I assume dfxp, since it's used by the website (which proxies it through a subtitle conversion service that seems to be self-hosted by sbs)

Not to forget the metadata issues, at least to put back the previous behaviour for setting title.

Sorry, I should have been more clear. My checklist was for porting the pr to yt-dlp. Is the metadata issue going to be fixed in youtube-dl?

vidiot720 commented 1 year ago

Which subtitle format is more reliable? I assume dfxp,

Based on the example so far, unclear. Looking again at the example cited by bnw42, the XML encoding is reported as 'UTF-16', which the SubtitlesConverter errors out on with message: xml.etree.ElementTree.ParseError: encoding specified in XML declaration is incorrect: line 1, column 30. For what it's worth, the downloaded file appears to be UTF-8, but not sure if that's unmolested by the downloader. As downloaded, VLC won't load the dfxp with "Add Subtitle...", whether or not the coding is corrected in the downloaded file, either by changing the encoding report in the file to UTF-8, or converting the file to UTF-16. EDIT: OK, you have to change the file extension for VLC to recognise it to .ttml, as well as doing the encoding correction or conversion. It then kind of works, although the line formatting in VLC is poor.

If yt-dl/dlp can handle the conversion, it might be nice to set dfxp as the default to work-around SBS's poor conversion service, if that is the issue. Not sure if the conversion to SRT can be handled if requested via --sub-format "srt" implicitly (i.e. behave as if --sub-format "dfxp" --convert-subs "srt" was used), particularly if there's an issue with the dfxp anyway, which is maybe what is tripping up SBS's own conversion service.

My checklist was for porting the pr to yt-dlp. Is the metadata issue going to be fixed in youtube-dl?

Sorry, yes. I hope this will be addressed in yt-dl.

vidiot720 commented 1 year ago

there's an issue with the dfxp anyway

OK, verified that --sub-format "dfxp" --convert-subs "srt" will work without error, but only with a kludge introduced into line 4096 of utils.py, dfxp_data = dfxp_data.replace(b'encoding=\'UTF-16\'', b'encoding=\'UTF-8\'')
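
The effect of that kludge can be reproduced outside yt-dlp with a small stdlib sketch. It assumes, as observed above, that the downloaded bytes are really UTF-8 while the XML declaration claims UTF-16; the sample document and the helper name `fix_declared_encoding` are invented for illustration and are not part of yt-dlp.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding attribute of the XML declaration, with either quote style.
_DECL_UTF16 = re.compile(rb"""(<\?xml[^>]*?encoding=)(["'])UTF-16\2""", re.IGNORECASE)


def fix_declared_encoding(dfxp_data):
    """Rewrite a UTF-16 XML declaration to UTF-8 (bytes assumed to be UTF-8)."""
    return _DECL_UTF16.sub(rb"\g<1>\g<2>UTF-8\g<2>", dfxp_data, count=1)


# Made-up miniature document: UTF-8 bytes behind a UTF-16 declaration.
SAMPLE = (b"<?xml version='1.0' encoding='UTF-16'?><tt><p>"
          + "\u00be cup".encode("utf-8") + b"</p></tt>")

if __name__ == "__main__":
    try:
        ET.fromstring(SAMPLE)
    except ET.ParseError as err:
        print("unpatched:", err)
    root = ET.fromstring(fix_declared_encoding(SAMPLE))
    print("patched:", root.find("p").text)
```

Because the pattern is matched against single-byte text, a file that is genuinely UTF-16 would not match and would be left untouched.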

Solving the original encoding problem for downloaded dfxp subs for SBS is probably a topic for another issue, so won't add more about it here.

NicGeoLaw commented 1 year ago

I am encountering the same issues with SBS On Demand today https://www.sbs.com.au/ondemand/watch/1837973059920

bnw42 commented 1 year ago

NicGeoLaw, Please read the above posts. The issue has been identified and a fix will hopefully be appearing in a future update. Until then use the recommended workaround, substituting the video ID into the following (replacing xxxxxxxx). The video ID is the number at the end of the webpage url for the show you want.

yt-dlp https://www.sbs.com.au/api/v3/video_smil?id=xxxxxxxxx

As of Monday evening this was still working.

dwids commented 1 year ago

Just to add to what @bnw42 said above, this may help. I'm not a developer but do dabble in PowerShell etc:

# url = original one, eg  $url = "https://www.sbs.com.au/ondemand/watch/2172633667829"
# grab the ID
$url -match "/(\d+)$" > $null
$id = $Matches[1]
# build new url and use that
$newurl = "https://www.sbs.com.au/api/v3/video_smil?id=" + $id
yt-dlp  $newurl

Works in .ps1 script too

vidiot720 commented 1 year ago

PR #6839 now in review, vastly improved due to @bashonly, particularly for geo bypass handling, and with some extra improvements to metadata handling for episode and is_live, as well as plenty of coding style tweaks. Has been tested here, but not as much as the earlier PR from @dirkf got. Big thanks to the reviewers for putting up with the python and yt-dlp noobity on my end!

dwids commented 1 year ago

bashonly pushed a commit that referenced this issue

This is great, thanks to all involved. I'm not familiar with what happens next; does this go into the next nightly build for us to use, or something else?