Open gswan opened 1 year ago
yt-dlp understands the ../tv-series/.. URL, but I can't exercise either that or yt-dl from outside Oz: "This content is not available in your location." The only significant code difference is the URL pattern, though, so I expect the same problem. The unsupported URL issue is #30325, fixed with a patch there and in the released yt-dlp.
https://github.com/yt-dlp/yt-dlp/issues/5544 shows 403 with live HLS rather than DASH.
We had to modify access for another service that uses ThePlatform but this doesn't seem to be similar.
OK thanks. I checked the patch, updated the source and rebuilt yt-dl. I think they have only just made updates to their streaming platform video obfuscation as the previous successful episodes no longer download either (as a test). The browser (FF) is able to stream everything OK. Running the same command results in this now:
$ youtube-dl -v https://www.sbs.com.au/ondemand/watch/2175290435999
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://www.sbs.com.au/ondemand/watch/2175290435999']
WARNING: Assuming --restrict-filenames since file system encoding cannot encode all characters. Set the LC_ALL environment variable to fix this.
[debug] Encodings: locale ANSI_X3.4-1968, fs ANSI_X3.4-1968, out ANSI_X3.4-1968, pref ANSI_X3.4-1968
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.17 (CPython) - Linux-5.4.2-x86_64-with-glibc2.2.5
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4
[debug] Proxy map: {}
[SBS] 2175290435999: Downloading JSON metadata
[ThePlatform] 8Eexyds5RzGA: Downloading SMIL data
[ThePlatform] 8Eexyds5RzGA: Downloading MPD manifest
WARNING: Failed to download MPD manifest: HTTP Error 403: Forbidden
ERROR: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 815, in wrapper
return func(self, *args, **kwargs)
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 836, in __extract_info
ie_result = ie.extract(url)
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 534, in extract
ie_result = self._real_extract(url)
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/theplatform.py", line 309, in _real_extract
self._sort_formats(formats)
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 1374, in _sort_formats
raise ExtractorError('No video formats found')
ExtractorError: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Can you play the show in FF with DRM (EME) disabled?
If not, SBS is now encrypting its media and yt-dl is no help with such pages. If DRM affects all site media, the site could be marked as not working; if only some, the extractor should report the problem instead of "No video formats found". Or perhaps there is some new access protocol for the failing MPD manifest that would reveal playable formats.
BTW you can format your logs by putting triple backquotes (```) above and below, and append console to the top triple for extra formatting credit.
Thanks for the suggestion. I switched off DRM in FF (unchecked "Play DRM-controlled content") and it played OK still. The URL in FF appeared as: https://www.sbs.com.au/ondemand/watch/2175290435999 I can use tcpdump to capture the packet interchange if you like, but I'm not sure if that will show anything interesting.
The best thing would be to use the FF devtools, where there should be an option in the network tab to dump the HTTP[s] exchanges to a HAR file. Clear the network history, then navigate to the page until the video starts playing, and create the HAR: (eg) https://documentation.n-able.com/takecontrol/troubleshooting/Content/kb/How-to-save-browser-developer-tools-HAR-files-to-provide-to-support.htm.
Nice. I hadn't used that option before. Where would you want me to send the HAR file? It's quite large (around 6MB) by the time I can pause the capture.
Maybe a gist?
So, it's a new set of APIs.
1. video_id = '2175290435999'
2. details from https://catalogue.pr.sbsod.com/mpx-media/{video_id}, including: series_slug = details['seriesSlug']
3. https://catalogue.pr.sbsod.com/tv-series/{series_slug}
There must be another response with the media link.
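As a rough sketch of the catalogue lookups listed above (not tested against the live service), the two URLs could be built like this. The helper names and the sample slug are made up for illustration; only the base URL, the two paths and the seriesSlug key come from the captured traffic.

```python
CATALOGUE_BASE = 'https://catalogue.pr.sbsod.com/'


def media_url(video_id):
    # Metadata for a single video, keyed by the numeric ID from the watch URL
    return CATALOGUE_BASE + 'mpx-media/' + video_id


def series_url(details):
    # 'details' is the decoded JSON from media_url(); seriesSlug may be absent
    slug = details.get('seriesSlug')
    if not slug:
        return None
    return CATALOGUE_BASE + 'tv-series/' + slug


if __name__ == '__main__':
    print(media_url('2175290435999'))
    # 'example-series' is a placeholder slug, not a value seen in the capture
    print(series_url({'seriesSlug': 'example-series'}))
```

Neither URL yields the media link itself, so this only covers the metadata side.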
Then there appears to be some advert catalogue inserts, followed by the content chunks. I've truncated some of the longer chunks.
This is what we're after:
https://sbs-vod-dai-prod-01.akamaized.net/Content/HLS_AES_TSO/VOD/geo/12986/2483/a3330a06-4a56-45fd-911e-110540bf8c7a/9439a831-9b13-6280-ed4f-cb66148195cb/master.m3u8?hdnea=st=1678866635~exp=1678870235~acl=/Content/HLS_AES_TSO/VOD/geo/12986/2483/a3330a06-4a56-45fd-911e-110540bf8c7a/9439a831-9b13-6280-ed4f-cb66148195cb/*~id=22da613f-1d6d-46b8-be00-d2ff24f6f519~hmac=c34f8467d784258e0d57bd203640e194745b00fd3243d68a99f7a279188b4b2f&originpath=/ondemand/hls/content/2488267/vid/2175290435999B/SIN/streams/5472a13e-a19a-4a35-8225-80bf74da1915/master.m3u8
https://dai.google.com/ondemand/hls/content/2488267/vid/2175290435999B/SIN/streams/5472a13e-a19a-4a35-8225-80bf74da1915/media/281dd385dae3d96cede76042fd7753ab.m3u8?aka_me_session_id=AAAAAAAAAACkzBJkAAAAAC8OK0KkFYBwSQEUOoKJ2UZGSWxpzXeN3XbymWYYmNWf2VR0FMUSw3dAoh2SGw1g6eeGQBsRcIVY&aka_media_format_type=hls
Are either of these found in a previous response?
Here's the content that contains this. I've truncated (as shown by [truncated]) some larger chunks of text data due to size.
> So, it's a new set of APIs.
That's interesting, because my amateur reverse engineering last night led me to a completely different API, although I haven't tested it yet. I'll post details later, because they're on my computer and I'm not home right now.
This set of data was captured on 2023-03-15T19:00:36.675+11:00. The day before this they appeared to move the changes into production. I first noticed sporadic oddness on the 13th.
... I can fake my location to Down Under 😜 , but someone must be willing to share a (temp) SBS account, because:
GET XHR
https://www.sbs.com.au/api/v3/video_stream?context=odwebsite&id=2175290435999 =>
{
"error": "You must be signed in"
}
The URL wouldn't work anyway.
$ youtube-dl -v https://www.sbs.com.au/api/v3/video_stream?context=odwebsite&id=2175290435999
[1] 25257
[gswan@svr7002 sbs]$ [debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://www.sbs.com.au/api/v3/video_stream?context=odwebsite']
WARNING: Assuming --restrict-filenames since file system encoding cannot encode all characters. Set the LC_ALL environment variable to fix this.
[debug] Encodings: locale ANSI_X3.4-1968, fs ANSI_X3.4-1968, out ANSI_X3.4-1968, pref ANSI_X3.4-1968
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.17 (CPython) - Linux-5.4.2-x86_64-with-glibc2.2.5
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4
[debug] Proxy map: {}
[generic] video_stream?context=odwebsite: Requesting header
WARNING: Could not send HEAD request to https://www.sbs.com.au/api/v3/video_stream?context=odwebsite: HTTP Error 404: Not Found
... That URI isn't supposed to be fed directly to youtube-dl; I got it from my browser via URL sniffing; possibly (when being logged-in) it could yield the needed media stream info (aka being the "playlist API") ...
BTW, any URI fed to youtube-dl should be properly quoted 😜 ...
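For anyone wondering what went wrong in the log above: the unquoted & made the shell put the command in the background and drop the id parameter, which is why the Command-line args line shows the URL cut off after context=odwebsite. A small illustration using Python's standard shlex module (POSIX shells only; Windows cmd quoting differs):

```python
import shlex

url = 'https://www.sbs.com.au/api/v3/video_stream?context=odwebsite&id=2175290435999'

# shlex.quote() wraps the URL in single quotes because it contains '?' and '&'
quoted = shlex.quote(url)
command = 'youtube-dl -v ' + quoted
print(command)

# Splitting the quoted command line gives the URL back as one intact argument
args = shlex.split(command)
print(args)
```

Typing the quotes by hand on the command line has the same effect.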
I suspect you need to hijack the session ID as well. Even though I am signed in at the IP address here on this client in another tab, using this URL with a REST API testing system (using GET verb) results in the error you see: "You must be signed in".
https://www.sbs.com.au/api/v3/video_stream?context=odwebsite&id=2175290435999
yeah, that's the same url I got last night, but I also got stuck on the authorisation bit. I have an account, but I don't know much about session ids and headers. I was planning on working on that tonight, but I'm happy to send you my login details instead and you can work on it
PS: the old api didn't require authentication, so it sucks if this one does. This workaround (shameless self-promotion) doesn't require one, so it should probably be added as a permanent fallback
By the way, it took me ages to figure out, but odwebsite means on demand website (the website is called sbs on demand)
For future reference, context=android and context=ios also told me to sign in (everything else I tried told me “context not supported”)
Haha, context=tv lets me completely bypass the login screen :trollface:
EDIT: I'll verify later if it gives me the same result as logging in
- ... (somehow get the media link, not yet shown) ...
Now we know (how easy is this?):
https://www.sbs.com.au/api/v3/video_smil?id={video_id}
I just noticed that at the exact same time 🤣 I'll just quickly check to see if that is the same url that the authenticated context=odwebsite api provides, since I found it on the unauthenticated context=tv api, and I'm guessing you did too (correct me if I'm wrong)
I just omitted all the not obviously required query parameters and found that I got a bigger SMIL manifest than when a context was provided. Let's see if it's a good link...
The data from the /v3/video_stream endpoint looks like well-formed ld+json.
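To keep the two v3 endpoints straight, here is a minimal sketch that only builds the URLs discussed so far. The function names are mine; the paths, the id and context parameters, and the observed behaviour of each context value are taken from the comments above.

```python
from urllib.parse import urlencode

API_BASE = 'https://www.sbs.com.au/api/v3/'


def video_smil_url(video_id):
    # SMIL manifest; answered without sign-in when no context is given
    return API_BASE + 'video_smil?' + urlencode({'id': video_id})


def video_stream_url(video_id, context):
    # JSON stream info; as reported above, odwebsite, android and ios ask
    # for sign-in, while tv did not
    return API_BASE + 'video_stream?' + urlencode(
        {'context': context, 'id': video_id})


if __name__ == '__main__':
    print(video_smil_url('2175290435999'))
    print(video_stream_url('2175290435999', 'tv'))
```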
> I'll just quickly check to see if that is the same url that the authenticated context=odwebsite api provides
Never mind, I can't figure out how to authenticate. Do you want me to share my account details?
EDIT: I figured it out
I tried this stub _real_extract():
# needs: from ..utils import update_url_query
def _call_api(self, video_id, path, query=None, data=None, headers=None):
    # _download_json() requires video_id; query and headers may be None
    return self._download_json(update_url_query(
        'https://catalogue.pr.sbsod.com/' + path, query or {}),
        video_id, data=data, headers=headers or {})

def _real_extract(self, url):
    video_id = self._match_id(url)
    # the SMIL endpoint needs no authentication
    smil_url = update_url_query(
        'https://www.sbs.com.au/api/v3/video_smil', {'id': video_id})
    formats = self._extract_smil_formats(smil_url, video_id)
    self._sort_formats(formats)
    media = self._call_api(video_id, 'mpx-media/' + video_id)
    series_slug = media.get('seriesSlug') or ''
    series = self._call_api(video_id, 'tv-series/' + series_slug)
    # ...
    return {
        'id': video_id,
        'title': media['title'],  # correct key ???
        'formats': formats,
    }
From the UK, it pulls the SMIL manifest, but even with a further hack to set the Referer used for subsidiary M3U8 manifests to the original .../ondemand/watch/... URL I get 403 for those, and so crash or no formats, depending on fatal=False in the _extract_smil_formats() call. Unsurprisingly, the response headers include x-error-reason: geo-blocked.
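A minimal sketch of how the 403 could be told apart from other failures, so that the user sees a geo-restriction message rather than "No video formats found". The function name is hypothetical and the check relies only on the x-error-reason header reported above; in an extractor the result would presumably feed something like raise_geo_restricted().

```python
def geo_block_reason(status, headers):
    """Return the x-error-reason value if a 403 looks geo-blocked, else None."""
    if status != 403:
        return None
    # HTTP header names are case-insensitive, so normalise before the lookup
    lowered = {name.lower(): value for name, value in headers.items()}
    reason = lowered.get('x-error-reason')
    if reason and 'geo' in reason.lower():
        return reason
    return None


if __name__ == '__main__':
    print(geo_block_reason(403, {'X-Error-Reason': 'geo-blocked'}))
    print(geo_block_reason(403, {}))
```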
here's context=android. You'll notice the url is suspiciously similar, if not identical, to the url that yt-dlp (and probably also youtube-dl) currently extract from sbs
And here's context=ios (probably almost identical):
> Now we know (how easy is this?):
> https://www.sbs.com.au/api/v3/video_smil?id={video_id}
> From the UK, it pulls the SMIL manifest, but even with a further hack to set the Referer used for subsidiary M3U8 manifests to the original .../ondemand/watch/... URL I get 403 for those, and so crash or no formats, depending on fatal=False in the _extract_smil_formats() call. Unsurprisingly, the response headers include x-error-reason: geo-blocked.
GET-ing "https://www.sbs.com.au/api/v3/video_smil?id=2175290435999" (no proxy, no authentication required) and inspecting the downloaded SMIL, the URI to the master HLSe manifest is:
https://sbs-vod-prod-01.akamaized.net/Content/HLS_AES_TSO/VOD/geo/12986/2483/a3330a06-4a56-45fd-911e-110540bf8c7a/9439a831-9b13-6280-ed4f-cb66148195cb/master.m3u8?hdnea=st%3D1679098254%7Eexp%3D1679116254%7Eacl%3D%2FContent%2FHLS_AES_TSO%2FVOD%2Fgeo%2F12986%2F2483%2Fa3330a06-4a56-45fd-911e-110540bf8c7a%2F9439a831-9b13-6280-ed4f-cb66148195cb%2F%2A%7Ehmac%3D2a18adac3a1f40ff68d43a582a0805697cd3c5d119a43e294811646e3f519fc6
Then, with a whitelisted AU HTTPS proxy:
yt-dl --proxy "localhost:8080" -vF "https://sbs-vod-prod-01.akamaized.net/Content/HLS_AES_TSO/VOD/geo/12986/2483/a3330a06-4a56-45fd-911e-110540bf8c7a/9439a831-9b13-6280-ed4f-cb66148195cb/master.m3u8?hdnea=st%3D1679098254%7Eexp%3D1679116254%7Eacl%3D%2FContent%2FHLS_AES_TSO%2FVOD%2Fgeo%2F12986%2F2483%2Fa3330a06-4a56-45fd-911e-110540bf8c7a%2F9439a831-9b13-6280-ed4f-cb66148195cb%2F%2A%7Ehmac%3D2a18adac3a1f40ff68d43a582a0805697cd3c5d119a43e294811646e3f519fc6" =>
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--ffmpeg-location', '.\\FFmpeg', '--external-downloader-args', '-v 8 -stats', '--proxy', 'localhost:8080', '-vF', 'https://sbs-vod-prod-01.akamaized.net/Content/HLS_AES_TSO/VOD/geo/12986/2483/a3330a06-4a56-45fd-911e-110540bf8c7a/9439a831-9b13-6280-ed4f-cb66148195cb/master.m3u8?hdnea=st%3D1679098254%7Eexp%3D1679116254%7Eacl%3D%2FContent%2FHLS_AES_TSO%2FVOD%2Fgeo%2F12986%2F2483%2Fa3330a06-4a56-45fd-911e-110540bf8c7a%2F9439a831-9b13-6280-ed4f-cb66148195cb%2F%2A%7Ehmac%3D2a18adac3a1f40ff68d43a582a0805697cd3c5d119a43e294811646e3f519fc6']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2023.03.16.1919
[debug] Python version 3.4.4 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg n5.2-dev-2245-N-109649-gab8cde6, ffprobe n5.2-dev-2245-N-109649-gab8cde6, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {'http': 'localhost:8080', 'https': 'localhost:8080'}
[generic] master: Requesting header
[generic] master: Downloading m3u8 information
[info] Available formats for master:
format code extension resolution note
439 mp4 398x224 439k , avc1.4D401E, 25.0fps, mp4a.40.2
870 mp4 640x360 870k , avc1.4D401E, 25.0fps, mp4a.40.2
1419 mp4 1024x576 1419k , avc1.4D4029, 25.0fps, mp4a.40.2
1981 mp4 1280x720 1981k , avc1.4D4029, 25.0fps, mp4a.40.2 (best)
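As an aside, the hdnea query parameter on the master manifest URL can be unpacked to see how long a link stays usable. This sketch assumes st and exp are Unix start and expiry times, which is the usual meaning in Akamai-style tokens but is not confirmed anywhere in this thread. The path, acl and hmac in the sample URL are shortened placeholders; only the st and exp values are the real ones from the URL above.

```python
from urllib.parse import parse_qs, urlsplit


def parse_hdnea(url):
    # parse_qs() percent-decodes the value, so '%7E' becomes '~' and so on
    token = parse_qs(urlsplit(url).query)['hdnea'][0]
    fields = {}
    for part in token.split('~'):
        key, _, value = part.partition('=')
        fields[key] = value
    return fields


# Shortened placeholder URL; st/exp copied from the manifest URL above
url = ('https://sbs-vod-prod-01.akamaized.net/Content/example/master.m3u8'
       '?hdnea=st%3D1679098254%7Eexp%3D1679116254'
       '%7Eacl%3D%2FContent%2Fexample%2F%2A%7Ehmac%3Dabc123')

fields = parse_hdnea(url)
lifetime = int(fields['exp']) - int(fields['st'])
print(fields['acl'], lifetime)  # lifetime is 18000 seconds, i.e. 5 hours
```

By the same arithmetic, the token in the browser capture earlier in the thread (st=1678866635, exp=1678870235) spans 3600 seconds.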
> and create the HAR: https://brainshark.zendesk.com/hc/en-us/articles/360012432712-Performing-a-Browser-Trace-Firefox.
FWIW, this article requires I "Sign in to Brainshark Support" :wink: ...
> I just omitted all the not obviously required query parameters and found that I got a bigger SMIL manifest than when a context was provided. Let's see if it's a good link...
I just visually compared the manifest files, and the main difference is that if context isn't provided then it contains a bunch of mentions of https://securepubads.g.doubleclick.net
context=odwebsite gives a completely different json (once I've authenticated it)
Whoops, I posted the wrong json:
How interesting:
I don't know how to read a smil file, but what I did notice was it has a bunch of links to https://sbs-vod-prod-01.akamaized.net/Content/HLS_AES_TSO/VOD/geo/12986/2483/a3330a06-4a56-45fd-911e-110540bf8c7a/9439a831-9b13-6280-ed4f-cb66148195cb/master.m3u8, the exact url that context=oddesktop returned as its contenturl. I assume that means that context=tv ultimately links to the same file as context=oddesktop, but without requiring authentication (it does have different parameters though)
https://en.wikipedia.org/wiki/Synchronized_Multimedia_Integration_Language
As shown in my stub extractor, yt-dl knows how to extract the media items from a SMIL manifest. But a lot of media players can play SMIL directly.
The stub extractor should work, as far as it goes, in region.
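For readers who have not met SMIL before, a minimal sketch of pulling the media links out of a manifest with the standard library. The sample document is invented and far smaller than what the SBS endpoint returns, and this is not how yt-dl's own _extract_smil_formats() is implemented; it only shows that the links sit in the src attributes of video elements.

```python
import xml.etree.ElementTree as ET

# Invented sample; real manifests carry more attributes and elements
SAMPLE_SMIL = '''<smil xmlns="http://www.w3.org/2005/SMIL21/Language">
  <body>
    <seq>
      <video src="https://example.invalid/master.m3u8" type="application/x-mpegURL"/>
      <video src="https://example.invalid/other.m3u8" type="application/x-mpegURL"/>
    </seq>
  </body>
</smil>'''


def smil_video_sources(smil_text):
    root = ET.fromstring(smil_text)
    sources = []
    for element in root.iter():
        # Drop any '{namespace}' prefix so this works with or without xmlns
        tag = element.tag.rsplit('}', 1)[-1]
        if tag == 'video' and element.get('src'):
            sources.append(element.get('src'))
    return sources


if __name__ == '__main__':
    print(smil_video_sources(SAMPLE_SMIL))
```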
Comparing the odwebsite and tv JSON blocks, the former identifies as a StreamProvider ld+json object (non-standard, in accordance with the 404 non-standard http://www.sbs.com.au/schemas/) while the latter is a (superset of the standard) VideoObject.
The StreamProvider has the SMIL expanded and a Chromecast block. For our purposes the tv JSON is equivalent and easier to get.
In the context of steps 1-4 above, this adds actual thumbnail URLs which aren't available from the catalogue.pr.sbsod.com endpoints (only IDs that, so far at least, we don't know how to resolve: https://www.sbs.com.au/api/v3/video_image/getimage disappointingly needs an actual URL passed in its query parameters).
With all this, my draft extractor is passing its test, albeit with a different MD5 for the download fragment, but just returning the detected geo-restriction error for the problem video here. PR beckons.
And arrives: #31880. Local testing needed.
Updated here, built and tested with the original subject of this issue. Downloaded successfully. Will try a few more random items as well. Thanks for the update.
Tried a few more programs and all successfully downloaded OK.
PR #31880 has been updated to match yt-dlp/yt-dlp#6839. Any yt-dl hold-outs who would like to test it, pls report in the PR.
When will PR #31880 be available in a windows .exe form? I've been using the yt-dlp_2023.05.11.094900 fix and it has been working fine.
Yep. SBS have made another change. Working one day and then stopped with a 403 error fetching the m3u8 data. Example:
$ youtube-dl --verbose https://www.sbs.com.au/ondemand/watch/2208035907943
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.sbs.com.au/ondemand/watch/2208035907943']
[debug] Encodings: locale utf-8, fs utf-8, out utf-8, pref utf-8
[debug] youtube-dl version 2021.12.17 (single file build)
[debug] Python 3.11.2 (CPython x86_64 64bit) - Linux-6.3.1AXTOS-11-x86_64-with-glibc2.37 - OpenSSL 3.0.8 7 Feb 2023 - glibc 2.37
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4
[debug] Proxy map: {}
[SBS] 2208035907943: Downloading JSON metadata
[ThePlatform] lRZJykj8Jcuc: Downloading SMIL data
[ThePlatform] lRZJykj8Jcuc: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
ERROR: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 825, in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 846, in __extract_info
ie_result = ie.extract(url)
^^^^^^^^^^^^^^^
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 535, in extract
ie_result = self._real_extract(url)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/theplatform.py", line 309, in _real_extract
self._sort_formats(formats)
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 1375, in _sort_formats
raise ExtractorError('No video formats found')
youtube_dl.utils.ExtractorError: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
gswan your trouble appears to be either a transient issue or some problem at your end since it is working fine for me, with both the show you tried to download (https://www.sbs.com.au/ondemand/watch/2208035907943) and also another randomly chosen show.
I would recommend that if you have an issue with SBS first check here: (https://forums.whirlpool.net.au/thread/2699206?p=-1&#bottom) to confirm whether the issue is with yt-dlp or something else (eg your end or the SBS servers) before posting on yt-dlp. Just a suggestion. The discussion on that forum is these days pretty much limited to using yt-dlp (and frontends) since other methods no longer work. Response is often within an hour.
BTW, I use the nightly build of yt-dlp, not youtube-dl, which hasn't worked in ages.
Thanks! I did not realise there was a fork. I've been using youtube-dl for ages. Grabbed yt-dlp nightly and runs fine on my example.
> not youtube-dl, which hasn't worked in ages.
While, sadly, this holds true for the latest binary release (from 2021), as well as when building from current master branch, youtube-dl works if you build it with #31880 (linked above) merged-in (you need this version of sbsIE):
yt-dl -vF "https://www.sbs.com.au/ondemand/watch/2208035907943" =>
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--ffmpeg-location', '.\\FFmpeg', '--external-downloader-args', '-v 8 -stats', '-vF', 'https://www.sbs.com.au/ondemand/watch/2208035907943']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2023.06.22.2214 (single file build)
[debug] Python 3.4.4 (CPython x86 32bit) - Windows-Vista-6.0.6003-SP2 - OpenSSL1.0.2d 9 Jul 2015
[debug] exe versions: ffmpeg n6.1-dev-1252-N-111136-gf66e186, ffprobe n6.1-dev-1252-N-111136-gf66e186, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[debug] Using fake IP 1.158.238.70 (AU) as X-Forwarded-For.
[SBS] 2208035907943: Downloading SMIL file
[SBS] 2208035907943: Downloading m3u8 information
[SBS] 2208035907943: Downloading JSON metadata
[SBS] 2208035907943: Downloading JSON metadata
[info] Available formats for 2208035907943:
format code extension resolution note
hls-439 mp4 398x224 439k , avc1.4D401E, 25.0fps, mp4a.40.2
hls-870 mp4 640x360 870k , avc1.4D401E, 25.0fps, mp4a.40.2
hls-1419 mp4 1024x576 1419k , avc1.4D4029, 25.0fps, mp4a.40.2
hls-1981 mp4 1280x720 1981k , avc1.4D4029, 25.0fps, mp4a.40.2 (best)
The new SBS API can be successfully fooled by the X-F-F trick the sbsIE employs 😉 , and the same is true for the actual stream CDN 👍 ; however, download speeds from Southern Europe are abysmal:
[SBS] 2208035907943: Downloading SMIL file
[SBS] 2208035907943: Downloading m3u8 information
[SBS] 2208035907943: Downloading JSON metadata
[SBS] 2208035907943: Downloading JSON metadata
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 1466
[download] Destination: The Crash S1 Ep1 - 18_35_41-2208035907943.mp4
[download] 10.2% of ~429.98MiB at 34.11KiB/s ETA 03:13:07
aria2c or yt-dlp with the -N 6 flag is recommended if outside Australia...
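The X-F-F trick mentioned above amounts to sending an X-Forwarded-For header with an address that looks Australian. A minimal sketch of the idea follows; the helper name is mine, and the 1.128.0.0/11 block is an assumption on my part, chosen because it contains the fake IP 1.158.238.70 shown in the debug log. It is not a claim about how the extractor picks its address.

```python
import ipaddress
import random

AU_BLOCK = '1.128.0.0/11'  # assumed block; it contains the IP from the log


def random_ip_in_block(cidr, rng=random):
    network = ipaddress.ip_network(cidr)
    # Index into the network to pick one address uniformly at random
    offset = rng.randrange(network.num_addresses)
    return str(network[offset])


headers = {'X-Forwarded-For': random_ip_in_block(AU_BLOCK)}
print(headers)
```

Whether a server honours the header is entirely up to the server, so this can stop working at any time.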
Description
This particular episode (https://www.sbs.com.au/ondemand/watch/2175290435999) fails to download. Other episodes (2175290435997, 2175290435996 etc) download correctly. The episode is viewable in the browser directly (https://www.sbs.com.au/ondemand/tv-series/cobra/season-2/cobra-s2-ep6/2175290435999)
As a side note, attempting this browser viewing URL with youtube-dl results in an 'Unsupported URL' message, so the correct URL to use, which results in other successful downloads, is the URL originally used.