ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Unable to authenticate with adobe pass / Comcast_SSO #30970

Status: Open. dki-os opened this issue 2 years ago.

dki-os commented 2 years ago

Checklist

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['http://stream.nbcsports.com/rsn/nbcs-boston?pid=2002885', '-v', '--ap-mso', 'Comcast_SSO', '-n']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.10.4 (CPython) - Linux-5.17.4-arch1-1-x86_64-with-glibc2.35
[debug] exe versions: ffmpeg 5.0, ffprobe 5.0, rtmpdump 2.4
[debug] Proxy map: {}
[NBCSportsStream] 2002885: Downloading JSON metadata
[NBCSportsStream] 2002885: Downloading Provider Redirect Page
[NBCSportsStream] 2002885: Logging in
[NBCSportsStream] 2002885: Retrieving Session
ERROR: Unable to download webpage: HTTP Error 401: Unauthorized (caused by <HTTPError 401: 'Unauthorized'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/lib/python3.10/site-packages/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/lib/python3.10/site-packages/youtube_dl/YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

Description

I recently tried to extract a cable-protected stream, in this case a regional NBC Sports network, though I was able to reproduce the problem on other sites as well. youtube-dl is unable to authorize the session. The login works perfectly from inside my browser, so I decided to poke around there. The browser appears to insert tokens generated by obfuscated JavaScript, and these tokens are missing from what youtube-dl sends. They are the following items on form submission:

X-hzfdeCEGvt-f
X-hzfdeCEGvt-b
X-hzfdeCEGvt-c
X-hzfdeCEGvt-d
X-hzfdeCEGvt-z
X-hzfdeCEGvt-a
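To confirm which of these fields youtube-dl leaves out, you can compare the form body captured in the browser's network inspector with the body youtube-dl posts. The sketch below is a minimal example that assumes both bodies are available as `application/x-www-form-urlencoded` strings. How you capture them is up to you. The `user`/`passwd` names in the demo are made-up placeholders, not the real Xfinity field names.

```python
from urllib.parse import parse_qsl

# Field names reported in this issue; their values come from obfuscated JS.
TOKEN_FIELDS = [
    "X-hzfdeCEGvt-f",
    "X-hzfdeCEGvt-b",
    "X-hzfdeCEGvt-c",
    "X-hzfdeCEGvt-d",
    "X-hzfdeCEGvt-z",
    "X-hzfdeCEGvt-a",
]


def missing_token_fields(browser_body, ytdl_body, fields=TOKEN_FIELDS):
    """Return the token fields present in the browser's form body but absent
    from youtube-dl's body, in the order they appear in ``fields``."""
    browser_keys = {k for k, _ in parse_qsl(browser_body, keep_blank_values=True)}
    ytdl_keys = {k for k, _ in parse_qsl(ytdl_body, keep_blank_values=True)}
    return [f for f in fields if f in browser_keys and f not in ytdl_keys]


if __name__ == "__main__":
    # Placeholder bodies (hypothetical); real captures contain more fields.
    browser = "user=me&passwd=secret&X-hzfdeCEGvt-f=AAA&X-hzfdeCEGvt-a=BBB"
    ytdl = "user=me&passwd=secret"
    print(missing_token_fields(browser, ytdl))  # ['X-hzfdeCEGvt-f', 'X-hzfdeCEGvt-a']
```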
dirkf commented 2 years ago

This problem.

As it's not possible to debug this without the account details, and probably the Comcast connection, it would be helpful if you could operate the controls.

I suppose these headers don't always have the same values or even predictable values? Are they sent on the 'Retrieving Session' step? (What happened to -e and -g...-y?)
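One way to answer the "same values or predictable values" question is to capture the form body from two separate browser logins and compare the token fields. The sketch below assumes the two captures are URL-encoded strings. It reports whether each `X-hzfdeCEGvt-*` value stayed the same or changed, and in what order the fields appear.

```python
from urllib.parse import parse_qsl


def field_order(body, prefix="X-hzfdeCEGvt-"):
    """Names of fields starting with ``prefix``, in the order submitted."""
    return [k for k, _ in parse_qsl(body, keep_blank_values=True) if k.startswith(prefix)]


def field_stability(first_body, second_body, prefix="X-hzfdeCEGvt-"):
    """Classify each ``prefix`` field as 'same', 'changed', or 'missing'
    (present in only one capture) between two captured form bodies."""
    first = dict(parse_qsl(first_body, keep_blank_values=True))
    second = dict(parse_qsl(second_body, keep_blank_values=True))
    names = sorted(n for n in set(first) | set(second) if n.startswith(prefix))
    result = {}
    for name in names:
        if name not in first or name not in second:
            result[name] = "missing"
        elif first[name] == second[name]:
            result[name] = "same"
        else:
            result[name] = "changed"
    return result


if __name__ == "__main__":
    # Hypothetical captures from two logins.
    a = "X-hzfdeCEGvt-f=1&X-hzfdeCEGvt-b=2"
    b = "X-hzfdeCEGvt-f=1&X-hzfdeCEGvt-b=9"
    print(field_stability(a, b))  # {'X-hzfdeCEGvt-b': 'changed', 'X-hzfdeCEGvt-f': 'same'}
    print(field_order(a))         # ['X-hzfdeCEGvt-f', 'X-hzfdeCEGvt-b']
```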

Presumably the 401 is on https://sp.auth.adobe.com/adobe-services/session after the form in the page whose content is in provider_login_page_res (i.e., the page eventually redirected from https://sp.auth.adobe.com/adobe-services/session/authenticate/saml) has been posted.

We may need to analyze the JS.

dki-os commented 2 years ago

> I suppose these headers don't always have the same values or even predictable values? Are they sent on the 'Retrieving Session' step? (What happened to -e and -g...-y?)

The headers always have those names, in that order, so they are predictable in that sense. They only seem to be sent when the form is posted to Xfinity's website.
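This explains why the fields can be missing. If the fields are added by JavaScript at submit time, they are not in the served HTML. A client that builds the POST from that HTML without running the page's scripts never sees them. The sketch below collects a form's action and `<input>` fields the way a non-JS client would, using only the standard library. The page content, `/login`, `reqId`, `user`, and `passwd` are hypothetical placeholders. This is not youtube-dl's own form-handling code.

```python
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Collect the action URL and named <input> values of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        attrs = dict(attrs)
        if tag == "form" and not self._in_form:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a value attribute are submitted as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def static_form_fields(html):
    """Return (action, fields) as seen in the HTML, without executing JS."""
    parser = FormInputCollector()
    parser.feed(html)
    parser.close()
    return parser.action, parser.fields


if __name__ == "__main__":
    page = """
    <form action="/login" method="post">
      <input type="hidden" name="reqId" value="abc">
      <input type="text" name="user">
      <input type="password" name="passwd">
    </form>
    <script>/* obfuscated code would add X-hzfdeCEGvt-* fields on submit */</script>
    """
    action, fields = static_form_fields(page)
    print(action)          # /login
    print(sorted(fields))  # ['passwd', 'reqId', 'user']
```

No `X-hzfdeCEGvt-*` name shows up in the result, because nothing in the static markup defines one.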

> Presumably the 401 is on https://sp.auth.adobe.com/adobe-services/session after the form in the page whose content is in provider_login_page_res (i.e., the page eventually redirected from https://sp.auth.adobe.com/adobe-services/session/authenticate/saml) has been posted.

Yes, it occurs at that step. I added a line to my youtube-dl installation to log the URL where the error occurred, and it reported https://sp.auth.adobe.com/adobe-services/session.
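For anyone who wants to reproduce this kind of logging, the sketch below wraps any `urllib` opener and prints the URL that actually returned the error status. After redirects, this can differ from the URL originally requested. It is a standalone example using only the standard library. It is not the line the reporter added, and it is not youtube-dl's own error handling, which wraps such failures in its own error message.

```python
import io
from urllib.error import HTTPError


def describe_http_error(err):
    """One-line summary naming the URL that returned the error status."""
    return "HTTP %d %s at %s" % (err.code, err.reason, err.geturl())


def open_logging_failures(opener, url_or_request, timeout=20):
    """Open with ``opener`` (e.g. urllib.request.build_opener()); on an HTTP
    error, print which URL failed, then re-raise the original exception."""
    try:
        return opener.open(url_or_request, timeout=timeout)
    except HTTPError as err:
        print(describe_http_error(err))
        raise


if __name__ == "__main__":
    # Construct the error locally instead of contacting the server.
    err = HTTPError("https://sp.auth.adobe.com/adobe-services/session",
                    401, "Unauthorized", None, io.BytesIO(b""))
    print(describe_http_error(err))
    # HTTP 401 Unauthorized at https://sp.auth.adobe.com/adobe-services/session
```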

> We may need to analyze the JS.

This is probably the solution, but even "deobfuscated", the JS is still very convoluted: variables refer to other variables, which refer to yet another variable, which indexes into an array somewhere. I'm not sure how to trace how the file executes in the browser debugger.
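The "variable → variable → array element" indirection can sometimes be flattened statically before stepping through it in a debugger. Below is a toy sketch that assumes the code has already been reduced to simple `var x = y;`, `var x = arr[N];`, and `var arr = ["...", ...];` statements. It follows each alias chain to the string it finally names. The snippet and identifiers are hypothetical. Real obfuscator output may also use accessor functions, array rotation, or string decoding, which this sketch does not handle. It also ignores escaped quotes inside strings.

```python
import re

# Simplified statement shapes (assumption: one declaration per statement).
ARRAY_RE = re.compile(r"var\s+(\w+)\s*=\s*\[([^\]]*)\];")
INDEX_RE = re.compile(r"var\s+(\w+)\s*=\s*(\w+)\[(\d+)\];")
ALIAS_RE = re.compile(r"var\s+(\w+)\s*=\s*([A-Za-z_]\w*);")
STRING_RE = re.compile(r"""(["'])(.*?)\1""")


def parse_snippet(js):
    """Return (arrays, indexes, aliases) extracted from a JS snippet."""
    arrays = {name: [s for _, s in STRING_RE.findall(body)]
              for name, body in ARRAY_RE.findall(js)}
    indexes = {name: (arr, int(i)) for name, arr, i in INDEX_RE.findall(js)}
    aliases = dict(ALIAS_RE.findall(js))
    return arrays, indexes, aliases


def resolve(name, arrays, indexes, aliases):
    """Follow alias links from ``name`` until an array lookup is reached."""
    seen = set()
    while name not in indexes:
        if name in seen or name not in aliases:
            raise KeyError("cannot resolve %r" % name)
        seen.add(name)
        name = aliases[name]
    arr, i = indexes[name]
    return arrays[arr][i]


if __name__ == "__main__":
    js = '''
    var _0xa = ["X-hzfdeCEGvt-", "submit", "value"];
    var _0xb = _0xa[0];
    var _0xc = _0xb;
    var _0xd = _0xc;
    '''
    arrays, indexes, aliases = parse_snippet(js)
    print(resolve("_0xd", arrays, indexes, aliases))  # X-hzfdeCEGvt-
```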