ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

VrtNU extractor broken #27707

Open covert8 opened 3 years ago

covert8 commented 3 years ago

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-f', 'best', 'https://www.vrt.be/vrtnu/a-z/terzake/', '--verbose', '--print-traffic']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.01.03
[debug] Python version 3.9.1 (CPython) - Linux-5.9.14-arch1-1-x86_64-with-glibc2.32
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1, rtmpdump 2.4
[debug] Proxy map: {}
[VrtNU] terzake: Downloading webpage
send: b'GET /vrtnu/a-z/terzake/ HTTP/1.1\r\nHost: www.vrt.be\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3775.5 Safari/537.36\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: en-us,en;q=0.5\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Content-Type: text/html;charset=utf-8
header: Content-Length: 7721
header: Connection: close
header: Date: Thu, 07 Jan 2021 10:23:07 GMT
header: X-Content-Type-Options: nosniff
header: Expires: Thu, 07 Jan 2021 10:24:01 GMT
header: Cache-Control: max-age=300
header: X-UA-Compatible: IE=edge
header: Content-Encoding: gzip
header: X-Served-By: i-0632883e90d7e8d22
header: Accept-Ranges: bytes
header: Vary: Accept-Encoding
header: X-Cache: Miss from cloudfront
header: Via: 1.1 7d12bef71f48487e9202b581d949876e.cloudfront.net (CloudFront)
header: X-Amz-Cf-Pop: BRU50-C1
header: X-Amz-Cf-Id: TVWg6_jEmI_EI5KtRYQZEIYXxXz5UK52rlU9mc3BoKH_JVJTlOrIAw==
header: Age: 245
[VrtNU] terzake: Downloading JSON metadata
send: b'GET /vrtnu/a-z/terzake.mssecurevideo.json HTTP/1.1\r\nHost: www.vrt.be\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3775.5 Safari/537.36\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: en-us,en;q=0.5\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 410 Gone\r\n'
header: Content-Type: text/html; charset=utf-8
header: Content-Length: 238
header: Connection: close
header: Date: Thu, 07 Jan 2021 10:23:07 GMT
header: Server: Varnish
header: X-Varnish: 179536045
header: X-Robots-Tag: noindex, nofollow
header: Cache-Control: max-age=604800
header: Retry-After: 5
header: X-Cache: Error from cloudfront
header: Via: 1.1 32e3b86ae254a231182567c0124af893.cloudfront.net (CloudFront)
header: X-Amz-Cf-Pop: FRA2-C2
header: X-Amz-Cf-Id: 8fI1TmRF_DZywQ5k0BDRZfiAuK-ioTVYDRlO7EkzFRReDmL_e5A0vA==
ERROR: Unable to download JSON metadata: HTTP Error 410: Gone (caused by <HTTPError 410: 'Gone'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/lib/python3.9/site-packages/youtube_dl/extractor/common.py", line 632, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 2248, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
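For readers following the log: the two requests differ only in their path. The failing metadata request looks like the page URL with its trailing slash dropped and `.mssecurevideo.json` appended. A minimal sketch of that mapping, inferred only from the two URLs in this log and not taken from the extractor source (`metadata_url` is a hypothetical name):

```python
from urllib.parse import urlsplit, urlunsplit


def metadata_url(page_url: str) -> str:
    """Hypothetical helper: derive the metadata URL seen in the log from a page URL."""
    parts = urlsplit(page_url)
    # Drop the trailing slash, then append the suffix observed in the failing request.
    path = parts.path.rstrip("/") + ".mssecurevideo.json"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(metadata_url("https://www.vrt.be/vrtnu/a-z/terzake/"))
```

This is the URL that now answers with HTTP 410 Gone.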

Description

The website vrt.nu has switched its delivery to Akamai, and the relevant IDs seem to be located in a different JSON (e.g. https://remix-cf-vrt.akamaized.net/remix/$ID/remix.ism/.m3u8). The ID is part of the JSON response from https://media-services-public.vrt.be. The request seems to originate from Sentry (https://github.com/getsentry/sentry-javascript). I don't have time to investigate further right now; hopefully I'll look into this myself at a later date. Pointers on how to find the source of the magic JSON would be gladly received.
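As a rough illustration of the URL pattern quoted in the description, the sketch below fills the `$ID` placeholder with a given ID. It assumes the pattern holds exactly as written. `manifest_url` is a hypothetical helper, and how to obtain the ID from the media-services response is not covered because the issue does not describe it.

```python
from string import Template

# URL pattern copied from the description; "$ID" is the placeholder.
MANIFEST_TEMPLATE = Template(
    "https://remix-cf-vrt.akamaized.net/remix/$ID/remix.ism/.m3u8"
)


def manifest_url(video_id: str) -> str:
    """Hypothetical helper: fill the ID into the manifest URL pattern."""
    return MANIFEST_TEMPLATE.substitute(ID=video_id)


# "example-id" is a made-up value, not a real VRT identifier.
print(manifest_url("example-id"))
```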

covert8 commented 3 years ago

11873