ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

aljazeera.com: Unable to download videos not embedded in articles #29517

Open sebix opened 3 years ago

sebix commented 3 years ago

Verbose log

$ youtube-dl -v https://www.aljazeera.com/program/generation-change/2021/7/7/us-police-brutality-and-black-lives-matter
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.aljazeera.com/program/generation-change/2021/7/7/us-police-brutality-and-black-lives-matter']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.8.10 (CPython) - Linux-5.12.13-1-default-x86_64-with-glibc2.2.5
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4
[debug] Proxy map: {}
[AlJazeera] us-police-brutality-and-black-lives-matter: Downloading JSON metadata
ERROR: Unable to download JSON metadata: HTTP Error 400: Bad Request (caused by <HTTPError 400: 'Bad Request'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/bin/youtube-dl/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib64/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib64/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib64/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib64/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

Description

Downloading any video listed at https://www.aljazeera.com/videos/ fails with the error shown above. The Stream Detector Firefox add-on shows several m3u8 playlists, some of which point to further m3u8 playlists, but so far I have only found video streams (ts), and some of those also return errors. I was able to extract all the video streams, but so far without audio.
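One possible reason for ending up with video but no audio is that an HLS master playlist can list audio as a separate rendition (an `#EXT-X-MEDIA:TYPE=AUDIO` entry with its own URI) next to the video variants. Whether Al Jazeera's playlists are laid out this way is not confirmed here. The sketch below only shows how to tell the two kinds of entries apart. The sample playlist is made up for illustration, and `parse_attrs` and `split_master_playlist` are hypothetical helper names, not youtube-dl functions.

```python
import re

# Made-up master playlist for illustration; NOT captured from aljazeera.com.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="en",URI="audio/en.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,AUDIO="audio"
video/360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2400000,RESOLUTION=1280x720,AUDIO="audio"
video/720p.m3u8
"""

# KEY=value or KEY="quoted value" pairs inside an HLS tag.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attrs(attr_text):
    """Turn an HLS attribute list into a dict, dropping surrounding quotes."""
    return {key: value.strip('"') for key, value in ATTR_RE.findall(attr_text)}


def split_master_playlist(text):
    """Return (video_variants, audio_playlists) found in a master playlist.

    video_variants is a list of (resolution, uri) tuples;
    audio_playlists is a list of URIs of separate audio renditions.
    """
    variants, audio = [], []
    pending = None  # attributes of the last #EXT-X-STREAM-INF tag seen
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('#EXT-X-MEDIA:'):
            attrs = parse_attrs(line.split(':', 1)[1])
            if attrs.get('TYPE') == 'AUDIO' and 'URI' in attrs:
                audio.append(attrs['URI'])
        elif line.startswith('#EXT-X-STREAM-INF:'):
            pending = parse_attrs(line.split(':', 1)[1])
        elif line and not line.startswith('#') and pending is not None:
            # The first URI line after a STREAM-INF tag is that variant's playlist.
            variants.append((pending.get('RESOLUTION'), line))
            pending = None
    return variants, audio


if __name__ == '__main__':
    video_variants, audio_playlists = split_master_playlist(SAMPLE_MASTER)
    print('video:', video_variants)
    print('audio:', audio_playlists)
```

If a real master playlist from the site turns out to contain such audio entries, those URIs would be the nested m3u8 files to fetch in addition to the video ones.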

Videos that are embedded in articles do work. For example, youtube-dl downloads the videos from https://www.aljazeera.com/economy/2021/7/10/g20-signs-off-on-landmark-global-tax-reform just fine:

$ youtube-dl https://www.aljazeera.com/economy/2021/7/10/g20-signs-off-on-landmark-global-tax-reform
[generic] g20-signs-off-on-landmark-global-tax-reform: Requesting header
WARNING: Falling back on generic information extractor.
[generic] g20-signs-off-on-landmark-global-tax-reform: Downloading webpage
[generic] g20-signs-off-on-landmark-global-tax-reform: Extracting information
[download] Downloading playlist: G20 backs landmark global tax reform
[generic] playlist G20 backs landmark global tax reform: Collected 3 video ids (downloading 3 of them)
[download] Downloading video 1 of 3
[brightcove:new] 6257606655001: Downloading JSON metadata
[brightcove:new] 6257606655001: Downloading JSON metadata
[brightcove:new] 6257606655001: Downloading m3u8 information
[brightcove:new] 6257606655001: Downloading m3u8 information
[brightcove:new] 6257606655001: Downloading m3u8 information
[brightcove:new] 6257606655001: Downloading m3u8 information
[brightcove:new] 6257606655001: Downloading MPD manifest
[brightcove:new] 6257606655001: Downloading MPD manifest
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 16
[download] Destination: G7 nations reach historic deal to tax multinational corporations-6257606655001.fhls-4521-1.mp4
[download] 100% of 75.97MiB in 02:43
[dashsegments] Total fragments: 27
[download] Destination: G7 nations reach historic deal to tax multinational corporations-6257606655001.fdash-61c93800-abf5-4f58-8a09-4db01cac7056-1.m4a
[download] 100% of 2.41MiB in 01:29
[ffmpeg] Merging formats into "G7 nations reach historic deal to tax multinational corporations-6257606655001.mp4"
Deleting original file G7 nations reach historic deal to tax multinational corporations-6257606655001.fhls-4521-1.mp4 (pass -k to keep)
Deleting original file G7 nations reach historic deal to tax multinational corporations-6257606655001.fdash-61c93800-abf5-4f58-8a09-4db01cac7056-1.m4a (pass -k to keep)
[download] Downloading video 2 of 3
...

I'm sorry that I can't contribute more here. Please let me know if there's anything that I could help with.

8chanAnon commented 3 years ago

The relevant HTML snippet looks like this:

"embedUrl": "https://players.brightcove.net/665003303001/6tKQRAx7lu_default/index.html?videoId=6262384329001"

Al Jazeera pages normally only contain the Brightcove account number and video id, not the full video link.
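As a rough illustration of what that `embedUrl` carries, the sketch below splits it into the Brightcove account number, player id, and video id. The path layout (`/<account>/<player>_<embed>/index.html`) is inferred from the single snippet above and is an assumption. `parse_brightcove_embed` is a hypothetical helper, not a function from youtube-dl or its Brightcove extractor.

```python
import re
from urllib.parse import urlparse, parse_qs

# Path layout assumed from the one embedUrl quoted in this thread:
#   /<account>/<player>_<embed>/index.html
_PATH_RE = re.compile(
    r'^/(?P<account>\d+)/(?P<player>[^/]+)_(?P<embed>[^/_]+)/index\.html$')


def parse_brightcove_embed(embed_url):
    """Hypothetical helper: split a players.brightcove.net embed URL.

    Returns a dict with account_id, player_id, embed and video_id,
    or None if the URL does not match the assumed layout.
    """
    parsed = urlparse(embed_url)
    if parsed.netloc != 'players.brightcove.net':
        return None
    match = _PATH_RE.match(parsed.path)
    video_ids = parse_qs(parsed.query).get('videoId')
    if not match or not video_ids:
        return None
    return {
        'account_id': match.group('account'),
        'player_id': match.group('player'),
        'embed': match.group('embed'),
        'video_id': video_ids[0],
    }


if __name__ == '__main__':
    url = ('https://players.brightcove.net/665003303001/'
           '6tKQRAx7lu_default/index.html?videoId=6262384329001')
    print(parse_brightcove_embed(url))
```

Since the pages normally expose only the account number and video id, an extractor would presumably have to rebuild a player URL like this one from those two values before handing off to the Brightcove extractor.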

sebix commented 3 years ago

https://github.com/yt-dlp/yt-dlp/pull/763 could be a related fix