ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Can't download video from https://www.chabad.org/library/article_cdo/aid/113425/jewish/What-Is-Kosher.htm #31042

Open lesshaste opened 2 years ago

lesshaste commented 2 years ago


Description

youtube-dl --verbose https://www.chabad.org/library/article_cdo/aid/113425/jewish/What-Is-Kosher.htm
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.chabad.org/library/article_cdo/aid/113425/jewish/What-Is-Kosher.htm']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.8.10 (CPython) - Linux-5.13.0-51-generic-x86_64-with-glibc2.29
[debug] exe versions: ffmpeg 4.2.7, ffprobe 4.2.7
[debug] Proxy map: {}
[generic] What-Is-Kosher: Requesting header
WARNING: Could not send HEAD request to https://www.chabad.org/library/article_cdo/aid/113425/jewish/What-Is-Kosher.htm: HTTP Error 503: Service Temporarily Unavailable
[generic] What-Is-Kosher: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 503: Service Temporarily Unavailable (caused by <HTTPError 503: 'Service Temporarily Unavailable'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/home/user/python/myenv/lib/python3.8/site-packages/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/home/user/python/myenv/lib/python3.8/site-packages/youtube_dl/YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
gamer191 commented 2 years ago

Try https://www.chabad.org/5190041 (please leave this issue open even if it works).

lesshaste commented 2 years ago

That shows much the same problem:

(myenv) user@user-2020:~/Media$ youtube-dl --verbose https://www.chabad.org/5190041
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.chabad.org/5190041']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.8.10 (CPython) - Linux-5.13.0-51-generic-x86_64-with-glibc2.29
[debug] exe versions: ffmpeg 4.2.7, ffprobe 4.2.7
[debug] Proxy map: {}
[generic] 5190041: Requesting header
WARNING: Could not send HEAD request to https://www.chabad.org/5190041: HTTP Error 503: Service Temporarily Unavailable
[generic] 5190041: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 503: Service Temporarily Unavailable (caused by <HTTPError 503: 'Service Temporarily Unavailable'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/home/user/python/myenv/lib/python3.8/site-packages/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/home/user/python/myenv/lib/python3.8/site-packages/youtube_dl/YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
dirkf commented 2 years ago

The single video on the example page, which might appear above the heading "Origin and History of Kosher", appears to be the same one linked under "You may also be interested in...": https://www.chabad.org/multimedia/video_cdo/aid/5190041/jewish/What-Is-Kosher.htm. The page and the linked video page contain this player code:

<script type="text/javascript">
$j(function() {
var player = new Co.MediaPlayer(Co.MediaPlayer.Instances.Length, "");
player.IsLocalEmbed = true;
player.CdnDomain = 'https://w2.chabad.org';
player.Domain = Co.Request.ServerName;
player.ArticleId = '5190041';
player.AvailableMediaTypes = {};
player.Width = Co.BrowserInfo.IsMobileDevice() ? "auto" : 'auto';
player.Height = Co.BrowserInfo.IsMobileDevice() ? "auto" : 'auto';
player.HideBanner = false;
player.AutoStart = false;
player.StartTime = 0 || player.StartTime;
player.AllowFullScreen = true;
player.DisableAutoplayFeature = false;
player.AvailableMediaTypes['html5'] = new Co.MediaPlayer.MediaInfo('html5', '11633639', Co.BrowserInfo.IsMobileDevice() ? "auto" : 'auto', Co.BrowserInfo.IsMobileDevice() ? "auto" : 'auto', [0, 0, 0]);
player.MediaInfo = Co.MediaInfo["item5190041"];
player.Load("PlayerArea-5190041");
});
</script>

This isn't understood by any of yt-dl's extractors as far as I know.
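As a rough illustration of what a dedicated extractor would have to pull out of that player script, here is a minimal Python sketch. It assumes, without confirmation from the site, that player.ArticleId, player.CdnDomain and the second argument of the html5 Co.MediaPlayer.MediaInfo(...) call are the useful identifiers. The page shows 11633639 there while the working HLS URL later in the thread uses 11633637, so how this ID maps to the stream is unconfirmed. parse_player_script is a hypothetical helper, not part of yt-dl.

```python
import re


def parse_player_script(html):
    """Pull identifiers out of the Co.MediaPlayer setup script (hypothetical helper).

    Returns a dict; any value the regexes do not find is None.
    """
    # player.ArticleId = '5190041';
    article = re.search(r"player\.ArticleId\s*=\s*'(\d+)'", html)
    # new Co.MediaPlayer.MediaInfo('html5', '11633639', ...)
    media = re.search(
        r"Co\.MediaPlayer\.MediaInfo\(\s*'html5'\s*,\s*'(\d+)'", html)
    # player.CdnDomain = 'https://w2.chabad.org';
    cdn = re.search(r"player\.CdnDomain\s*=\s*'([^']+)'", html)
    return {
        "article_id": article.group(1) if article else None,
        "media_id": media.group(1) if media else None,
        "cdn_domain": cdn.group(1) if cdn else None,
    }


if __name__ == "__main__":
    sample = """
    player.CdnDomain = 'https://w2.chabad.org';
    player.ArticleId = '5190041';
    player.AvailableMediaTypes['html5'] = new Co.MediaPlayer.MediaInfo('html5', '11633639', 'auto', 'auto', [0, 0, 0]);
    """
    print(parse_player_script(sample))
```

A real extractor would still need to turn the media ID into a stream URL, which is exactly the open question further down the thread.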

The linked video page also has an actual SWF video link that loads a "JewishTV" video player and that might, with the aid of a time machine, have played the same video the JS player above would get.

Additionally, the site has a Cloudflare block that causes the 503 error. Running wget --user-agent='Mozilla/5.0' ... gets past the block, but the equivalent yt-dl option (--user-agent) still fails, with both Python 2.7 and 3.9.
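To check, outside either tool, whether the User-Agent header alone is what gets past the block, a minimal Python 3 standard-library sketch might look like this. It assumes the 503 comes from the block and that nothing else in the request matters; a block may also look at more than the User-Agent, so the result would not be conclusive either way. build_request and probe_status are hypothetical names.

```python
import urllib.error
import urllib.request

# Browser-like User-Agent, matching the wget --user-agent='Mozilla/5.0' test above
BROWSER_UA = "Mozilla/5.0"


def build_request(url, user_agent=BROWSER_UA):
    """Build a GET request whose only custom header is the User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def probe_status(url, user_agent=BROWSER_UA, timeout=20):
    """Return the HTTP status code for url, including error codes such as 503."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is what we want
        return err.code


if __name__ == "__main__":
    page = "https://www.chabad.org/library/article_cdo/aid/113425/jewish/What-Is-Kosher.htm"
    print(probe_status(page))
```

A 200 with BROWSER_UA but a 503 with some other User-Agent string would point to that header as the trigger.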

Dietary superstition fans will have to look elsewhere, though a PR is welcome as usual.

gamer191 commented 2 years ago

FWIW, yt-dlp seems to bypass the 503 error, though it then fails with "Unsupported URL":

yt-dlp https://www.chabad.org/multimedia/video_cdo/aid/5190041/jewish/What-Is-Kosher.htm --verbose
[debug] Command-line config: ['https://www.chabad.org/multimedia/video_cdo/aid/5190041/jewish/What-Is-Kosher.htm', '--verbose']
[debug] User config "C:\Users\jaybu\AppData\Roaming\yt-dlp\config.txt": ['--ffmpeg-location', 'C:\\Users\\jaybu\\ffmpeg\\bin', '-P', 'C:\\Users\\jaybu\\youtube.dl', '--update', '--audio-quality', '0', '--write-subs', '--write-auto-subs', '--embed-subs', '--compat-options', 'no-keep-subs,no-live-chat']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.06.22.1 [a86e01e] (win32_exe)
[debug] Compatibility options: no-live-chat, no-keep-subs
[debug] Python version 3.8.10 (CPython 64bit) - Windows-10-10.0.19043-SP0
[debug] Checking exe version: "C:\Users\jaybu\ffmpeg\bin\ffmpeg" -bsfs
[debug] Checking exe version: "C:\Users\jaybu\ffmpeg\bin\ffprobe" -bsfs
[debug] exe versions: ffmpeg N-106498-g854615adf2-20220405 (setts), ffprobe N-106498-g854615adf2-20220405
[debug] Optional libraries: Cryptodome-3.14.1, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.06.22.1, Current version: 2022.06.22.1
yt-dlp is up to date (2022.06.22.1)
[debug] [generic] Extracting URL: https://www.chabad.org/multimedia/video_cdo/aid/5190041/jewish/What-Is-Kosher.htm
[generic] What-Is-Kosher: Requesting header
WARNING: [generic] Falling back on generic information extractor.
[generic] What-Is-Kosher: Downloading webpage
[generic] What-Is-Kosher: Extracting information
[debug] Looking for video embeds
ERROR: Unsupported URL: https://www.chabad.org/multimedia/video_cdo/aid/5190041/jewish/What-Is-Kosher.htm
Traceback (most recent call last):
  File "yt_dlp\YoutubeDL.py", line 1427, in wrapper
  File "yt_dlp\YoutubeDL.py", line 1497, in __extract_info
  File "yt_dlp\extractor\common.py", line 647, in extract
  File "yt_dlp\extractor\generic.py", line 4136, in _real_extract
yt_dlp.utils.UnsupportedError: Unsupported URL: https://www.chabad.org/multimedia/video_cdo/aid/5190041/jewish/What-Is-Kosher.htm
lesshaste commented 2 years ago

As a (very poor) workaround, this works:

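# Fetch segments 0..38, saving each without the ?v=... query in its file name
for c in $(seq 0 38); do wget -O "media_b600000_$c.ts" "https://hls-vod-cdn.chabad.org/vod/_definst_/smil:smil_cache1/116/11633637.smil/media_b600000_$c.ts?v=21112226"; done
# Write the concat list in numeric segment order, then join without re-encoding
for c in $(seq 0 38); do echo "file 'media_b600000_$c.ts'"; done > mylist.txt
ffmpeg -f concat -safe 0 -i mylist.txt -c copy test.mp4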
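For comparison, the same workaround as a Python 3 sketch. It hard-codes the URL template from the shell loop above and assumes, as that loop does, exactly 39 segments (0 to 38) and that the CDN accepts urllib's default request headers. Naming each local file by its index keeps the concat list in numeric order without sort -V. SEGMENT_URL, segment_filename, concat_list and download_segments are hypothetical names.

```python
import urllib.request

# Segment URL template copied from the shell loop above; {n} is the segment index
SEGMENT_URL = ("https://hls-vod-cdn.chabad.org/vod/_definst_/smil:smil_cache1/116/"
               "11633637.smil/media_b600000_{n}.ts?v=21112226")
SEGMENT_COUNT = 39  # seq 0 38 covers segments 0..38 (assumed, not read from a playlist)


def segment_filename(n):
    """Local name for segment n, dropping the ?v=... query string."""
    return "media_b600000_{}.ts".format(n)


def concat_list(count=SEGMENT_COUNT):
    """Text for ffmpeg's concat demuxer, already in numeric order."""
    return "".join("file '{}'\n".format(segment_filename(n)) for n in range(count))


def download_segments(count=SEGMENT_COUNT):
    """Fetch each segment to its local file name (network access required)."""
    for n in range(count):
        urllib.request.urlretrieve(SEGMENT_URL.format(n=n), segment_filename(n))


if __name__ == "__main__":
    download_segments()
    with open("mylist.txt", "w") as f:
        f.write(concat_list())
    # Then join without re-encoding:
    #   ffmpeg -f concat -safe 0 -i mylist.txt -c copy test.mp4
```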
dirkf commented 2 years ago

In https://hls-vod-cdn.chabad.org/vod/_definst_/smil:smil_cache1/116/11633637.smil/media_b600000_$c.ts?v=21112226, if 11633637 is the second param in new Co.MediaPlayer.MediaInfo(...) and 116 is its first 3 characters, what is 21112226?
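The question above amounts to a URL template. A minimal Python sketch of that hypothesis, assuming the 116 directory is always the first three digits of the media ID and treating v as an opaque value to copy verbatim (both unconfirmed; HLS_TEMPLATE and segment_url are hypothetical names):

```python
# Hypothetical template: the directory after smil_cache1/ is assumed to be the
# first three digits of the media ID; v is copied as-is because its meaning is unknown
HLS_TEMPLATE = ("https://hls-vod-cdn.chabad.org/vod/_definst_/smil:smil_cache1/"
                "{prefix}/{media_id}.smil/media_b{bitrate}_{n}.ts?v={v}")


def segment_url(media_id, n, bitrate=600000, v="21112226"):
    """Build the URL of segment n for media_id under the assumed layout."""
    return HLS_TEMPLATE.format(prefix=media_id[:3], media_id=media_id,
                               bitrate=bitrate, n=n, v=v)


if __name__ == "__main__":
    print(segment_url("11633637", 0))
```

If the hypothesis holds, another video's ID should yield working segment URLs; a 404 would suggest that either the prefix rule or the v value is wrong.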

You might be able to pipe the for loop, using wget -q -O - ..., into ffmpeg -f mpegts -i - ..., and so avoid the intermediate files?
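The piping idea can be sketched in Python 3 with subprocess and urllib: a single ffmpeg process reads one continuous MPEG-TS stream on stdin (the demuxer ffmpeg uses for transport streams is named mpegts), and each segment is copied into it in order, much like cat. It assumes, as the shell workaround does, 39 segments and that the CDN accepts urllib's default headers; neither is verified. ffmpeg_cmd and pipe_segments are hypothetical names.

```python
import shutil
import subprocess
import urllib.request

SEGMENT_URL = ("https://hls-vod-cdn.chabad.org/vod/_definst_/smil:smil_cache1/116/"
               "11633637.smil/media_b600000_{n}.ts?v=21112226")


def ffmpeg_cmd(output):
    """ffmpeg reading MPEG-TS from stdin ("-i -") and copying streams into output."""
    # -y: overwrite output without prompting (stdin is busy carrying the video)
    return ["ffmpeg", "-y", "-f", "mpegts", "-i", "-", "-c", "copy", output]


def pipe_segments(count, output="test.mp4"):
    """Stream segments 0..count-1 into one ffmpeg process, with no temp files."""
    proc = subprocess.Popen(ffmpeg_cmd(output), stdin=subprocess.PIPE)
    try:
        for n in range(count):
            with urllib.request.urlopen(SEGMENT_URL.format(n=n)) as resp:
                # Append this segment's bytes to ffmpeg's input stream
                shutil.copyfileobj(resp, proc.stdin)
    finally:
        proc.stdin.close()  # signals end of input so ffmpeg can finish the file
    return proc.wait()


if __name__ == "__main__":
    pipe_segments(39)
```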

lesshaste commented 2 years ago

At a guess, v=21112226 refers to video 21112226? That is, v means video.

dirkf commented 2 years ago

Possibly, but this value doesn't seem to be set anywhere in the page JS, and 11633637 appears to be the video ID. Of course, there could be different IDs, say one for the content owner and one for the video host.
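To compare candidate IDs while investigating, the pieces of a segment URL can be separated with the standard library. This Python sketch only labels what the thread has identified (the three-digit directory, the 11633637 ID and the unexplained v value); it makes no claim about what v means. describe_segment_url is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def describe_segment_url(url):
    """Split an HLS segment URL into the parts discussed in this thread."""
    parts = urlsplit(url)
    # e.g. ['', 'vod', '_definst_', 'smil:smil_cache1', '116', '11633637.smil', 'media_b600000_0.ts']
    segments = parts.path.split("/")
    smil_name = segments[-2]
    if smil_name.endswith(".smil"):
        smil_name = smil_name[:-len(".smil")]
    return {
        "prefix": segments[-3],
        "media_id": smil_name,
        "segment": segments[-1],
        "v": parse_qs(parts.query).get("v", [None])[0],
    }


if __name__ == "__main__":
    url = ("https://hls-vod-cdn.chabad.org/vod/_definst_/smil:smil_cache1/116/"
           "11633637.smil/media_b600000_0.ts?v=21112226")
    print(describe_segment_url(url))
```

Running it on segment URLs captured from several different videos would show whether v changes per video, which would help settle the question above.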