ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[media.un.org] New site support #30612

Open pascaldaniela opened 2 years ago

pascaldaniela commented 2 years ago

This site uses kaltura, but it is not recognized by youtube-dl. Could it be added? Thank you!

$ youtube-dl -v https://media.un.org/en/asset/k1r/k1r3vy9ikk
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://media.un.org/en/asset/k1r/k1r3vy9ikk']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.8.5 (CPython) - Linux-5.15.0-3-amd64-x86_64-with-glibc2.10
[debug] exe versions: avconv 4.4.1, avprobe 4.4.1, ffmpeg 4.4.1, ffprobe 4.4.1, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[generic] k1r3vy9ikk: Requesting header
WARNING: Falling back on generic information extractor.
[generic] k1r3vy9ikk: Downloading webpage
[generic] k1r3vy9ikk: Extracting information
ERROR: Unsupported URL: https://media.un.org/en/asset/k1r/k1r3vy9ikk
Traceback (most recent call last):
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "./youtube-dl/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "./youtube-dl/youtube_dl/extractor/generic.py", line 3489, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: https://media.un.org/en/asset/k1r/k1r3vy9ikk

dirkf commented 2 years ago

Possibly, especially if you complete the template for a Site Support Request.

You could just paste the completed template in this issue.

pascaldaniela commented 2 years ago

Oh, didn't see this, sorry. Here's the template:

Checklist

Example URLs

https://media.un.org/en/asset/k1r/k1r3vy9ikk
https://media.un.org/en/asset/k1r/k1r5enfiwb

Description

This is the main United Nations media site. It contains video and audio recordings of UN meetings and events. Downloading for personal use does not appear to infringe copyright, as no hi-res material is directly available (https://www.unmultimedia.org/avlibrary/content/licensing/). Their old site used to work with youtube-dl, but the new site uses Kaltura and is not recognized by the software.

Thanks for your help!

dirkf commented 2 years ago

Please feel free to test the extractor from the above link, or suggest some more test URLs.

pascaldaniela commented 2 years ago

Hey Dirk, thanks for your quick solution! Your commit worked like a charm for the supplied URL, but this other one produced an error: https://media.un.org/en/asset/k12/k12gpkg3qx

It's from a CEDAW session earlier today that is playing well from the UN website.

$ youtube-dl -v https://media.un.org/en/asset/k12/k12gpkg3qx
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://media.un.org/en/asset/k12/k12gpkg3qx']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.9.10 (CPython) - Linux-5.15.0-3-amd64-x86_64-with-glibc2.33
[debug] exe versions: avconv 4.4.1, avprobe 4.4.1, ffmpeg 4.4.1, ffprobe 4.4.1, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[UNO] 1_2gpkg3qx: Downloading webpage
[Kaltura] 1_2gpkg3qx: Downloading video info JSON
ERROR: An extractor error has occurred. (caused by KeyError('dataUrl')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3/dist-packages/youtube_dl/extractor/kaltura.py", line 298, in _real_extract
    data_url = info['dataUrl']
KeyError: 'dataUrl'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3/dist-packages/youtube_dl/extractor/kaltura.py", line 298, in _real_extract
    data_url = info['dataUrl']
KeyError: 'dataUrl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python3/dist-packages/youtube_dl/extractor/common.py", line 547, in extract
    raise ExtractorError('An extractor error has occurred.', cause=e)
youtube_dl.utils.ExtractorError: An extractor error has occurred. (caused by KeyError('dataUrl')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

dirkf commented 2 years ago

So, the Kaltura ID isn't always derivable from the ID in the video URL. Instead, let's look at the image URL, like this:

//cfvod.kaltura.com/p/2503451/sp/250345100/thumbnail/entry_id/1_vohfjqkj/width/1200/height/822/type/4

The path component after /entry_id/ is what we want. I've pushed an update that uses this tactic instead.
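As a rough illustration of that tactic (a minimal sketch, not the code from the pushed update), the entry ID can be pulled out of a thumbnail URL with a regular expression. The function name `entry_id_from_thumbnail` and the exact pattern are assumptions made for this example.

```python
import re

# Thumbnail URL quoted in the comment above.
THUMBNAIL_URL = (
    '//cfvod.kaltura.com/p/2503451/sp/250345100/thumbnail'
    '/entry_id/1_vohfjqkj/width/1200/height/822/type/4'
)


def entry_id_from_thumbnail(url):
    """Return the path component after /entry_id/, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    match = re.search(r'/entry_id/([^/?#]+)', url)
    return match.group(1) if match else None


print(entry_id_from_thumbnail(THUMBNAIL_URL))  # 1_vohfjqkj
```

Returning None rather than raising leaves it to the caller to decide how to report a page that has no such thumbnail.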

The Partner ID is the component after /p/ and could be found in the same way, as in the TV2DK extractor, but I've left that extraction using the page's JS, where it's set like this:

var playerConfig = {
  ...
    partnerId: 2503451,
  ...

When these two are known, the existing Kaltura extractor can be invoked as kaltura:2503451:1_vohfjqkj. Then we add in some metadata available in the UN page that isn't found using the Kaltura API.
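The two steps can be sketched as follows, assuming the page source contains a `partnerId: <digits>` entry like the one quoted above. `PAGE_JS`, `partner_id_from_js` and `kaltura_url` are hypothetical names for this example, not part of youtube-dl, and the real extractor may match the page differently.

```python
import re

# Hypothetical page fragment modelled on the playerConfig snippet above.
PAGE_JS = '''
var playerConfig = {
    partnerId: 2503451,
};
'''


def partner_id_from_js(webpage):
    """Return the digits following `partnerId:` in the page source, or None."""
    match = re.search(r'\bpartnerId\s*:\s*(\d+)', webpage)
    return match.group(1) if match else None


def kaltura_url(partner_id, entry_id):
    """Build the kaltura:<partner_id>:<entry_id> form described above."""
    return 'kaltura:%s:%s' % (partner_id, entry_id)


partner_id = partner_id_from_js(PAGE_JS)
print(kaltura_url(partner_id, '1_vohfjqkj'))  # kaltura:2503451:1_vohfjqkj
```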

pascaldaniela commented 2 years ago

I think this did the trick! I tested the new version on various URLs, and they all work fine.

pascaldaniela commented 2 years ago

Am I supposed to close the issue at this point? :)

dirkf commented 2 years ago

Great. GH will close the issue when the linked PR is merged.

pascaldaniela commented 2 years ago

That was really great and super fast, thanks a lot!

N-Tuple commented 2 years ago

Hey Dirk, I stumbled upon this open issue while having the same problem as Pascal. So I made a GH account and read a lot of how-to tidbits... Long story short: I cloned your version, cd'd into the directory, then ran make, sudo make install, and . ~/.bashrc. Then I tried again and got a similar error. I am new at this game, so it will probably be some noob mistake on my end, but I'm clueless at this point. Here is the verbose output with the offending URL:

youtube-dl -v https://media.un.org/en/asset/k1p/k1pvngjn8e
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://media.un.org/en/asset/k1p/k1pvngjn8e']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.16 (CPython) - Linux-4.19.0-20-amd64-x86_64-with-MX-19.4-patito_feo
[debug] exe versions: ffmpeg 4.1.9-0, ffprobe 4.1.9-0, rtmpdump 2.4
[debug] Proxy map: {}
[generic] k1pvngjn8e: Requesting header
WARNING: Falling back on generic information extractor.
[generic] k1pvngjn8e: Downloading webpage
[generic] k1pvngjn8e: Extracting information
[download] Downloading playlist: The situation in Ukraine - UN Security Council Arria-formula meeting organized by the Permanent Mission of the Russian Federation
[generic] playlist The situation in Ukraine - UN Security Council Arria-formula meeting organized by the Permanent Mission of the Russian Federation: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[Kaltura] 1_pvngjn8e: Downloading video info JSON
ERROR: An extractor error has occurred. (caused by KeyError(u'dataUrl',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/kaltura.py", line 298, in _real_extract
    data_url = info['dataUrl']
KeyError: u'dataUrl'
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 547, in extract
    raise ExtractorError('An extractor error has occurred.', cause=e)
ExtractorError: An extractor error has occurred. (caused by KeyError(u'dataUrl',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Any idea where the problem might be? I'm not a coder, but I can manage pretty steep learning curves given some pointers along the way. I hope you can help, or that I can help you get the issue resolved. All the best. Thomas.