ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

download videos from https://ici.radio-canada.ca #31338

Open pete1212 opened 2 years ago

pete1212 commented 2 years ago

Example URLs

https://ici.radio-canada.ca/info/videos/1-8211318/youth-special-29-septembre-1962

Description

Hi, I've been trying to download videos from https://ici.radio-canada.ca and have been searching for a solution.

The video I want to download is here: https://ici.radio-canada.ca/info/videos/1-8211318/youth-special-29-septembre-1962

cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
pete@pdr:~$ uname -r
5.4.0-131-generic
pete@pdr:~$ youtube-dl -U
youtube-dl is up-to-date (2021.12.17)

Here is the output from youtube-dl when I try:-

pete@pdr:~$ youtube-dl -v https://ici.radio-canada.ca/info/videos/1-8211318/youth-special-29-septembre-1962
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://ici.radio-canada.ca/info/videos/1-8211318/youth-special-29-septembre-1962']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.8.10 (CPython) - Linux-5.4.0-131-generic-x86_64-with-glibc2.29
[debug] exe versions: ffmpeg 4.3-2, ffprobe 4.3-2
[debug] Proxy map: {}
[generic] youth-special-29-septembre-1962: Requesting header
WARNING: Falling back on generic information extractor.
[generic] youth-special-29-septembre-1962: Downloading webpage
[generic] youth-special-29-septembre-1962: Extracting information
ERROR: Unsupported URL: https://ici.radio-canada.ca/info/videos/1-8211318/youth-special-29-septembre-1962
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 3489, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: https://ici.radio-canada.ca/info/videos/1-8211318/youth-special-29-septembre-1962

I have found a workaround and wonder if there is any way for youtube-dl to extract this information:

In Firefox (106.0.2), in the tab with the video (https://ici.radio-canada.ca/info/videos/1-8211318/youth-special-29-septembre-1962), I right-clicked and selected Inspect (Q), which opens the developer tools at the bottom of the page. From there I selected the Network tab.

When I start the video I can see the traffic as the video is downloaded.

If I right-click a line in the output and choose Copy Value->Copy URL, I get the address of one segment of the video. Here are 3 such URLs; the sequence is easy to see (note the difference in segment number). Having got that, I just used wget to fetch the segments.

https://rcavmedias.akamaized.net/f48e7190-8919-4dfa-9d06-f9cc534a9fc0/2020-01-15_15_07_02_archivesweb_0001/index_3_av/segment4_3_av.ts
https://rcavmedias.akamaized.net/f48e7190-8919-4dfa-9d06-f9cc534a9fc0/2020-01-15_15_07_02_archivesweb_0001/index_3_av/segment5_3_av.ts
https://rcavmedias.akamaized.net/f48e7190-8919-4dfa-9d06-f9cc534a9fc0/2020-01-15_15_07_02_archivesweb_0001/index_3_av/segment6_3_av.ts
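
For anyone wanting to script the manual wget step, here is a minimal Python sketch of the same idea. It only reproduces the URL pattern visible in the three segment URLs above. The starting segment number, the absence of gaps in the numbering, and the assumption that the segments are unencrypted MPEG-TS that can simply be appended to one file are all guesses, not something confirmed in this thread. The names `segment_url` and `download_segments` are made up for illustration.

```python
import urllib.error
import urllib.request

# Base URL copied from the segment addresses seen in the Network tab.
BASE = ("https://rcavmedias.akamaized.net/f48e7190-8919-4dfa-9d06-f9cc534a9fc0/"
        "2020-01-15_15_07_02_archivesweb_0001/index_3_av")


def segment_url(number, base=BASE):
    """Build the URL of one segment, following the observed naming pattern."""
    return "{0}/segment{1}_3_av.ts".format(base, number)


def download_segments(out_path, first=1, base=BASE):
    """Append consecutive segments to out_path until the server returns an HTTP error.

    Assumption: numbering starts at `first` and has no gaps.
    Returns the number of segments written.
    """
    count = 0
    number = first
    with open(out_path, "wb") as out:
        while True:
            try:
                with urllib.request.urlopen(segment_url(number, base)) as resp:
                    out.write(resp.read())
            except urllib.error.HTTPError:
                break  # treated as "no more segments"
            count += 1
            number += 1
    return count
```

Calling `download_segments("video.ts")` would then try segment 1, 2, 3 and so on. Stopping at the first HTTP error is a crude end-of-stream test; reading the actual playlist, if there is one, would be more reliable.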

dirkf commented 2 years ago

There is an extractor for CBC, but (a) it needs to be brought up to date from the yt-dlp version and (b) it doesn't handle this page type.

The media details aren't directly available within the non-JS page that yt-dl sees. So we need to trace through to see where the fragment URLs like those you list come from.

If you look at the network tab after reloading the page you should see data of type XHR being fetched by the browser. In the Response tab that appears when you select such a line you may find interesting JSON. If that includes media links, the Request headers and any parameters will also be interesting.
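
As a rough aid for that search, a saved XHR response can be scanned for anything that looks like a media link. This is only a sketch: the real JSON layout for this site is not known here, the `sample` body is invented, and the list of file extensions in `MEDIA_HINTS` is a guess at what a media URL might end with.

```python
import json

# Assumed extensions that suggest a playlist or media file.
MEDIA_HINTS = (".m3u8", ".mpd", ".mp4", ".ts")


def find_media_urls(node):
    """Recursively collect string values that look like media URLs."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(find_media_urls(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_media_urls(item))
    elif isinstance(node, str):
        path = node.split("?", 1)[0]  # ignore any query string
        if node.startswith("http") and path.endswith(MEDIA_HINTS):
            found.append(node)
    return found


# Invented response body, only to show the shape of the call.
sample = json.loads(
    '{"title": "x",'
    ' "streams": [{"url": "https://example.com/a/master.m3u8?token=abc"}],'
    ' "poster": "https://example.com/p.jpg"}'
)
print(find_media_urls(sample))
```

To use it on real data, paste the contents of the Response tab into a file and pass the result of `json.load` to `find_media_urls`.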