ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Issue when downloading stream from la7.it #31393

Open giusmos opened 1 year ago

giusmos commented 1 year ago

Hello, when I try to download the following documentary from la7.it, I get the error below. However, I am able to download other streams from la7.it. How can I fix this?

youtube-dl --verbose --no-playlist -o "%(title)s.%(ext)s" https://www.la7.it/atlantide/rivedila7/atlantide-hitler-1922-marcia-su-berlino-01-12-2022-462379

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', '--no-playlist', '-o', '%(title)s.%(ext)s', 'https://www.la7.it/atlantide/rivedila7/atlantide-hitler-1922-marcia-su-berlino-01-12-2022-462379']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.10.6 (CPython) - Linux-6.0.6-76060006-generic-x86_64-with-glibc2.35
[debug] exe versions: none
[debug] Proxy map: {}
[la7.it] atlantide-hitler-1922-marcia-su-berlino-01-12-2022-462379: Downloading webpage
ERROR: Unable to extract video_path; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/lib/python3.10/dist-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/lib/python3.10/dist-packages/youtube_dl/extractor/la7.py", line 81, in _real_extract
    video_path = self._search_regex(r'(/content/\S+?).mp4', webpage, 'video_path')
  File "/usr/local/lib/python3.10/dist-packages/youtube_dl/extractor/common.py", line 1012, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
youtube_dl.utils.RegexNotFoundError: Unable to extract video_path; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

giusmos commented 1 year ago

I should add that if I try to download the link below, it works:

https://www.la7.it/dimartedi/rivedila7/dimartedi-30-11-2022-462229

I compared the contents of the "webpage" variable in the la7.py extractor for the link that works and the link that doesn't. In the working page I can find this, which I think contains the data the script is looking for:

src: {"m3u8" : "http://la7-vh.akamaihd.net/i/,/content/entry/data/0/537/0_p881dq0q_0_60gslmxm_1,/content/entry/data/0/537/0_p881dq0q_0_uu0xfins_1,.mp4.csmil/master.m3u8","mp4" : "https://awsvodpkg.iltrovatore.it/content/entry/data/0/537/0_p881dq0q_0_60gslmxm_1.mp4","f4m" : "http://la7-vh.akamaihd.net/z/content/entry/data/0/537/0_p881dq0q_0_uu0xfins_1.mp4/manifest.f4m?hdcore=3.1"},
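For reference, the `src` object above is plain JSON once it is cut out of the surrounding JavaScript. A minimal sketch of pulling the three stream URLs out of it with Python's standard library might look like this; the helper name `parse_src_object` and its regex are hypothetical, written only against the snippet quoted here, and the real page may lay the object out differently.

```python
import json
import re

# The `src:` line exactly as quoted above from the diMartedi page.
SNIPPET = (
    'src: {"m3u8" : "http://la7-vh.akamaihd.net/i/,/content/entry/data/0/537/0_p881dq0q_0_60gslmxm_1,'
    '/content/entry/data/0/537/0_p881dq0q_0_uu0xfins_1,.mp4.csmil/master.m3u8",'
    '"mp4" : "https://awsvodpkg.iltrovatore.it/content/entry/data/0/537/0_p881dq0q_0_60gslmxm_1.mp4",'
    '"f4m" : "http://la7-vh.akamaihd.net/z/content/entry/data/0/537/0_p881dq0q_0_uu0xfins_1.mp4/manifest.f4m?hdcore=3.1"},'
)


def parse_src_object(page):
    """Return the dict behind `src: {...}`, or None if the page has no such object (hypothetical helper)."""
    # Non-greedy match up to the first closing brace; this only works because
    # the object in the quoted snippet is flat (no nested braces).
    m = re.search(r'src\s*:\s*(\{.*?\})', page)
    if m is None:
        return None
    return json.loads(m.group(1))


streams = parse_src_object(SNIPPET)
print(sorted(streams))  # ['f4m', 'm3u8', 'mp4']
print(streams['mp4'])
```

On the Atlantide page, which has no such object, a helper like this would simply return None.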

In the page that doesn't work I cannot find a similar structure; the only line that contains "mp4" is:

window.iosUrl = "https://d6tz2b13nnqzk.cloudfront.net/Atlantide_20221130212400.mp4?Expires=1670025543&Signature=aRAuDxnB....
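This comparison lines up with the traceback: la7.py searches the page for `(/content/\S+?).mp4`, and the Atlantide `iosUrl` contains no `/content/` path at all, so `_search_regex` raises `RegexNotFoundError`. A quick check with Python's `re`, using only the two URLs quoted in this thread (the Atlantide query string is left off rather than reconstructed):

```python
import re

# Pattern copied from the traceback (youtube_dl/extractor/la7.py, _real_extract).
# The "." before mp4 is unescaped, so it matches any character.
VIDEO_PATH_RE = r'(/content/\S+?).mp4'

# "mp4" URL from the diMartedi page's src object.
WORKING = 'https://awsvodpkg.iltrovatore.it/content/entry/data/0/537/0_p881dq0q_0_60gslmxm_1.mp4'
# Atlantide iosUrl, without its signed query string.
FAILING = 'https://d6tz2b13nnqzk.cloudfront.net/Atlantide_20221130212400.mp4'

for label, text in (('diMartedi', WORKING), ('Atlantide', FAILING)):
    m = re.search(VIDEO_PATH_RE, text)
    print(label, m.group(1) if m else None)
# diMartedi /content/entry/data/0/537/0_p881dq0q_0_60gslmxm_1
# Atlantide None
```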

dirkf commented 1 year ago

It looks like the Atlantide show is DRM-protected, while Puntata isn't.

In any case the new yt-dlp LA7 extractor needs to be back-ported.

Vangelis66 commented 1 year ago

It looks like the Atlantide show is DRM-protected

Unfortunately, yes 😞; the CENC-encrypted MPEG-DASH manifest is:

https://d3iki3eydrtvsa.cloudfront.net/Atlantide_20221130212400/DASH/Atlantide_20221130212400.mpd
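A DASH manifest announces encryption through `ContentProtection` elements; `schemeIdUri="urn:mpeg:dash:mp4protection:2011"` with `value="cenc"` is the generic Common Encryption marker. A minimal standard-library sketch of that check follows; the helper name is hypothetical and the sample MPD is a hand-written illustration, not the contents of the CloudFront manifest above.

```python
import xml.etree.ElementTree as ET

# Default XML namespace of MPEG-DASH MPD documents.
MPD_NS = '{urn:mpeg:dash:schema:mpd:2011}'


def protection_schemes(mpd_xml):
    """List (schemeIdUri, value) for every ContentProtection element (hypothetical helper)."""
    root = ET.fromstring(mpd_xml)
    return [(cp.get('schemeIdUri'), cp.get('value'))
            for cp in root.iter(MPD_NS + 'ContentProtection')]


# Illustrative CENC-protected MPD, not the real La7 manifest.
PROTECTED_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"/>
      <Representation id="v1" bandwidth="1000000"/>
    </AdaptationSet>
  </Period>
</MPD>"""

schemes = protection_schemes(PROTECTED_MPD)
print(schemes)  # [('urn:mpeg:dash:mp4protection:2011', 'cenc')]
```

A real protected manifest typically also carries a DRM-system-specific `ContentProtection` entry, for example Widevine's `urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed`; a non-empty result is enough to know the stream is encrypted.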

I tried spoofing a mobile user agent and was served the manifest below:

https://d3iki3eydrtvsa.cloudfront.net/Atlantide_20221130212400/HLS/Atlantide_20221130212400.m3u8

which, promisingly, generates:

youtube-dl -vF "https://d3iki3eydrtvsa.cloudfront.net/Atlantide_20221130212400/HLS/Atlantide_20221130212400.m3u8" => 

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-vF', 'https://d3iki3eydrtvsa.cloudfront.net/Atlantide_20221130212400/HLS/Atlantide_20221130212400.m3u8']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2022.12.10.40298
[debug] Python version 3.4.4 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg 5.0, ffprobe 5.0, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[generic] Atlantide_20221130212400: Requesting header
[generic] Atlantide_20221130212400: Downloading m3u8 information
[info] Available formats for Atlantide_20221130212400:
format code  extension  resolution note
281          mp4        416x234     281k , avc1.4d400d, 25.0fps, mp4a.40.2
490          mp4        480x270     490k , avc1.4d4015, 25.0fps, mp4a.40.2
1144         mp4        768x432    1144k , avc1.77.30, 25.0fps, mp4a.40.2
2585         mp4        1280x720   2585k , avc1.4d401f, 25.0fps, mp4a.40.2 (best)

... however, the variant manifests are ALL protected with Apple FairPlay DRM...

yt-dlp "https://d3iki3eydrtvsa.cloudfront.net/Atlantide_20221130212400/HLS/Atlantide_20221130212400_720.m3u8" => 

[generic] Atlantide_20221130212400_720: Downloading webpage
[generic] Atlantide_20221130212400_720: Downloading m3u8 information
ERROR: [generic] Atlantide_20221130212400_720: This video is DRM protected

(Google's) DRM is quickly spreading everywhere, like aggressive metastatic cancer 😡 ...
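The variant playlists fail because their segments are FairPlay-encrypted. In an HLS playlist this shows up as an `#EXT-X-KEY` (or `#EXT-X-SESSION-KEY`) tag with `KEYFORMAT="com.apple.streamingkeydelivery"`, usually alongside an `skd://` key URI. A rough sketch of such a check; the regex and helper name are our own illustration, not yt-dlp's internal code, and both sample playlists are made up.

```python
import re

# FairPlay keys are signalled by KEYFORMAT="com.apple.streamingkeydelivery"
# and/or an skd:// key URI on an EXT-X-KEY or EXT-X-SESSION-KEY tag.
FAIRPLAY_RE = re.compile(
    r'#EXT-X-(?:SESSION-)?KEY:[^\n]*'
    r'(?:KEYFORMAT="com\.apple\.streamingkeydelivery"|URI="skd://)')


def has_fairplay(playlist_text):
    """Return True if the playlist declares a FairPlay key (hypothetical helper)."""
    return FAIRPLAY_RE.search(playlist_text) is not None


# Hand-written sample playlists, not the real Atlantide variants.
PROTECTED = """#EXTM3U
#EXT-X-KEY:METHOD=SAMPLE-AES,URI="skd://example-key-id",KEYFORMAT="com.apple.streamingkeydelivery",KEYFORMATVERSIONS="1"
#EXTINF:6.0,
segment0.ts
"""
CLEAR = """#EXTM3U
#EXTINF:6.0,
segment0.ts
"""

print(has_fairplay(PROTECTED), has_fairplay(CLEAR))  # True False
```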

while Puntata isn't

😄; puntata = episode; the show's name is "diMartedi" ("on Tuesday"?); this non-DRM programme has its manifests served from Amazon (awsvodpkg.iltrovatore.it) rather than CloudFront:

https://awsvodpkg.iltrovatore.it/local/dash//,/content/entry/data/0/537/0_p881dq0q_0_60gslmxm_1,/content/entry/data/0/537/0_p881dq0q_0_uu0xfins_1,.mp4.urlset/manifest.mpd
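The two `/content/...` paths in that URL are the same ones listed in the page's `src` object, so the DASH manifest URL can apparently be rebuilt from them. A sketch inferred from this single example only; the helper name is hypothetical, and whether other episodes follow the same layout is an assumption.

```python
AWS_HOST = 'https://awsvodpkg.iltrovatore.it'


def dash_urlset_url(paths):
    """Rebuild a DASH urlset manifest URL from rendition paths (hypothetical helper)."""
    # Layout read off the one URL quoted above: "/local/dash/" followed by
    # "/," + comma-joined paths + ",.mp4.urlset/manifest.mpd".
    return '%s/local/dash//,%s,.mp4.urlset/manifest.mpd' % (AWS_HOST, ','.join(paths))


PATHS = [
    '/content/entry/data/0/537/0_p881dq0q_0_60gslmxm_1',
    '/content/entry/data/0/537/0_p881dq0q_0_uu0xfins_1',
]
print(dash_urlset_url(PATHS))
```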

Vangelis66 commented 1 year ago

if I try to download the link below, it works:

https://www.la7.it/dimartedi/rivedila7/dimartedi-30-11-2022-462229

Truth be told, I was a bit perplexed by OP's claim, because when I tried the la7.py found in the master branch, I couldn't even start a download... Then, after more thorough searching, I realised OP must be using the la7.py version from here, further patched according to this...

FWIW, and as pointed out elsewhere, the patches so far can only grab the 360p variant of the diMartedi (non-DRM) episode:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-vF', 'https://www.la7.it/dimartedi/rivedila7/dimartedi-30-11-2022-462229']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2022.12.11.40298
[debug] Python version 3.4.4 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg 5.0, ffprobe 5.0, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[la7.it] dimartedi-30-11-2022-462229: Downloading webpage
[la7.it] dimartedi-30-11-2022-462229: Downloading MPD manifest
[la7.it] dimartedi-30-11-2022-462229: Downloading m3u8 information
[la7.it] /content/entry/data/0/537/0_p881dq0q_0_60gslmxm_1: Check filesize
[info] Available formats for dimartedi-30-11-2022-462229:
format code  extension  resolution note
dash-a1-x3   m4a        audio only DASH audio   63k , m4a_dash container, mp4a.40.2 (44100Hz)
dash-v1-x3   mp4        640x360    DASH video  599k , mp4_dash container, avc1.42c01e, 25fps, video only
hls-663      mp4        640x360     663k , avc1.42c01e, 25.0fps, mp4a.40.2
https-663    mp4        640x360     663k , avc1.42c01e, 25.0fps, mp4a.40.2, ~890.11MiB (best)
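The 360p-only result is consistent with the patched extractor taking a single regex match: it checks only `..._60gslmxm_1`, while the `m3u8` value in the page's `src` object lists two renditions, and the yt-dlp run further down checks both (`..._60gslmxm_1` and `..._uu0xfins_1`) and ends up with 360p and 720p formats. A sketch of collecting every distinct path instead of the first one; the helper name is hypothetical, and which path corresponds to which resolution isn't shown in the thread.

```python
import re

# The "m3u8" value from the diMartedi page's src object.
SRC_M3U8 = ('http://la7-vh.akamaihd.net/i/,/content/entry/data/0/537/0_p881dq0q_0_60gslmxm_1,'
            '/content/entry/data/0/537/0_p881dq0q_0_uu0xfins_1,.mp4.csmil/master.m3u8')


def content_paths(text):
    """Return every distinct /content/... path in order of first appearance (hypothetical helper)."""
    # [\w/]+ stops at the "," separators and at the "." of ".mp4".
    found = re.findall(r'/content/[\w/]+', text)
    return list(dict.fromkeys(found))  # de-duplicate, keep order


print(content_paths(SRC_M3U8))
```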

So yes,

the new yt-dlp LA7 extractor needs to be back-ported.

FYI, that has now been merged into yt-dlp master; the result:

[debug] Command-line config: ['--ffmpeg-location', '..', '--downloader-args', 'ffmpeg:-v 8 -stats', '-vF', 'https://www.la7.it/dimartedi/rivedila7/dimartedi-30-11-2022-462229']
[debug] Encodings: locale cp1253, fs utf-8, pref cp1253, out utf-8 (No VT), error utf-8 (No VT), screen utf-8 (No VT)
[debug] ytdl-patched/yt-dlp version 2022.12.11.810 [ee7750b] (win_x86_exe)
[debug] Python 3.7.9 (CPython x86 32bit) - Windows-Vista-6.0.6003-SP2 (OpenSSL 1.1.1g  21 Apr 2020)
[debug] exe versions: ffmpeg 5.0 (fdk,setts), ffprobe 5.0, phantomjs 2.1.1, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.16.0, brotli-1.0.9, certifi-2022.12.07, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1735 extractors
[la7.it] Extracting URL: https://www.la7.it/dimartedi/rivedila7/dimartedi-30-11-2022-462229
[la7.it] dimartedi-30-11-2022-462229: Downloading webpage
[la7.it] dimartedi-30-11-2022-462229: Downloading MPD manifest
[la7.it] dimartedi-30-11-2022-462229: Downloading m3u8 information
[la7.it] /content/entry/data/0/537/0_p881dq0q_0_60gslmxm_1: Check filesize
[la7.it] /content/entry/data/0/537/0_p881dq0q_0_uu0xfins_1: Check filesize
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7),vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for dimartedi-30-11-2022-462229:
ID            EXT RESOLUTION FPS |   FILESIZE   TBR PROTO | VCODEC        VBR ACODEC      ABR ASR MORE INFO
----------------------------------------------------------------------------------------------------------------------
dash-f1-a1-x3 m4a audio only     |              64k dash  | audio only        mp4a.40.2   64k 44k DASH audio, m4a_dash
dash-f2-a1-x3 m4a audio only     |             128k dash  | audio only        mp4a.40.2  128k 48k DASH audio, m4a_dash
dash-f1-v1-x3 mp4 640x360     25 |             599k dash  | avc1.42c01e  599k video only          DASH video, mp4_dash
hls-663       mp4 640x360     25 |             663k m3u8  | avc1.42c01e  663k mp4a.40.2    0k
https-663     mp4 640x360     25 | ~890.11MiB  663k https | avc1.42c01e  663k mp4a.40.2    0k
dash-f2-v1-x3 mp4 1280x720    25 |            1209k dash  | avc1.64001f 1209k video only          DASH video, mp4_dash
hls-1337      mp4 1280x720    25 |            1337k m3u8  | avc1.64001f 1337k mp4a.40.2    0k
https-1337    mp4 1280x720    25 | ~  1.75GiB 1337k https | avc1.64001f 1337k mp4a.40.2    0k

... and:

yt-dlp -F "https://www.la7.it/atlantide/rivedila7/atlantide-hitler-1922-marcia-su-berlino-01-12-2022-462379" => 

[la7.it] atlantide-hitler-1922-marcia-su-berlino-01-12-2022-462379: Downloading webpage
ERROR: [la7.it] atlantide-hitler-1922-marcia-su-berlino-01-12-2022-462379: This video is DRM protected

giusmos commented 1 year ago

Yes, you are correct: I modified the la7.py extractor as dirkf suggested. It's a pity that it's not possible to download Atlantide streams, as they make really nice documentaries. Since I have quite a limited data plan on my phone, it would be very convenient to download them at home and watch them while commuting.

Just a small correction: you wrote "inMartedi", but the correct name is actually "DiMartedi" :)

Vangelis66 commented 1 year ago

It's a pity that it's not possible to download streams of Atlantide as they make really nice documentaries. Having a quite limited data plan in my phone, it would be very convenient to be able to download them at home and watch them while commuting.

It's all because of d*mn DRM, which is spreading like a plague 😡; it's used mostly to satisfy the studios' demands, as a piracy deterrent, but along the way it impedes legitimate use cases like yours 😉 ... DRM isn't a "deal breaker" for the "real" pirates, so the channels' reason for implementing it is moot... If you, as a la7 viewer, have any leverage whatsoever with them, be vocal about it and let them know how dissatisfied you are with their use of DRM...

you mentioned "inMartedi" but actually the correct name is "DiMartedi"

Thanks, of course it is; it's right there in the programme's URL... I've corrected the one instance where I typed it wrong 😄 ...