ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

video on abc.net.au fail #30887

Open whatusernameisavailablethen opened 2 years ago

whatusernameisavailablethen commented 2 years ago

Verbose log

C:\Users\test node\Downloads>youtube-dl --verbose https://www.abc.net.au/news/2022-04-22/1976-ecologist-warns-population
-unsustainable/13823162
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.abc.net.au/news/2022-04-22/1976-ecologist-warns-population-unsusta
inable/13823162']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.4.4 (CPython) - Windows-8.1-6.3.9600
[debug] exe versions: none
[debug] Proxy map: {}
[abc.net.au] 13823162: Downloading webpage
ERROR: Unable to extract video urls; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpkqxnwl31\build\youtube_dl\YoutubeDL.py", line 815, in wrapper
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpkqxnwl31\build\youtube_dl\YoutubeDL.py", line 836, in __extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpkqxnwl31\build\youtube_dl\extractor\common.py", line 534, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpkqxnwl31\build\youtube_dl\extractor\abc.py", line 74, in _real_extract
youtube_dl.utils.ExtractorError: Unable to extract video urls; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Description

A news video on abc.net.au could not be downloaded.

dirkf commented 2 years ago

Your link says Sorry, Page Not Found.

This link https://www.abc.net.au/news/2022-04-22/1976-ecologist-warns-population-unsustainable/13823162 has a video, but the ABC extractor can't find it. However, you can download it directly from the browser, or use --force-generic-extractor:


$ youtube-dl -F --verbose --force-generic-extractor 'https://www.abc.net.au/news/2022-04-22/1976-ecologist-warns-population-unsustainable/13823162'
[debug] System config: ['--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-F', '--verbose', 'https://www.abc.net.au/news/2022-04-22/1976-ecologist-warns-population-unsustainable/13823162', '--force-generic-extractor']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.5.2 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[generic] 13823162: Requesting header
WARNING: Forcing on generic information extractor.
[generic] 13823162: Downloading webpage
[generic] 13823162: Extracting information
[download] Downloading playlist: 1976 ecologist warns population unsustainable
[generic] playlist 1976 ecologist warns population unsustainable: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[info] Available formats for 13823162:
format code  extension  resolution note
0            mp4        unknown    
[download] Finished downloading playlist: 1976 ecologist warns population unsustainable
$

dirkf commented 2 years ago

The targets of the existing ABC extractor aren't in the page.

The extractor should extract the hydration JSON assigned to window.__API__. Then the video details are found in the .document.loaders.media.featureMediaPrepared.heroContent.props.document member, and the formats are at .media.video.renditions.files in that object.
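A minimal sketch of that approach, not the extractor's actual code. It assumes the page assigns a plain JSON object to window.__API__ inside a script element, and it runs against a small hand-made sample rather than a real page. The `title`, `url` and `bitrate` fields in the sample are assumptions for illustration; only the two member paths come from the description above.

```python
import json
import re

# Member paths as described in the comment above.
VIDEO_DOC_PATH = ('document', 'loaders', 'media', 'featureMediaPrepared',
                  'heroContent', 'props', 'document')
FILES_PATH = ('media', 'video', 'renditions', 'files')

# Hypothetical, heavily trimmed stand-in for the hydration data of a real page.
_sample_api = {'document': {'loaders': {'media': {'featureMediaPrepared': {
    'heroContent': {'props': {'document': {
        'title': 'Sample title',
        'media': {'video': {'renditions': {'files': [
            {'url': 'https://example.invalid/video_1000k.mp4', 'bitrate': 1000},
        ]}}},
    }}},
}}}}}
SAMPLE_PAGE = ('<html><body><script>window.__API__ = %s;</script></body></html>'
               % json.dumps(_sample_api))


def extract_hydration_json(webpage):
    """Return the object assigned to window.__API__, or None if absent."""
    match = re.search(r'window\.__API__\s*=\s*', webpage)
    if not match:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (here the trailing ";" and markup).
        data, _ = json.JSONDecoder().raw_decode(webpage, match.end())
    except ValueError:
        return None
    return data


def dig(obj, path):
    """Follow a sequence of dict keys, returning None if any step is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


api = extract_hydration_json(SAMPLE_PAGE)
video_doc = dig(api, VIDEO_DOC_PATH)
files = dig(video_doc, FILES_PATH) or []
```

Parsing with `raw_decode` avoids having to match the closing brace of the object with a regular expression, which is fragile when the JSON contains nested braces or braces inside strings.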

Someone could survey the video pages at abc.net.au/news to see whether other page structures are in use.

whatusernameisavailablethen commented 2 years ago

Thank you for the swift response. The addresses are the same to my eye. I couldn't find a file put out by the downloading. It looks like it downloaded the playlist only.

dirkf commented 2 years ago

--force-generic-extractor -g gives https://abcmedia.akamaized.net/news/video/202204/RFa_ConfrontingFuture_0104_1000k.mp4.

whatusernameisavailablethen commented 2 years ago

C:\Users\test node\Downloads>youtube-dl -F --verbose --force-generic-extractor -g https://www.abc.net.au/news/2022-04-22/1976-ecologist-warns-population-unsustainable/13823162
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-F', '--verbose', '--force-generic-extractor', '-g', 'https://www.abc.net.au/news/2022-04-22/1976-ecologist-warns-population-unsustainable/13823162']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.4.4 (CPython) - Windows-8.1-6.3.9600
[debug] exe versions: none
[debug] Proxy map: {}
WARNING: Forcing on generic information extractor.

dirkf commented 2 years ago

By saying -F you're asking for the available media formats to be listed (eg to identify the format that you might want if yt-dl's default selection is not it), and not downloaded.
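For scripted use, resolving the media URL without downloading can be sketched with youtube-dl's Python embedding interface. This is a sketch under assumptions, not a tested recipe: as far as I understand it, `extract_info()` takes `download=False` and a `force_generic_extractor` argument, and the result for this page is a one-item playlist as in the log above. `first_media_url` and `probe` are hypothetical helper names.

```python
def first_media_url(info):
    """Return a direct media URL from an extract_info()-style result, if any."""
    if not info:
        return None
    # The generic extractor reported a one-item playlist for this page,
    # so look inside "entries" when the result is a playlist.
    if info.get('_type') == 'playlist':
        for entry in info.get('entries') or []:
            url = first_media_url(entry)
            if url:
                return url
        return None
    if info.get('url'):
        return info['url']
    formats = info.get('formats') or []
    return formats[-1].get('url') if formats else None


def probe(page_url):
    """Resolve page_url without downloading; needs youtube-dl and network access."""
    import youtube_dl  # imported lazily so first_media_url works without it

    # Assumption: the option is set in both places because the command line
    # passes it through the params and the call argument selects the extractor.
    opts = {'force_generic_extractor': True, 'quiet': True}
    with youtube_dl.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(
            page_url, download=False, force_generic_extractor=True)
    return first_media_url(info)
```

Calling `probe()` with the page URL from this issue would be the rough equivalent of `--force-generic-extractor -g`; dropping `download=False` would download instead of just reporting.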

gamer191 commented 2 years ago

Someone could survey the video pages at abc.net.au/news to see whether other page structures are in use.

I use that website a lot. There are 6 types of page layouts I know of that have videos in them:
type 1: video pages, such as https://www.abc.net.au/news/2022-04-22/1976-ecologist-warns-population-unsustainable/13823162
type 2: articles with videos at the top, such as https://www.abc.net.au/news/2022-01-13/what-is-best-face-mask-for-covid-19/100751758. These videos are also available as type 1 pages, which can be found by googling the name, such as https://www.abc.net.au/news/2022-01-12/best-mask-for-omicron-variant/13703184
type 3: regular articles, such as https://www.abc.net.au/news/2022-04-23/wa-premiers-child-released-from-hospital-covid-19/101011024. These sometimes contain embedded content from other websites, such as Twitter. These almost always contain a video at the bottom, but we can't add tests for those videos, as they change all the time, suggesting that abc has specific videos for specific keywords (such as Covid, vaccine, West Australia etc) and the article is set up to simply show the latest video for a certain keyword
type 4: live blogs, such as https://www.abc.net.au/news/2022-04-23/ukraine-russia-war-live-putin/101010708. These could potentially have videos or embeds hidden under the "show more posts" button, at the bottom of the screen. These also often have the "type 5" livestream embedded at the top.
type 5: https://www.abc.net.au/news/newschannel/. This one page is just a livestream of the abc news channel. It has buttons underneath the livestream for other videos, however they won't go to other pages, so if someone clicked one of those videos and then copied the link, they would copy https://www.abc.net.au/news/newschannel/. The workaround if you want to download one of those videos is to click the "copy link" button under "share"
type 6: old articles, such as the youtube-dl test videos. These videos have often, potentially always, expired. One of the Youtube-dl test videos gives a "video unavailable" error, which can be fixed by opening inspect element and taking "&list=" out of the embedded link. These articles use a different layout, however there are probably at least 5-10 different layouts that they could potentially be using. In my opinion it's not worth it to add support for them, unless someone requests it, but at the same time reworking youtube-dl's abc.net.au extractor without taking these videos into account would arguably be a regression.

You'll notice that all these articles (except "type 5") have dates in them. That is the case most (but not all) the time. Potentially that could make it easier to avoid breaking type 6 videos, as youtube-dl could have a regex that looks for the date
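A rough sketch of such a date-aware regex. The URL shape (/news/YYYY-MM-DD/slug/numeric id) is inferred only from the example links in this thread, and `parse_dated_news_url` is a hypothetical helper, not part of youtube-dl. How the date would then be used to choose between old and new layouts is left open, since no cutoff date is known.

```python
import datetime
import re

# Assumed URL shape, based only on the examples quoted in this thread.
DATED_NEWS_URL = re.compile(
    r'https?://(?:www\.)?abc\.net\.au/news/'
    r'(?P<date>\d{4}-\d{2}-\d{2})/(?P<slug>[^/?#]+)/(?P<id>\d+)')


def parse_dated_news_url(url):
    """Return date, slug and id for a dated news URL, or None if it has no date."""
    m = DATED_NEWS_URL.match(url)
    if not m:
        return None
    try:
        published = datetime.datetime.strptime(
            m.group('date'), '%Y-%m-%d').date()
    except ValueError:
        # Looks like a date but is not a valid calendar day.
        return None
    return {'date': published, 'slug': m.group('slug'), 'id': m.group('id')}
```

The livestream page (type 5) has no date in its URL, so it falls through to `None` and could be handled separately.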

PS: this issue is a duplicate of https://github.com/ytdl-org/youtube-dl/issues/28751, but in my opinion that one should be closed and marked as a duplicate, whilst this should stay open

dirkf commented 2 years ago

Type 2: featured video at the .document.loaders.articledetail.featureMediaPrepared.heroContent.descriptor.props.document member, formats again at .media.video.renditions.files in that object. Other video objects with the same format path are nested at various levels under .document.loaders.articledetail.text.children.
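Since the video objects sit at varying depths, one way to collect them is a recursive walk that yields every object carrying the .media.video.renditions.files path. This is only a sketch: the sample structure below is hand-made, and its `id`, `type` and `url` fields are assumptions for illustration.

```python
FILES_PATH = ('media', 'video', 'renditions', 'files')


def files_of(node):
    """Return node.media.video.renditions.files if that path exists, else None."""
    for key in FILES_PATH:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, list) else None


def iter_video_documents(node):
    """Depth-first walk yielding every dict that carries the formats path."""
    if isinstance(node, dict):
        if files_of(node) is not None:
            yield node
            return  # do not descend into an object that was already reported
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return
    for child in children:
        for found in iter_video_documents(child):
            yield found


# Hypothetical stand-in for .document.loaders.articledetail on a Type 2 page.
sample_articledetail = {
    'featureMediaPrepared': {'heroContent': {'descriptor': {'props': {
        'document': {
            'id': 'featured',
            'media': {'video': {'renditions': {'files': [
                {'url': 'https://example.invalid/a.mp4'}]}}},
        },
    }}}},
    'text': {'children': [
        {'type': 'paragraph', 'children': [{'type': 'text', 'content': 'hi'}]},
        {'type': 'embed', 'props': {'document': {
            'id': 'inline',
            'media': {'video': {'renditions': {'files': [
                {'url': 'https://example.invalid/b.mp4'}]}}},
        }}},
    ]},
}

found_ids = sorted(doc['id'] for doc in iter_video_documents(sample_articledetail))
```

The same walk would cover Type 3 pages, where the featured video is simply missing and only the nested objects are found.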

Type 3: looks like Type 2 without the featured video.

gamer191 commented 2 years ago

UPDATE: I missed two of the test videos, whoops. The first 2 test videos are type 6, whilst the fourth is type 1. The third one is completely different (let's call it type 7), because it's a redirect to a page that doesn't fit the regex.

CC @dirkf

gamer191 commented 2 years ago

https://github.com/ytdl-org/youtube-dl/pull/27850 is related, although I don't know if it would break any type 6 videos (I haven't looked at the code). It was merged into yt-dlp here: https://github.com/yt-dlp/yt-dlp/commit/a74727e93cb245979c566f58de8ef864f601a5f1.

I just tested all the videos on youtube-dl and yt-dlp (which has a virtually identical abc extractor, apart from that PR):
Types 1-4 and 8 (see below) can't be downloaded in youtube-dl or yt-dlp
Type 5 works in both, because it falls back to the generic extractor
Type 6:
http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334 works in neither (but accurately returns that the video has expired)
http://www.abc.net.au/news/2015-08-17/warren-entsch-introduces-same-sex-marriage-bill/6702326 works only in yt-dlp
Type 7 works only in yt-dlp

gamer191 commented 2 years ago

I just remembered about the ABC news story lab, which should be supported if youtube-dl is aiming to download background videos. Here's an example: https://www.abc.net.au/news/2022-03-24/mexico-femicide-units-violent-crime-women-foreign-correspondent/100920716. I will call this "type 8".

Theoretically these could also contain embeds, I guess

gamer191 commented 2 years ago

https://github.com/ytdl-org/youtube-dl/issues/10687 is a duplicate. Also, based on that, https://github.com/ytdl-org/youtube-dl/pull/24915 could potentially be helpful with some cases of type 3 (it's a workaround though, and it obviously won't catch twitter embeds etc)

I don't have time to test it though

dirkf commented 2 years ago

#24915 is a bit antique. If the YT extractor doesn't have a routine for extracting embedded YT videos (eg for the generic extractor), it should have one.
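To make the idea concrete, a routine of that kind could look roughly like the sketch below. It is not youtube-dl's code: `find_youtube_embeds` is a hypothetical name, only iframe embeds of the /embed/ player are handled, and the 11-character ID pattern is an assumption about the usual shape of a video ID.

```python
import re

# Matches <iframe ... src="...youtube.com/embed/<id>..."> with either quote
# style, an optional scheme, and the youtube-nocookie.com variant.
YT_EMBED_RE = re.compile(
    r'<iframe[^>]+?src=(["\'])'
    r'(?P<url>(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/embed/'
    r'(?P<id>[0-9A-Za-z_-]{11})[^"\']*)\1',
    re.IGNORECASE)


def find_youtube_embeds(webpage):
    """Return the video IDs of YouTube iframes in page order, without duplicates."""
    seen = []
    for m in YT_EMBED_RE.finditer(webpage):
        if m.group('id') not in seen:
            seen.append(m.group('id'))
    return seen
```

An extractor for article pages could then hand each ID to the YouTube extractor instead of parsing the embed itself.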