ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense
129.76k stars 9.78k forks

Add support for videos saved with Internet Archive Wayback Machine #13655

Open zayuim opened 6 years ago

zayuim commented 6 years ago

Please follow the guide below


Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2017.07.15. If it's not, read this FAQ entry and update. Issues with outdated versions will be rejected.

What is the purpose of your issue?


Here are some examples of URLs saved with the Wayback Machine:

https://web.archive.org/web/20150517183426/https://www.youtube.com/watch?v=V46KYxFav24
https://web.archive.org/web/20071015051038/http://youtube.com:80/watch?v=JdLCEwEFCMU
https://web.archive.org/web/20110531000748/http://www.youtube.com/watch?v=bVUnpkvgHaw&gl=US&hl=en&has_verified=1

yan12125 commented 6 years ago

The Wayback Machine uses a fixed URL pattern for YouTube videos. Ex: https://web.archive.org/web/2oe_/http://wayback-fakeurl.archive.org/yt/bVUnpkvgHaw

zayuim commented 6 years ago

This is very useful information. You just take the video ID (the value after watch?v=, up to any following &) and put it on the end of "https://web.archive.org/web/2oe_/http://wayback-fakeurl.archive.org/yt/".

I've just done this for one of the videos and aria2 will download it!
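The manual step described here can be sketched in a few lines of Python. This is only an illustration of the URL rewrite, not part of youtube-dl; the function name `wayback_video_url` is made up for this example, and it assumes the video ID is everything between `watch?v=` and the next `&` or `#`.

```python
import re

# Fixed prefix quoted earlier in this thread.
WAYBACK_YT_PREFIX = (
    "https://web.archive.org/web/2oe_/http://wayback-fakeurl.archive.org/yt/"
)


def wayback_video_url(archived_url):
    """Build the direct Wayback video URL from an archived watch-page URL.

    Hypothetical helper: returns None when no watch?v= parameter is found.
    """
    # Assumption: the ID runs up to the next '&' or '#', or to the end.
    match = re.search(r"watch\?v=([^&#]+)", archived_url)
    if match is None:
        return None
    return WAYBACK_YT_PREFIX + match.group(1)


if __name__ == "__main__":
    print(wayback_video_url(
        "https://web.archive.org/web/20110531000748/"
        "http://www.youtube.com/watch?v=bVUnpkvgHaw&gl=US&hl=en&has_verified=1"
    ))
```

The resulting URL can then be handed to a plain downloader such as aria2. As later comments note, the rewritten URL does not work for every capture.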

Is it going to be possible to have this in youtube-dl in the future? IDK if it helps but the video format is VP8 and audio is Vorbis.

bato3 commented 6 years ago

The Internet Archive also supports its own video embed style: https://archive.org/embed/TokiWoKakeruShoujo

mmtsuchi commented 6 years ago

Hello. This should be very simple, since there are links to download the media directly. They are shown in the "Download Options" section of the web page. The links look like "https://archive.org/download/movie_name/movie_name(*).mp4"

ajacraig2011 commented 6 years ago

Request: https://web.archive.org/web/20131005025752/http://www.youtube.com/watch?v=Ehk1OKlMuWY

carestad commented 5 years ago

Has anyone made this work with youtube-dl or any other tool?

jonpatterns commented 5 years ago

There is a way to find a direct link to a video embedded in the Internet Archive Wayback Machine if the page uses JW Player. Once you open that direct link, the video player offers a download option.

Details here:

https://bugthinking.com/how-to-save-a-video-from-jwplayer-website/


At the moment youtube-dl just sees the URL, strips it down to the YouTube video link, and then tries to download from YouTube. This will often fail, because a common reason for looking on Wayback in the first place is that the video has been removed from YouTube.
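To make the structure of these links concrete, here is a small sketch that splits a Wayback URL into its capture timestamp and the original URL embedded in it. It is not youtube-dl's code; the regular expression and the name `split_wayback_url` are assumptions made for illustration, and only the numeric-timestamp form seen in this thread is handled.

```python
import re

# Assumed layout: https://web.archive.org/web/<digits>/<original URL>
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(?P<timestamp>\d+)/(?P<original>https?://.+)$"
)


def split_wayback_url(url):
    """Return (timestamp, original_url), or None if the URL does not fit."""
    match = WAYBACK_RE.match(url)
    if match is None:
        return None
    return match.group("timestamp"), match.group("original")


if __name__ == "__main__":
    print(split_wayback_url(
        "https://web.archive.org/web/20131005025752/"
        "http://www.youtube.com/watch?v=Ehk1OKlMuWY"
    ))
```

Because the original YouTube link sits inside the Wayback URL unchanged, a tool that only looks for a YouTube link will find it and ignore the archive part.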


@ajacraig2011 The video at https://web.archive.org/web/20131005025752/http://www.youtube.com/watch?v=Ehk1OKlMuWY doesn't appear to be working on Wayback - does it play for you?

NatoBoram commented 4 years ago

Oh, thanks y'all for this knowledge. It allowed me to download Microsoft's anti-open source propaganda video A Few Perspectives on OpenOffice.org.

Blakeinstein commented 4 years ago

Is this fixed? ydl fails to fetch anything from this archive: https://web.archive.org/web/20200225142705/https://www.youtube.com/watch?v=cFK8fhTsViU&gl=US&hl=en

I even tried web.archive.org/web/2oe_/http://wayback-fakeurl.archive.org/yt/cFK8fhTsViU

aeiouaeiouaeiouaeiouaeiouaeiou commented 3 years ago

Will this bug ever be fixed? Why can't an exemption be added for the web.archive.org domain?

JonasOlson commented 3 years ago

@aeiouaeiouaeiouaeiouaeiouaeiou:

> Why can't an exemption be added for the web.archive.org domain?

From what? Is there some rule in effect?

aeiouaeiouaeiouaeiouaeiouaeiou commented 3 years ago

> From what? Is there some rule in effect?

Adding wayback-fakeurl.archive.org/yt to the link sometimes doesn't work, and having to do it by hand every time is inconvenient, as noted above.

Kinegitastwr commented 4 months ago

Is this fixed? ydl fails to fetch anything from this archive: https://web.archive.org/web/20160808170440/https://www.youtube.com/watch?v=i4A9xblvaJY

dirkf commented 4 months ago

Specifically, the master code redirects to the YT page where "This video contains content from Turner EST, who has blocked it on copyright grounds".

The archived page (which is the oldest capture available) looks like just the original YT page re-homed to the archive. With JS enabled, the page warns that the actual video could not be archived.

Extracting from it will only repeat what happens with the original URL (as can be confirmed by adding a URL pattern with the prefix web\.archive\.org/web/\d+/https://www\.youtube\.com to the list of YT domains in the _VALID_URL).
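The prefix quoted above can be tried in isolation with a few lines of Python. This is a simplified stand-in, not youtube-dl's actual _VALID_URL; `WATCH_RE` and `match_video_id` are hypothetical names, and only plain watch?v= URLs are covered.

```python
import re

# Prefix from the comment above, kept verbatim.
WAYBACK_YT_PREFIX = r"web\.archive\.org/web/\d+/https://www\.youtube\.com"

# Hypothetical, heavily simplified watch-page pattern: the Wayback prefix is
# treated as just another "domain" next to www.youtube.com.
WATCH_RE = re.compile(
    r"^https?://(?:" + WAYBACK_YT_PREFIX + r"|www\.youtube\.com)"
    r"/watch\?v=(?P<id>[^&#]+)"
)


def match_video_id(url):
    """Return the video ID if the URL matches the sketch pattern, else None."""
    match = WATCH_RE.match(url)
    return match.group("id") if match else None


if __name__ == "__main__":
    print(match_video_id(
        "https://web.archive.org/web/20160808170440/"
        "https://www.youtube.com/watch?v=i4A9xblvaJY"
    ))
```

Matching the URL this way only recovers the video ID. The page behind it is still a copy of the original YouTube page, so extraction runs into the same block. Note also that the prefix as written requires the archived URL to use https://www.youtube.com, so older captures of http:// URLs would not match without widening it.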