ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

youtube-dl is back #27013

Closed dstftw closed 2 years ago

dstftw commented 4 years ago

As everybody already knows, our dev repository was reinstated earlier today. You can read the full story here.

We would like to thank @github for standing up for youtube-dl and making it possible to continue development without dropping any features. We appreciate @github for taking potential legal risks in this regard.

We would also like to thank @EFForg and personally @Mitch77 for invaluable legal help.

We would also like to heartily thank our main website host, Uberspace, who are currently being sued in Germany for hosting what is essentially our business-card website, and who have already spent thousands of euros on their legal defense.

We also appreciate the massive amount of support received lately, and we are sorry we could not physically respond to everybody.

Finally, we would like to thank all youtube-dl users and contributors for using and improving youtube-dl.

Thank you all.

youtube-dl is back!

Whanos commented 4 years ago

Long live the king!

Mhowser commented 4 years ago

We should look into backing up all the pull requests and issues in case this happens again.

leetfin commented 4 years ago

Three cheers for youtube-dl! Glad it's back, RIAA can go sit in a hole.

Hans5958 commented 4 years ago

pog

gg, riaa can suck ass

Morsmalleo commented 4 years ago

Hip hip hooray for the return of Youtube-dl

ucodelukas commented 4 years ago

Whoohoo

Qix- commented 4 years ago

I've never used youtube-dl personally but I've been following this closely. The community over at HackerNews has invested a lot of effort tracking this as well.

Thank you to the fine folks at EFF for sticking up for us. While I'm not personally the biggest fan of Github for unrelated reasons, this was a good act of showmanship on their part, too.

Thank you for pushing the boundaries of freedom of expression and the exercise of open technology. I'm glad to see you back.

laoshuterry commented 4 years ago

Cheers!!

Lightsockie commented 4 years ago

woot

iamflorencejay commented 4 years ago

WELCOME BACK BOIZZZZ

FrozenHearth commented 4 years ago

Welcome back!

angentanewbe commented 4 years ago

Welcome Back. Keep youtube-dl alive :D

joppan commented 4 years ago

Welcome back!

koutsie commented 4 years ago

aye :clap:

baddate commented 4 years ago

Nice 🎉

aeongdesu commented 4 years ago

Welcome back! 🚀🎉 really cool project

SausageTaste commented 4 years ago

This program can be used for the public benefit. No one can ever take it down again!

CyberTailor commented 4 years ago

Are you considering switching to a self-hosted git server to prevent any possible takedowns?

SausageTaste commented 4 years ago

Are you considering switching to a self-hosted git server to prevent any possible takedowns?

https://github.blog/2020-11-16-standing-up-for-developers-youtube-dl-is-back/ Read this. GitHub is pretty serious about protecting open-source developers.

misaka00251 commented 4 years ago

yay

viliml commented 4 years ago

Why not just move to a different git server? GitHub is shit anyway.

blackjyn commented 4 years ago

Congrats guys! I just opened my email and got the notification about this. So glad this repo has been reinstated.

qwertyuiop6 commented 4 years ago

🎉🎉🎉

TimurC1 commented 4 years ago

Congratulations! You need an official mirror to prevent new problems in the future.

jalotra commented 4 years ago

Github played well.

thr3a commented 4 years ago

okaeri (welcome back)!!!

Anon-Exploiter commented 4 years ago

Youtube-dl -> RCIA


ESWZY commented 4 years ago

WELCOME BACK!

jonasknobloch commented 4 years ago

We should look into backing up all the pull requests and issues in case this happens again.

From https://github.blog/2020-11-16-standing-up-for-developers-youtube-dl-is-back/:

Even after a repository has been taken down due to what appears to be a valid claim, we will ensure that repository owners can export their issues and PRs and other repository data that do not contain the alleged circumvention code, where legally possible.

mrx23dot commented 4 years ago

To prevent further takedowns, maybe a new rule NOT to use copyrighted test cases?

TimurC1 commented 4 years ago

We should look into backing up all the pull requests and issues in case this happens again.

From https://github.blog/2020-11-16-standing-up-for-developers-youtube-dl-is-back/:

Even after a repository has been taken down due to what appears to be a valid claim, we will ensure that repository owners can export their issues and PRs and other repository data that do not contain the alleged circumvention code, where legally possible.

But contributors should keep a backup of the code along with issues and pull requests, because GitHub's rules may change suddenly.

mrx23dot commented 4 years ago

I would say we should automate the merging of PRs when they only add site support and don't touch the core. Worst case, one site will fail. There are outstanding PRs dating back a year. Then we can back up the remaining issues.

pcislocked commented 4 years ago

I used youtube-dl to archive channels that are at risk of being taken down by the Turkish government, by YouTube, or by third parties abusing Content ID. It is an amazing tool. Glad it's back.

s-gbz commented 4 years ago

So glad to see you back! Thanks @github

CatPlanet commented 4 years ago

Welcome back!

pedro2555 commented 4 years ago

So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?

What a waste of everybody's time.

SurendiranS commented 4 years ago

Happiest News of the day

rautamiekka commented 4 years ago

So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?

No~t exactly ... Well, during the tests data is downloaded, but only enough that ytdl can verify it did exactly as ordered (I'm pretty sure that data isn't even saved to disk during the test; instead it's held in RAM and automatically dropped when that part of the test is done). That's one point GitHub/EFF made that the RIAA got wrong.
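
Conceptually, that kind of check only needs the first few kilobytes. A rough Python sketch of the idea (the URL is a placeholder, and this is explicitly not youtube-dl's actual test code):

import requests

# Ask for only the first 10 KiB of the media file via an HTTP Range request --
# enough to verify the URL serves real media, without keeping a full copy.
url = 'https://example.com/some/video.mp4'  # placeholder

with requests.get(url, headers={'Range': 'bytes=0-10239'},
                  stream=True, timeout=30) as resp:
    resp.raise_for_status()  # 206 Partial Content if the range was honoured
    data = resp.raw.read(10240)  # stays in memory, never written to disk

print('fetched %d bytes for verification' % len(data))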

viliml commented 4 years ago

So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?

No~t exactly ... Well, during the tests data is downloaded, but only enough that ytdl can verify it did exactly as ordered (I'm pretty sure that data isn't even saved to disk during the test; instead it's held in RAM and automatically dropped when that part of the test is done). That's one point GitHub/EFF made that the RIAA got wrong.

You make it sound like removing the tests wasn't even necessary.

pedro2555 commented 4 years ago

So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?

No~t exactly ... Well, during the tests data is downloaded, but only enough that ytdl can verify it did exactly as ordered. That's one point GitHub/EFF made that the RIAA got wrong.

Well, precisely, that is what I said.

pedro2555 commented 4 years ago

So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?

No~t exactly ... Well, during the tests data is downloaded, but only enough that ytdl can verify it did exactly as ordered (I'm pretty sure that data isn't even saved to disk during the test; instead it's held in RAM and automatically dropped when that part of the test is done). That's one point GitHub/EFF made that the RIAA got wrong.

You make it sound like removing the tests wasn't even necessary.

And it wasn't; that was part of the reason for reinstating. The tests didn't even download the whole video (which I believe was his point), only enough to pass the test. And even if they did, it would be fair use, since you do the same thing just by watching a video to see if YouTube is working.

pedro2555 commented 4 years ago

The RIAA threw clay at the wall; it didn't stick.

rautamiekka commented 4 years ago

So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?

No~t exactly ... Well, during the tests data is downloaded, but only enough that ytdl can verify it did exactly as ordered (I'm pretty sure that data isn't even saved to disk during the test; instead it's held in RAM and automatically dropped when that part of the test is done). That's one point GitHub/EFF made that the RIAA got wrong.

You make it sound like removing the tests wasn't even necessary.

If the PDF is to be believed, it absolutely sounded like the RIAA filed the DMCA notice just to show off. In other words, they either knew they had no chance of winning but wanted to flex, or they had no idea and thought this would be an A-B-C easy win.

MasterBrian99 commented 4 years ago

Finally inner peace.... 😌😌

blueray453 commented 4 years ago

This has been a hell of a journey. I learned to do scraping with Python because I thought this is how it was going to be from now on. While trying things out, I had an idea: it would be better to use Selenium to download videos, so that it triggers less bot detection. I was trying to drive the browser's developer tools through Selenium to download videos, so that the process more closely resembles a human.

The solution I have come up with so far is:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import json
import requests
import time

# Enable Chrome's performance log so DevTools network events are captured.
caps = DesiredCapabilities.CHROME
caps['goog:loggingPrefs'] = {'performance': 'ALL'}

options = webdriver.ChromeOptions()
# options.add_argument("--window-size=1920,1080")
# options.add_argument('headless')
driver = webdriver.Chrome(desired_capabilities=caps, options=options)
driver.get('https://twitch.tv')
time.sleep(5)
driver.find_element_by_id("play-button").click()  # was .Click(), a typo

def process_browser_log_entry(entry):
    # Each performance log entry wraps a DevTools message as a JSON string.
    return json.loads(entry['message'])['message']

# Let the stream play for a while so segment requests accumulate, then read
# the performance log once (reading the log also drains it); the original
# `while True:` looped forever and never reached the download step below.
time.sleep(30)
browser_log = driver.get_log('performance')
events = [process_browser_log_entry(entry) for entry in browser_log]
events = [event for event in events if event['method'] == 'Network.responseReceived']

for e in events:
    if e['params']['response']['url'].endswith('.ts'):
        url = e['params']['response']['url']
        r1 = requests.get(url, stream=True)
        if r1.status_code == 200:
            # Append each MPEG-TS segment to a single growing file.
            with open('testvod.mpeg', 'ab') as f:
                for chunk in r1.iter_content(chunk_size=1024):
                    f.write(chunk)
        else:
            print("Received unexpected status code {}".format(r1.status_code))

driver.quit()

As opposed to:

import m3u8
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm  # plain tqdm; tqdm_notebook only renders inside notebooks
import subprocess

sess = requests.Session()
r = sess.get("https://www.iplt20.com/video/144829/final-csk-vs-srh-fbb-stylish-player-of-the-match-lungi-ngidi")
soup = BeautifulSoup(r.content, 'html5lib')

# The Brightcove player tag carries the video and account IDs we need.
player = soup.find('video', attrs={'id': 'playlistPlayer'})
video_id = player['data-video-id']
account_id = player['data-account']

url = "https://secure.brightcove.com/services/mobile/streaming/index/master.m3u8"

params = {
    'videoId': video_id,
    'pubId': account_id,
    'secure': True
}

r = sess.get(url, params=params)

# The master playlist lists one variant playlist per quality level;
# take the first one.
m3u8_master = m3u8.loads(r.text)
m3u8_playlist_uris = [playlist['uri'] for playlist in m3u8_master.data['playlists']]
playlist_uri = m3u8_playlist_uris[0]

# The variant playlist lists the actual media segment URIs.
r = sess.get(playlist_uri)
playlist = m3u8.loads(r.text)
m3u8_segment_uris = [segment['uri'] for segment in playlist.data['segments']]

# Concatenate all MPEG-TS segments into one file, then remux to MP4.
with open("video.ts", 'wb') as f:
    for segment_uri in tqdm(m3u8_segment_uris):
        r = sess.get(segment_uri)
        f.write(r.content)

subprocess.run(['ffmpeg', '-i', 'video.ts', 'video.mp4'])

A few things that would be nice to have:

  1. If they ask for a captcha, I want it to pause so I can solve it manually. If I am not there to solve it (after three tries), it should stop the process (see the sketch below).
  2. It should click here and there, and scroll up and down, to emulate human behaviour.
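
For the captcha idea, a minimal sketch of the pause-and-solve flow, assuming a hypothetical captcha_present() check (every site marks its captcha differently, so the selector is a placeholder):

import time

def captcha_present(driver):
    # Hypothetical check -- adapt the selector to the target site.
    return bool(driver.find_elements_by_css_selector('iframe[src*="captcha"]'))

def wait_for_manual_solve(driver, tries=3, wait_seconds=60):
    # Pause so a human can solve the captcha; give up after `tries` rounds.
    for attempt in range(tries):
        if not captcha_present(driver):
            return True
        print('Captcha detected, waiting %ds (try %d/%d)...'
              % (wait_seconds, attempt + 1, tries))
        time.sleep(wait_seconds)
    return False  # caller should stop the process at this point
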
rautamiekka commented 4 years ago

This has been a hell of a journey. I learned to do scraping with Python because I thought this is how it was going to be from now on. While trying things out, I had an idea: it would be better to use Selenium to download videos, so that it triggers less bot detection. I was trying to drive the browser's developer tools through Selenium to download videos, so that the process more closely resembles a human.

youtube-dl has made the point numerous times that they want to minimize non-stdlib requirements.

But even if Selenium were used, it would need a massive rewrite of the infrastructure. I'd say borderline a complete rewrite.

^ And that's not even counting that, for consistency, all of the extractors would need to be rewritten for Selenium (although that wouldn't be strictly necessary functionality-wise). That's a rather extreme task; I've seen the count of extractors, let alone the code.

Also, Selenium has features that remove the need for the mechanical, horrifyingly unreliable waiting you did in the example.
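
For example, an explicit wait blocks only until the element is actually usable; a minimal sketch reusing the driver and the play-button locator from the example above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 15 s for the button to become clickable, instead of a fixed
# time.sleep(5) that is either too short or needlessly slow.
play_button = WebDriverWait(driver, 15).until(
    EC.element_to_be_clickable((By.ID, 'play-button'))
)
play_button.click()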

For the future, Selenium would indeed be the way to go, though.

blueray453 commented 4 years ago

@rautamiekka some sites are creating huge problems nowadays. For example, I had a post on Quora answering "Tutorials: How can I download PluralSight training videos?". A lot of people are commenting that they have been banned, or that their accounts have been blocked due to high bot scores, etc.

If we could come up with a single-page script (not talking about a rewrite, maybe a template) using Selenium, then I could give it to them. I think if someone modified the sample I have provided, it would work for those sites.

Just to be clear, we are not looking for a universal solution, but one we can tweak for the troublesome sites. For example: if we could get a working script (with --batch-file, --cookies, --min-sleep-interval, --max-sleep-interval, --abort-on-error, -f, -o) that downloads PluralSight courses using Selenium, then we could modify that script for other sites (and also try different bot-bypass strategies with it).
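
For reference, this is roughly the youtube-dl invocation those flags add up to (urls.txt, cookies.txt, and the format/output choices are placeholders); the ask is a Selenium-based script that accepts the same options:

youtube-dl --batch-file urls.txt --cookies cookies.txt \
    --min-sleep-interval 5 --max-sleep-interval 30 \
    --abort-on-error -f best -o '%(title)s.%(ext)s'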

rautamiekka commented 4 years ago

^ Hmm, very true.

Never heard of someone getting banned over at PluralSight for downloaders, or pretty much anywhere else for that matter, apart from Fb being extremely fidgety, as many reports here have proved. Even I've gotten soft-blocked for reasons that couldn't be true: despite my acc having all the 2FAs enabled, I was repeatedly soft-blocked because my acc was allegedly abusing things like their Graph API, which I had never used, and I've never used yt-dl on Fb either. Gladly it stopped eventually; I think it was Fb's systems fucking up. Still, I'd be lying if I said a ban is a surprise.

cypheron commented 4 years ago

This is the best news I've heard all day! I'm very happy to see Microsoft/GitHub changing course and siding with open-source devs!

acagastya commented 4 years ago

Bring out the crabs

🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀