Long live the king!
We should look into backing up all the pull requests and issues in case this happens again.
Three cheers for youtube-dl! Glad it's back, RIAA can go sit in a hole.
pog
gg, riaa can suck ass
Hip hip hooray for the return of Youtube-dl
Whoohoo
I've never used youtube-dl personally but I've been following this closely. The community over at HackerNews has invested a lot of effort tracking this as well.
Thank you to the fine folks at EFF for sticking up for us. While I'm not personally the biggest fan of Github for unrelated reasons, this was a good act of showmanship on their part, too.
Thank you for pushing the boundaries of freedom of expression and the exercise of open technology. I'm glad to see you back.
Cheers!!
woot
Welcome back!
Welcome Back. Keep youtube-dl alive :D
Welcome back!
aye :clap:
Nice 🎉
Welcome back! 🚀🎉 really cool project
This program can be used for the public benefit. No one can take it down ever again!
Do you consider switching to a self-hosted git server to prevent any possible takedowns?
Do you consider switching to a self-hosted git server to prevent any possible takedowns?
https://github.blog/2020-11-16-standing-up-for-developers-youtube-dl-is-back/ Read this. GitHub is pretty serious about protecting open-source project developers.
yay
Why not just move to a different git server? GitHub is shit anyway.
Congrats guys! I just opened my email and got the notification about this. So glad this repo has been reinstated.
🎉🎉🎉
Congratulations! You need an official mirror to prevent new problems in the future.
Github played well.
Okaeri (welcome back)!!!
Youtube-dl -> RCIA
WELCOME BACK!
We should look into backing up all the pull requests and issues in case this happens again.
From https://github.blog/2020-11-16-standing-up-for-developers-youtube-dl-is-back/:
Even after a repository has been taken down due to what appears to be a valid claim, we will ensure that repository owners can export their issues and PRs and other repository data that do not contain the alleged circumvention code, where legally possible.
To prevent further takedowns, maybe a new rule NOT to use copyrighted test cases?
We should look into backing up all the pull requests and issues in case this happens again.
From https://github.blog/2020-11-16-standing-up-for-developers-youtube-dl-is-back/:
Even after a repository has been taken down due to what appears to be a valid claim, we will ensure that repository owners can export their issues and PRs and other repository data that do not contain the alleged circumvention code, where legally possible.
But contributors should still keep a backup of the code along with the issues and pull requests, because GitHub's rules may change suddenly.
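For the backup itself, a minimal sketch using GitHub's public REST API (the issues endpoint also returns PRs; the output filename is just an example, and in practice you'd want an API token for the rate limits):

# Sketch: dump every issue and PR of ytdl-org/youtube-dl into one JSON file.
# The issues endpoint also returns pull requests; add a token header for higher rate limits.
import json
import requests

items = []
page = 1
while True:
    r = requests.get(
        "https://api.github.com/repos/ytdl-org/youtube-dl/issues",
        params={"state": "all", "per_page": 100, "page": page},
    )
    r.raise_for_status()
    batch = r.json()
    if not batch:
        break
    items.extend(batch)
    page += 1

with open("youtube-dl-issues-backup.json", "w") as f:  # output name is just an example
    json.dump(items, f)
print("Saved {} issues and PRs".format(len(items)))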
I would say we should automate the merging of PRs when a PR is only a site-support change and not a core change. Worst case, one site will fail. There are outstanding PRs dating back a year. Then we can back up the remaining issues.
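A very rough sketch of what such automation could look like, assuming the GitHub REST API, that "site-support only" means every changed file lives under youtube_dl/extractor/, and with the token and PR number as placeholders:

# Sketch: merge a PR only if every changed file is a site extractor.
# GITHUB_TOKEN and the PR number 1234 are placeholders; error handling is omitted.
import requests

REPO = "ytdl-org/youtube-dl"
HEADERS = {"Authorization": "token GITHUB_TOKEN"}

def is_site_support_only(pr_number):
    r = requests.get(
        "https://api.github.com/repos/{}/pulls/{}/files".format(REPO, pr_number),
        headers=HEADERS,
    )
    r.raise_for_status()
    return all(f["filename"].startswith("youtube_dl/extractor/") for f in r.json())

def merge_pr(pr_number):
    r = requests.put(
        "https://api.github.com/repos/{}/pulls/{}/merge".format(REPO, pr_number),
        headers=HEADERS,
    )
    return r.ok

if is_site_support_only(1234):
    merge_pr(1234)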
I used youtube-dl to archive channels that are at risk of being taken down either by the Turkish government, by YouTube, or by third parties abusing Content ID. It is an amazing tool. Glad it's back.
So glad to see you back! Thanks @github
Welcome back!
So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?
What a waste of everybody's time.
Happiest News of the day
So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?
No~t exactly ... Well, during the tests data is downloaded, but only enough that ytdl can verify it did exactly as ordered (I'm pretty sure that data's not even saved to disk during the test, instead it's in RAM and automatically gets dropped when that part of the test is done). That's 1 point GitHub/EFF made which RIAA described wrong.
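Roughly the idea, as a sketch (the URL is a placeholder and this is not youtube-dl's actual test code; it assumes the server honors HTTP Range requests):

# Sketch: fetch only the first ~10 KiB of a media URL to check that the
# extracted link actually works; nothing is written to disk.
import requests

url = "https://example.com/video.mp4"  # placeholder URL, not a real test case
resp = requests.get(url, headers={"Range": "bytes=0-10239"}, stream=True)
fragment = next(resp.iter_content(chunk_size=10240))  # kept in memory only
print("Got {} bytes, enough to verify the download works".format(len(fragment)))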
So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?
No~t exactly ... Well, during the tests data is downloaded, but only enough that ytdl can verify it did exactly as ordered (I'm pretty sure that data's not even saved to disk during the test, instead it's in RAM and automatically gets dropped when that part of the test is done). That's 1 point GitHub/EFF made which RIAA described wrong.
You make it sound like removing the tests wasn't even necessary.
So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?
No~t exactly ... Well, during the tests data is downloaded, but only enough that ytdl can verify it did exactly as ordered. That's 1 point GitHub/EFF made which RIAA described wrong.
Well, precisely, that is what I said.
So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?
No~t exactly ... Well, during the tests data is downloaded, but only enough that ytdl can verify it did exactly as ordered (I'm pretty sure that data's not even saved to disk during the test, instead it's in RAM and automatically gets dropped when that part of the test is done). That's 1 point GitHub/EFF made which RIAA described wrong.
You make it sound like removing the tests wasn't even necessary.
And it wasn't; that was part of the reason for reinstating. The test didn't even download the whole thing (which I believe was his point), but rather only enough to pass the test, and even if it did, it would be fair use, since you do the same just by checking whether YouTube is working.
The RIAA threw the clay at the wall, and it didn't stick.
So this whole fiasco was due to a CI server downloading copyrighted material publicly available on YouTube during tests?
No~t exactly ... Well, during the tests data is downloaded, but only enough that ytdl can verify it did exactly as ordered (I'm pretty sure that data's not even saved to disk during the test, instead it's in RAM and automatically gets dropped when that part of the test is done). That's 1 point GitHub/EFF made which RIAA described wrong.
You make it sound like removing the tests wasn't even necessary.
If the PDF is to be believed, it absolutely sounded like the RIAA filed the DMCA claim just to show off. In other words, they either knew they had no chance of winning but wanted to flex, or they had no idea and thought this'd be an A-B-C easy win.
Finally inner peace.... 😌😌
This has been a hell of a journey. I learned to do scraping with Python because I thought this is how it was going to be from now on. While trying things out, I had an idea: I thought it would be better if I could use Selenium to download videos so that it triggers less bot detection. I was trying to automatically read the browser's developer tools (via Selenium) to download videos so that the process resembles a human more closely.
The solution I have come up with so far is:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import json
import requests
import time

# Enable Chrome's performance log so network events show up in driver.get_log().
caps = DesiredCapabilities.CHROME
caps['goog:loggingPrefs'] = {'performance': 'ALL'}
options = webdriver.ChromeOptions()
# options.add_argument("--window-size=1920,1080")
# options.add_argument('headless')
driver = webdriver.Chrome(desired_capabilities=caps, options=options)

driver.get('https://twitch.tv')
time.sleep(5)
driver.find_element_by_id("play-button").click()

def process_browser_log_entry(entry):
    response = json.loads(entry['message'])['message']
    return response

# Poll Chrome's performance log and re-download every .ts segment the player requests.
# Note: this loop never ends on its own, so driver.quit() below is only reached
# if you break out of it manually.
while True:
    browser_log = driver.get_log('performance')
    events = [process_browser_log_entry(entry) for entry in browser_log]
    events = [event for event in events if 'Network.response' in event['method']]
    for e in events:
        if e['params']['response']['url'].endswith('.ts'):
            url = e['params']['response']['url']
            r1 = requests.get(url, stream=True)
            if r1.status_code == 200:
                with open('testvod.mpeg', 'ab') as f:
                    for chunk in r1.iter_content(chunk_size=1024):
                        f.write(chunk)
            else:
                print("Received unexpected status code {}".format(r1.status_code))

driver.quit()
As opposed to:
import m3u8
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm_notebook as tqdm
import subprocess

sess = requests.Session()

# Scrape the Brightcove video and account IDs from the page's <video> tag.
r = sess.get("https://www.iplt20.com/video/144829/final-csk-vs-srh-fbb-stylish-player-of-the-match-lungi-ngidi")
soup = BeautifulSoup(r.content, 'html5lib')
video_id = soup.find('video', attrs={'id': 'playlistPlayer'})['data-video-id']
account_id = soup.find('video', attrs={'id': 'playlistPlayer'})['data-account']

# Fetch the HLS master playlist, pick the first variant, then download its segments.
url = "https://secure.brightcove.com/services/mobile/streaming/index/master.m3u8"
params = {
    'videoId': video_id,
    'pubId': account_id,
    'secure': True
}
r = sess.get(url, params=params)
m3u8_master = m3u8.loads(r.text)
m3u8_playlist_uris = [playlist['uri'] for playlist in m3u8_master.data['playlists']]
playlist_uri = m3u8_playlist_uris[0]

r = sess.get(playlist_uri)
playlist = m3u8.loads(r.text)
m3u8_segment_uris = [segment['uri'] for segment in playlist.data['segments']]

with open("video.ts", 'wb') as f:
    for segment_uri in tqdm(m3u8_segment_uris):
        r = sess.get(segment_uri)
        f.write(r.content)

# Remux the concatenated MPEG-TS segments into an MP4 container.
subprocess.run(['ffmpeg', '-i', 'video.ts', 'video.mp4'])
A few things that would be nice to have are:
This has been a hell of a journey. I learned to do scraping with Python because I thought this is how it was going to be from now on. While trying things out, I had an idea: I thought it would be better if I could use Selenium to download videos so that it triggers less bot detection. I was trying to automatically read the browser's developer tools (via Selenium) to download videos so that the process resembles a human more closely.
youtube-dl has made the point numerous times that they want to minimize non-stdlib requirements.
But even if Selenium were used, it would need a massive rewrite of the infrastructure. I'd say a borderline complete rewrite.
^ And that's not even counting that, for consistency, all of the extractors would need to be rewritten for Selenium (although that wouldn't be necessary functionality-wise). That's a rather extreme task given the sheer number of extractors, let alone the code.
Also, Selenium has explicit waits, so there's no need for the mechanical, horrifyingly unreliable waiting like in your example (see the sketch below).
For the future, Selenium would indeed be the way to go, though.
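Something like this, as a sketch using Selenium's explicit waits (it assumes the driver and the play-button element ID from the example above):

# Sketch: replace time.sleep(5) with an explicit wait that clicks the player
# button as soon as it's actually clickable (or gives up after 30 seconds).
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

WebDriverWait(driver, 30).until(
    EC.element_to_be_clickable((By.ID, "play-button"))
).click()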
@rautamiekka some sites are creating huge problems nowadays. For example, I had a post on Quora answering "Tutorials: How can I download PluralSight training videos?" A lot of people are commenting that they have been banned or that their accounts have been blocked due to high bot scores, etc.
If we could come up with a single-page script (not talking about a rewrite, maybe a template) using Selenium, then I could give it to them. I think if someone modifies the sample I have provided, it will work for those sites.
Just to be clear, we are not looking for a universal solution, but a solution we can tweak for the troublesome sites. For example: if we could get a working script (with --batch-file, --cookies, --min-sleep-interval, --max-sleep-interval, --abort-on-error, -f, -o) that downloads PluralSight courses using Selenium, then we could modify that script for other sites (and also try different bot-bypass strategies with it).
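As a bare-bones sketch of what such a template's option surface could look like (it only mirrors the youtube-dl flags named above; the per-site Selenium logic is left out):

# Sketch: an argparse skeleton exposing youtube-dl-style options for a
# Selenium-based, per-site download script.
import argparse

parser = argparse.ArgumentParser(description="Selenium-based downloader template")
parser.add_argument("--batch-file", help="file with one URL per line")
parser.add_argument("--cookies", help="cookie file to load into the browser session")
parser.add_argument("--min-sleep-interval", type=float, default=1.0)
parser.add_argument("--max-sleep-interval", type=float, default=5.0)
parser.add_argument("--abort-on-error", action="store_true")
parser.add_argument("-f", "--format", help="preferred format/quality")
parser.add_argument("-o", "--output", help="output filename template")
parser.add_argument("urls", nargs="*", help="URLs to download")
args = parser.parse_args()
# ... per-site Selenium download logic would go here ...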
^ Hmm, very true.
I've never heard of someone getting banned over at PluralSight for downloaders (pretty much nowhere else either, for that matter, apart from Fb being extremely fidgety, as many reports here have proved; even I've gotten soft-blocked for reasons that couldn't be true: despite my acc having all the 2FAs enabled, I was repeatedly soft-blocked because my acc was allegedly abusing things like their graph API, which I had never used, and I've never used yt-dl on Fb either. Gladly it stopped eventually. I think it was Fb's systems fucking up), but I'd be lying if I said it's a surprise.
This is the best news I've heard all day! I'm very happy to see Microsoft/GitHub change course and side with open-source devs!
As everybody already knows, our dev repository was reinstated earlier today. You can read the full story here.
We would like to thank @github for standing up for youtube-dl and making it possible to continue development without dropping any features. We appreciate @github for taking potential legal risks in this regard.
We would also like to thank @EFForg and personally @Mitch77 for invaluable legal help.
We would also like to heartily thank our main website host, Uberspace, who is currently being sued in Germany for hosting our essentially business-card website and who has already spent thousands of euros on their legal defense.
We also appreciate the massive amount of support we have received lately, and we are sorry we could not physically respond to everybody.
Finally, we would like to thank all youtube-dl users and contributors for using and improving youtube-dl.
Thank you all.
youtube-dl is back!