Closed: ldexterldesign closed this issue 1 week ago.
Crossposted: https://github.com/yt-dlp/yt-dlp/issues/11378
Answers will depend on which version of the program you're using, since yt-dlp has capabilities that are not implemented here. I'll leave this open for now on the assumption that, even though you'll probably end up using yt-dlp, the answers may still be relevant here.
First you need a channel URL of the format https://www.youtube.com/channel/UCwhatever/videos
or https://www.youtube.com/user/whatever/videos
Then you run this command to list the channel's videos, oldest first, in a text file:
youtube-dl --flat-playlist --playlist-reverse --get-id https://www.youtube.com/user/whatever/videos > videolist.txt
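If all goes well, videolist.txt should end up with one 11-character video ID per line, oldest video first, something like this (the IDs below are made-up placeholders):

```
AAAAAAAAAAA
BBBBBBBBBBB
CCCCCCCCCCC
```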
Then you run this command to download the first fifty videos listed in the text file:
youtube-dl --download-archive archive.txt --max-downloads 50 -a videolist.txt
And then you run it again when you want the next fifty. The IDs of downloaded videos are recorded in a file called archive.txt, so they won't be downloaded a second time.
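For reference, archive.txt is a plain text file with one "extractor video-ID" pair per line; for YouTube entries it looks like this (placeholder IDs again):

```
youtube AAAAAAAAAAA
youtube BBBBBBBBBBB
```

You can even seed it by hand with the IDs of videos you never want downloaded.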
However, there is one issue with the above method: the --download-archive switch does not short-circuit the logic early enough. When you have 150 videos downloaded and run the command again to get the next fifty, it will make 150 HTTP requests to YouTube, solve 150 n-sigs, and so on before deciding that it doesn't actually need to download those files. On my machine that meant spending about 30 minutes doing nothing before a single new file was downloaded.
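Until something like the yt-dlp improvement described below is available, one workaround is to pre-filter videolist.txt yourself, so youtube-dl never even sees IDs that are already in the archive. A minimal sketch in Python (filter_done.py is a hypothetical helper, not part of youtube-dl; it assumes the "youtube <id>" archive format shown above and the file names used in this thread):

```python
#!/usr/bin/env python3
# filter_done.py -- hypothetical helper, not part of youtube-dl.
# Drops IDs already present in archive.txt from videolist.txt so that
# youtube-dl doesn't re-probe videos it has already downloaded.

# Collect the IDs recorded in the archive ("youtube <id>" per line).
done = set()
try:
    with open('archive.txt') as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2 and parts[0] == 'youtube':
                done.add(parts[1])
except FileNotFoundError:
    pass  # no archive yet: nothing has been downloaded

# Write out only the IDs that still need downloading, preserving order.
with open('videolist.txt') as f, open('remaining.txt', 'w') as out:
    for line in f:
        video_id = line.strip()
        if video_id and video_id not in done:
            out.write(video_id + '\n')
```

Then point youtube-dl at the filtered list instead:

youtube-dl --download-archive archive.txt --max-downloads 50 -a remaining.txt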
The download archive is a list of extractor (IE) name and video (media) ID pairs, one for each item that has been downloaded with that archive specified.
The problem is that the extractor API is designed as a one-shot call, where the core finds the appropriate extractor class, instantiates it, passes the URL to the extractor and gets any resulting info_dict back. The core program doesn't have the ID and IE pair needed to index into the archive until the extractor returns.
yt-dlp solves this partially (probably as much as is possible) by adding a class method, get_temp_id(), to the extractor API, which by default extracts the id group from matching the URL against the extractor's _VALID_URL pattern. If an ID is returned by this method, the download archive can be (and is) checked before even instantiating the selected extractor. This is actually a fairly simple change that we could back-port with no impact on extractors. PR #17975 could be resuscitated and included as well. Then the remaining limitation for large archives/playlists would be that each check re-reads the archive, when it could be cached.
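For the curious, the mechanism amounts to something like the following sketch (simplified, not the actual yt-dlp code; the example _VALID_URL and function names here are illustrative):

```python
import re

class InfoExtractor:
    # Each extractor defines a URL pattern, usually with an 'id' group.
    _VALID_URL = r'https?://(?:www\.)?youtube\.com/watch\?v=(?P<id>[0-9A-Za-z_-]{11})'

    @classmethod
    def get_temp_id(cls, url):
        """Best-effort media ID from the URL alone: no instance, no network."""
        try:
            return re.match(cls._VALID_URL, url).group('id')
        except (AttributeError, IndexError):
            return None  # no match, or the pattern has no 'id' group

# Core-side sketch: with a temp ID in hand, the archive (a set of
# 'extractor-key video-id' strings) can be consulted before the
# extractor is instantiated and before any HTTP request is made.
def skip_via_archive(ie_cls, ie_key, url, archive_entries):
    temp_id = ie_cls.get_temp_id(url)
    return temp_id is not None and f'{ie_key.lower()} {temp_id}' in archive_entries
```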
@wellsyw, --download-archive is an excellent solution. Thank you!
Question
Hi all, hope y'all are OK!
I want to watch some YT channels from the start (i.e. chronologically). Some channels started 10 years ago so it's not practical to use the UI for this. It's also not practical to download 10 years worth of content when I could grab a small batch (e.g. 10 videos, 1GB, 2014-5) and repeat.
After a few hours of researching and experimenting I tried various combinations of --playlist-start/--playlist-end and --playlist-reverse (a lot of people seem to complain that combining these flags is problematic), then fell back to --datebefore/--dateafter; neither was successful for my use case. I suppose this issue is the closest I've got with my research. Technically a command with --datebefore/--dateafter will work but, to be frank, there's zero chance I'm going to wait for the command to call the YT API hundreds of times before the new batch of videos I want downloads. Examples/suggestions/tips appreciated? (The sort of command I was attempting is shown below.)
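For reference, the date-filter attempt looked something like this (flag spellings per youtube-dl's --help; the date range and channel URL are placeholders):

youtube-dl --dateafter 20140101 --datebefore 20151231 https://www.youtube.com/user/whatever/videos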
Yours hopefully