openzim / youtube

Create a ZIM file from a Youtube channel/username/playlist
GNU General Public License v3.0

We still get HTTP 421 errors even with videos in the cache #129

Closed by kelson42 1 year ago

kelson42 commented 3 years ago

See https://farm.openzim.org/pipeline/5f8aea2fe4deed8b0a64dc29/debug

We need to somehow slow down the rate at which we request info from the YouTube API.

rgaudin commented 3 years ago

No, this is not related to the API. It's in the youtube-dl request to the page.

kelson42 commented 3 years ago

Why does youtube-dl make requests if the videos are already in the cache?

rgaudin commented 3 years ago

I already explained that in another ticket. As we don't cache the subtitles (because they can change over time), we always download them. To get the list of subtitles, youtube-dl hits the webpage that contains all the info, and that page is metered.

One option could be to cache the subtitles, but we should probably find an appropriate policy for when to cache/expire those.
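A minimal sketch of what an age-based expiry policy could look like, not the scraper's actual code. The 7-day limit, the `get_subtitles` helper, the cache file layout and the caller-supplied `fetch` function are all assumptions made for illustration:

```python
import json
import time
from pathlib import Path

# Hypothetical expiry policy: re-fetch subtitles older than 7 days.
MAX_AGE_SECONDS = 7 * 24 * 3600


def get_subtitles(video_id, cache_dir, fetch, max_age=MAX_AGE_SECONDS, now=None):
    """Return subtitles for video_id, reusing a cached copy unless it expired.

    `fetch` is a caller-supplied (hypothetical) function that downloads the
    subtitles for a video ID and returns JSON-serializable data.
    """
    now = time.time() if now is None else now
    path = Path(cache_dir) / f"{video_id}.subtitles.json"
    if path.exists():
        entry = json.loads(path.read_text(encoding="utf-8"))
        if now - entry["fetched_at"] < max_age:
            # Cached copy is still fresh: no request is made.
            return entry["subtitles"]
    # Missing or expired entry: make one request and store the result.
    subtitles = fetch(video_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps({"fetched_at": now, "subtitles": subtitles}),
        encoding="utf-8",
    )
    return subtitles
```

With a policy like this, a re-run within the expiry window makes no subtitle request for cached videos. The trade-off is that subtitle changes made during that window are missed.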

Another option would be to slow down scraping (i.e. sleep between requests), but we have no data on the rules behind the ban: after how many requests, for how long, etc. I've seen posts from people trying to guess that without success, so that looks like a difficult route to follow.
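For illustration, sleeping between retries could be sketched as exponential backoff with jitter. Since the ban rules are unknown, the delays below are guesses. The `RateLimited` exception, the `call_with_backoff` helper and the caller-supplied `request` function are hypothetical names, not part of youtube-dl or the scraper:

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function when the server throttles us."""


def call_with_backoff(request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call request(); on RateLimited, wait and retry with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # Delay doubles on each attempt; jitter spreads out retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists only so the waiting can be replaced in tests. This only reacts to throttling after it happens and cannot guarantee the ban is avoided.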

kelson42 commented 3 years ago

@rgaudin Can't we get the list of subtitles via the API and handle that ourselves? Caching the subtitles seems like a good solution to greatly reduce the number of HTTP requests against youtube.com.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

kelson42 commented 1 year ago

This has not happened in the last two years. Looks good for now. It can be reopened if necessary.