Handling of AniDB bans and 1000+ Anime Series

winterbird-code commented 2 years ago

Background

The AniDB API is heavily rate-limited and will ban any IP that requests more than maybe a few hundred series. Some of us data hoarders have far more series than that, meaning we can never do a full metadata refresh without getting banned.

Problems

I can think of three different scenarios when this causes problems:

- During initial creation of a large library (it was horrible)
- During a full metadata refresh of a large library
- When adding a large number of series, or episodes across many different series, in a short time (a day?)

A bit of brainstorming follows :slightly_smiling_face:

Possible solutions?

When I added my library, I manually watched the XML cache directory to see when I started getting "banned" messages. At that point I:

1. Stopped the library scan
2. Noted the IDs of the bad XML files and removed them
3. Waited at least 24 hours
4. Manually refreshed the metadata for the previously noted IDs
5. Restarted the library scan; rinse and repeat

Not the most smooth sailing possible. I guess PR #42 may have helped me to detect the bans which would be an improvement, but unless it also prevents the plugin from making more requests it might also have made the bans longer (I understand the AniDB API adds ban-time for each "unsolicited" request).

A possible high-level solution (I don't know whether it's feasible) would be to limit the plugin to maybe 200 requests during a sliding window of 24 hours. When asked for more, the plugin should just respond that it currently has no metadata, and maybe schedule a new refresh for that series in 24, 48, or possibly more hours. The same goes if a ban is detected: refuse to send any more requests for at least 24 hours. It would still take a very long time for a large library to be refreshed, but it would be populated eventually and without any cumbersome user interaction.
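To make the sliding-window idea a bit more concrete, here is a rough Python sketch (the plugin itself is C#, so this only illustrates the logic). The class name, method names, and the injectable `clock` are all made up for this example; the defaults of 200 requests and 24 hours are just the numbers suggested above, not documented AniDB limits.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds`, plus a ban cooldown.

    Hypothetical helper for illustration; not part of the plugin.
    """

    def __init__(self, max_requests=200, window_seconds=24 * 3600,
                 ban_cooldown_seconds=24 * 3600, clock=time.time):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.ban_cooldown_seconds = ban_cooldown_seconds
        self.clock = clock
        self._sent = deque()        # timestamps of requests still inside the window
        self._banned_until = 0.0    # no request is allowed before this time

    def _expire_old(self, now):
        # Drop timestamps that have slid out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()

    def try_acquire(self):
        """Return True and record a request if one may be sent right now."""
        now = self.clock()
        if now < self._banned_until:
            return False
        self._expire_old(now)
        if len(self._sent) >= self.max_requests:
            return False
        self._sent.append(now)
        return True

    def report_ban(self):
        """Call when a ban is detected; blocks all requests for the cooldown."""
        self._banned_until = self.clock() + self.ban_cooldown_seconds

    def seconds_until_available(self):
        """How long to wait before the next request could be sent."""
        now = self.clock()
        self._expire_old(now)
        wait = max(0.0, self._banned_until - now)
        if len(self._sent) >= self.max_requests:
            wait = max(wait, self._sent[0] + self.window_seconds - now)
        return wait
```

When `try_acquire()` returns False, the caller would answer "no metadata right now" and could use `seconds_until_available()` to decide when to schedule the retry. For this to survive restarts, the timestamps and the ban deadline would have to be persisted somewhere, which the sketch leaves out.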

As for library metadata refresh, I think there is room to improve the XML file caching. At the moment it looks like the XML files are cached for 7 days. One idea would be a daily scheduled task that updates the 50 (or 100 or so) oldest XML files for series that are present in the library, combined with raising the cache time to 30, 60 or 90 days. This would keep the XML files fairly fresh even for larger collections, and you wouldn't run into trouble if you configure the libraries to refresh periodically.
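The selection step of such a daily task could look roughly like this. It is only a sketch: the flat `<series_id>.xml` layout is an assumption made for the example and is not necessarily how the plugin lays out its cache, and `oldest_cached_series` is a made-up name.

```python
from pathlib import Path


def oldest_cached_series(cache_dir, library_ids, limit=50):
    """Return up to `limit` series IDs whose cached XML is oldest.

    Assumes a hypothetical flat layout: <cache_dir>/<series_id>.xml.
    Series in `library_ids` that have no cache file yet are ignored here.
    """
    wanted = {str(series_id) for series_id in library_ids}
    candidates = []
    for path in Path(cache_dir).glob("*.xml"):
        if path.stem in wanted:
            # Use the file modification time as "when was this last fetched".
            candidates.append((path.stat().st_mtime, path.stem))
    candidates.sort()  # oldest modification time first
    return [series_id for _mtime, series_id in candidates[:limit]]
```

At 50 files per day, a library of N series is cycled through in N / 50 days, so a 90-day cache time would cover up to 4500 series and a 30-day cache time up to 1500.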

I realize that this is quite a big change, and maybe the problem is rare enough that few people besides me find it troublesome. Unfortunately I cannot help with the code myself, but I'd like to put it up here as an idea, and maybe someone else finds it to be a fun challenge. Anyway, thank you for an awesome plugin :smiley:

nalsai commented 2 years ago

PR #42 throws an exception when an API error is detected, which stops the task and doesn't save the bad XML. Unless you start another library scan or make more requests, your ban time shouldn't increase. You just have to manually start another library scan after you're unbanned, and eventually you should have all metadata.
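As an illustration of that behaviour (not the actual code from PR #42, which is C#), the check could be sketched in Python like this. It assumes the API signals an error with an XML document whose root element is `<error>`; that response shape, the `AniDbApiError` name, and `parse_series_xml` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


class AniDbApiError(Exception):
    """Hypothetical exception for an API error response such as a ban."""


def parse_series_xml(body):
    """Parse a response body; raise instead of returning an error document.

    Assumption: an error response has <error> as its root element.
    """
    root = ET.fromstring(body)
    if root.tag == "error":
        message = (root.text or "").strip() or "unknown error"
        raise AniDbApiError(message)
    return root
```

Because the exception is raised before anything is written to disk, the caller never caches the error document, and letting the exception propagate stops the scan so no further requests go out.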

Your proposed solution seems good 👍 When I have time, I might implement the "refuse to send any more requests for at least 24 hours if a ban is detected" part.

I think that XML files are cached for 7 days because most anime release on a weekly schedule -> one episode (or rather the metadata for the episode) gets added to AniDB per week.

winterbird-code commented 2 years ago

Thanks, it would be a great improvement :+1:

A 7-day (or maybe even 6-day) cache is absolutely the most reasonable for the common use case, but it makes it difficult to keep the cache warm enough to run library refreshes. Before I noticed the 7-day limit, I assumed the plugin always used the cache files until they were removed by the Jellyfin scheduled task "clear cache", so I made a small Python script to refresh the cache a little each day. Unfortunately 7 days is too short a time to refresh the entire cache.

The scheduled task idea would help me (it's what I tried to do with the python script), but I'm not sure how much of a corner case it is. Just as you say it would also require some changes to the cache logic, such as refreshing "early" if a requested episode is missing from the cache. For now I'll just have to accept that an anime library cannot be metadata-refreshed. It's not optimal, but it's not critical either.
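For what it's worth, the "refresh early if an episode is missing" rule could be sketched like this in Python. The function name, its parameters, and the 24-hour minimum age are made up for the example; the minimum age is my own addition so that an episode AniDB doesn't list yet cannot trigger a new request on every scan.

```python
import time


def needs_refresh(cached_at, cached_episodes, requested_episode=None,
                  max_age_days=30, min_age_hours=24, now=None):
    """Decide whether a cached series XML should be fetched again.

    cached_at:         Unix timestamp of when the XML was fetched
    cached_episodes:   set of episode numbers present in the cached XML
    requested_episode: episode number being looked up, if any
    """
    now = time.time() if now is None else now
    age = now - cached_at
    if age > max_age_days * 86400:
        return True  # normal expiry
    missing = (requested_episode is not None
               and requested_episode not in cached_episodes)
    # Early refresh only for a missing episode, and only if the cache is
    # not brand new (avoids re-requesting on every scan).
    return missing and age >= min_age_hours * 3600
```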
