Smarter library update strategy - do it according to MD's last updated manga

asdkant commented 3 years ago

Why/User Benefit/User Problem

As far as I can tell from a glance at app/src/main/java/eu/kanade/tachiyomi/data/library/LibraryUpdateRanker.kt, the "last updated" library update order sorts manga by how recently the local copy was updated. This is smarter than updating alphabetically, but ideally we would update according to MangaDex's own update order.

What/Requirements

Neko could fetch the list of last updated manga and filter it by what's in the local library, then match the sequence of updates against a local cache to know when to stop (a hard stop after some amount of time or some number of manga could also be hardcoded so the check always ends eventually). With this method Neko updates only what is new since it last checked, so it only needs to check a fraction of the manga in the library.
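A minimal Kotlin sketch of that loop, under loud assumptions: `fetchPage` is a hypothetical stand-in for whatever call returns one page of the latest-updates feed (not a real Neko or MangaDex API), and each `FeedEntry` is assumed to carry a unique ID for the update itself (for example a chapter ID) next to the manga ID, so a plain set lookup is enough to recognize the previous checkpoint. `maxPages` plays the role of the hard stop, and the new checkpoint is taken from the unfiltered head of the feed.

```kotlin
// Hypothetical feed entry: one row of the "latest updates" list.
data class FeedEntry(val updateId: String, val mangaId: String)

data class ScanResult(
    val mangaToUpdate: Set<String>,  // library manga seen in the new part of the feed
    val newCheckpoint: List<String>, // update IDs from the head of the unfiltered feed
    val hitCheckpoint: Boolean       // false: stopped on the page cap or ran out of pages
)

fun scanLatestUpdates(
    fetchPage: (Int) -> List<FeedEntry>, // hypothetical stand-in for the API call
    library: Set<String>,
    previousCheckpoint: Set<String>,
    maxPages: Int,                       // hard stop so the scan always ends
    checkpointSize: Int = 20
): ScanResult {
    val toUpdate = linkedSetOf<String>()
    val head = mutableListOf<String>()
    for (page in 0 until maxPages) {
        val entries = fetchPage(page)
        if (entries.isEmpty()) break
        // The next checkpoint comes from the whole feed, not just library manga.
        for (entry in entries) {
            if (head.size < checkpointSize) head += entry.updateId
        }
        for (entry in entries) {
            // Reaching an update we already saw means everything older is known.
            if (entry.updateId in previousCheckpoint) {
                return ScanResult(toUpdate, head.toList(), hitCheckpoint = true)
            }
            if (entry.mangaId in library) toUpdate += entry.mangaId
        }
    }
    return ScanResult(toUpdate, head.toList(), hitCheckpoint = false)
}
```

If the feed only exposes manga IDs, a single-ID lookup can stop too early when a manga from the old checkpoint gets a new chapter; matching a run of consecutive IDs is sturdier.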

A possible issue could arise if the user changes which categories are checked when the library updates, since this changes the criteria for which manga to update. A flag could be set so that when this changes, Neko lets the user know and offers to update the newly added categories (if the user removes categories from the update, this is not an issue).
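One way that flag could be derived, sketched in Kotlin: diff the set of categories included in the previous update against the current set. Representing categories as `Int` IDs and the names `CategoryChange` and `diffUpdateCategories` are hypothetical.

```kotlin
// Hypothetical bookkeeping for which categories the smart update covers.
data class CategoryChange(val added: Set<Int>, val removed: Set<Int>) {
    // Newly included categories may hold manga whose updates the smart check
    // already scrolled past, so the user should be offered a one-off update.
    // Removed categories need no action.
    val needsCatchUp: Boolean get() = added.isNotEmpty()
}

fun diffUpdateCategories(previous: Set<Int>, current: Set<Int>): CategoryChange =
    CategoryChange(added = current - previous, removed = previous - current)
```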

The sequence used to decide where to stop looking for new manga should be taken from the whole list MangaDex provides, not just what's in the library, to avoid issues when the user adds or removes manga, or adds or removes categories from the global update.

nonproto commented 3 years ago

It's not a bad idea; the only problem is that there is no good way to get the latest updates from Dex. It would basically take 20+ HTTP calls to get 20 pages of results from Dex before you even started sorting.

asdkant commented 3 years ago

@CarlosEsco Then limit the number of pages this method can fetch to something proportional to the number of manga in the library, or to a hard limit.

If there is no checkpoint cache, or it is invalid, grab one page to recreate it and then do a full library check (not the other way around, since some updates could slip through).

Also, set an age threshold after which the checkpoint cache is invalidated, in case the last smart check was too long ago.
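A hypothetical sketch of how these three rules could combine into a decision before each run. `Checkpoint`, `planUpdate`, `maxAge`, `mangaPerPageBudget`, and `hardPageCap` are illustrative names and tuning knobs, not anything that exists in Neko; the proportional budget here is simply one feed page per `mangaPerPageBudget` library entries, clamped to the hard cap.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical saved state: the head of the unfiltered feed plus when it was saved.
data class Checkpoint(val window: List<String>, val savedAt: Instant)

sealed interface UpdatePlan
// Walk at most maxPages of the feed, stopping early at the checkpoint.
data class SmartCheck(val maxPages: Int) : UpdatePlan
// Fetch one page to rebuild the checkpoint first, then update the whole library.
object RebuildThenFullCheck : UpdatePlan

fun planUpdate(
    checkpoint: Checkpoint?,
    now: Instant,
    maxAge: Duration,        // hypothetical invalidation threshold
    librarySize: Int,
    mangaPerPageBudget: Int, // hypothetical: one feed page per this many library manga
    hardPageCap: Int         // must be >= 1
): UpdatePlan {
    val usable = checkpoint != null &&
        checkpoint.window.isNotEmpty() &&
        !checkpoint.savedAt.isAfter(now) && // guard against clock skew
        Duration.between(checkpoint.savedAt, now) <= maxAge
    if (!usable) return RebuildThenFullCheck
    val pages = (librarySize / mangaPerPageBudget).coerceIn(1, hardPageCap)
    return SmartCheck(pages)
}
```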

nonproto commented 3 years ago

The problem is that anything more than one HTTP call is a bit much just to get a sort order.

Caching the results also makes little sense because new chapters are added every minute to the site.

asdkant commented 3 years ago

The idea here is that if I have, say, 200 manga in the library, that's 200 API requests every time the library is checked.

Let's say we check every hour, and on average that yields fewer than 10 pages of new updates and fewer than 50 updated manga. Then we're talking about 60 requests vs. 200 in a very expensive case, and I think my estimate is quite inflated by most standards.
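The arithmetic in this comment, spelled out as a tiny Kotlin sketch (function names are hypothetical): a full refresh costs one request per library manga, while the smart check costs the feed pages plus one request per library manga that actually changed.

```kotlin
// Full refresh: one request per manga in the library.
fun fullCheckRequests(librarySize: Int): Int = librarySize

// Smart check: feed pages fetched, plus one request per library manga that updated.
fun smartCheckRequests(feedPages: Int, updatedInLibrary: Int): Int =
    feedPages + updatedInLibrary

// Requests avoided over a day of periodic checks (hourly by default).
fun savedPerDay(librarySize: Int, feedPages: Int, updatedInLibrary: Int, checksPerDay: Int = 24): Int =
    checksPerDay * (fullCheckRequests(librarySize) - smartCheckRequests(feedPages, updatedInLibrary))
```

With the figures above, `smartCheckRequests(10, 50)` is 60 against `fullCheckRequests(200)` of 200, and over 24 hourly checks `savedPerDay(200, 10, 50)` comes to 3360 fewer requests.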

The idea is that, in an abstract sense, the request for latest updates returns a sequence of manga IDs. You can save the "newest" section of that sequence (say, 20 IDs) and compare against it the next time, so you know when to stop looking for older updates.

So say you check the latest updates and get: A B C D E F G. An hour later you check again and the latest updates are H I J K L A B C D E. You match the A B C D window and know that's where to stop (in practice you'd use more items to get a better match, but you get the idea). The whole point is to make fewer HTTP requests overall, at the cost of maybe a few extra requests and some sub-sequence matching.
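The window match in that example can be sketched as a plain sub-sequence search in Kotlin. `findCheckpoint`, `newSinceCheckpoint`, and the default `matchLength` of 4 are hypothetical choices that mirror the A B C D example; a real implementation would probably match more items, as the comment notes.

```kotlin
// Find where the first `matchLength` IDs of the saved window reappear, in order,
// in the latest feed. Everything before that index is new since the last check.
fun findCheckpoint(feed: List<String>, savedWindow: List<String>, matchLength: Int = 4): Int? {
    val k = minOf(matchLength, savedWindow.size)
    if (k == 0) return null
    val probe = savedWindow.subList(0, k)
    // The range is empty when the feed is shorter than the probe.
    for (i in 0..feed.size - k) {
        if (feed.subList(i, i + k) == probe) return i
    }
    return null // no match: treat the checkpoint as lost
}

fun newSinceCheckpoint(feed: List<String>, savedWindow: List<String>): List<String>? =
    findCheckpoint(feed, savedWindow)?.let { feed.subList(0, it) }
```

Here `findCheckpoint(latest, old)` returns 5 for the example feeds, so only H I J K L are new. When no match is found, the checkpoint is treated as lost, which is where rebuilding it and doing a full check would apply.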

Of course, in practice this needs testing to figure out roughly how many updates you are likely to get within various time windows, but I think for larger libraries this is more efficient even with longer intervals between checks.

nonproto commented 3 years ago

I see, I misunderstood your original request. I thought you just wanted an update order but would still update all 200.

nonproto commented 3 years ago

I'll look into it more for 3.x, but I have no plans to add something like this to 2.x.

kyjibo commented 3 years ago

I've implemented an approach to this and put it in pull request #444.

The pitch is simple - manga updates are performed on a consistent enough schedule that we can guess when the next update will occur. You get the current update interval for any manga by looking at how frequently the last few updates have been performed. Then take this update interval and add it to the last updated date and you can get a fairly good idea of when the next chapter will drop.

This can be skewed by bonus chapters and manga with multiple scanlators, but it's still fairly robust and you can increase the range over which you're averaging the update interval in order to reduce sensitivity to irregularities. I used an average of the last 4 updates (so 3 update intervals) to make sure it would still be at least mildly responsive to things like when the scanlator catches up to the raws and the rate drops.
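This is not the code from #444; it is a hypothetical Kotlin sketch of the heuristic as described here: take the last 4 upload times, average the 3 gaps between them, and add that average to the latest upload. Treating a manga with fewer than two known uploads as always due is an added assumption, as are the names `estimateNextUpdate` and `isDueForUpdate`.

```kotlin
import java.time.Duration
import java.time.Instant

// Average the gaps between the last `window` uploads and project the next one.
// Returns null when there are too few uploads to form even one gap.
fun estimateNextUpdate(uploads: List<Instant>, window: Int = 4): Instant? {
    val recent = uploads.sorted().takeLast(window)
    if (recent.size < 2) return null
    val gaps = recent.zipWithNext { a, b -> Duration.between(a, b) }
    val total = gaps.fold(Duration.ZERO) { acc, d -> acc.plus(d) }
    val average = total.dividedBy(gaps.size.toLong())
    return recent.last().plus(average)
}

// Only check a manga once its projected next upload time has arrived.
fun isDueForUpdate(uploads: List<Instant>, now: Instant): Boolean {
    val next = estimateNextUpdate(uploads) ?: return true // no history: always check
    return !now.isBefore(next)
}
```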

It's not a perfect solution, but it's a solution that requires no additional API calls and does not require running any new background processes to poll for updates. It's also a solution that's close enough to the real thing I don't think people will be able to tell the difference.

nekomangaorg / Neko

Smarter library update strategy - do it according to MD's last updated manga #430
