hasse69 / rar2fs

FUSE file system for reading RAR archives
https://hasse69.github.io/rar2fs/
GNU General Public License v3.0

Cache gets dropped #170

Open karibertils opened 2 years ago

karibertils commented 2 years ago

Hello

I have a large library, and warmup takes about 4-6 hours. After it finishes, everything is extremely fast. But after a while it seems like everything gets dropped from the cache, and the speeds go back to what they are without warmup.

Can anything be done to avoid the cache being dropped?

hasse69 commented 2 years ago

There is nothing that would suddenly just drop the cache. But updates to existing cached directories will invalidate parts of it, and depending on the level at which content is changed, this invalidation can be more or less intrusive.

karibertils commented 2 years ago

The way I'm confirming whether they are cached is by running find /unrar/section. It takes maybe 5-10 seconds to go through them all. But after the cache drops, the find command pauses on each folder and takes much longer to finish.

Say I have mounted /archives as /unrar

/unrar/section/folder1
/unrar/section/folder2
/unrar/section/folder3
/unrar/section/folder4
...
/unrar/section/folder999

These 999 all contain rar archives and are cached. If folder1 were removed, or folder1000 added, would that normally cause the other folders to drop from the cache? I don't think any other changes are being made.

hasse69 commented 2 years ago

Yes, if you change anything in a folder (adding, removing), all its currently cached information is lost and will have to be refreshed. It is possible that the cache invalidation is too aggressive (a better-safe-than-sorry approach), but due to other more severe issues currently being investigated this needs to be put on a low priority.

karibertils commented 2 years ago

I see, it's a more aggressive approach than I expected.

No problem, this is obviously low priority and can wait. But thanks for clearing up how it currently works.

hasse69 commented 2 years ago

You need to be aware of the problem we are facing here. The invalidation has no clue exactly what has changed; it only knows that "something" has changed. That means it must be very defensive. If we only allowed changes through the mount point it could be made a bit more clever, but since the cache can also be invalidated due to external changes it is a lot more work than it might seem. Also keep in mind that external changes can never be exactly pinpointed: they are trapped by the modification time stamp having changed, not by a specific action on the directory.

(Note that the use of the word "never" above is of course not 100% true. It would be possible to conclude exactly what changed, but that would imply a lot more complexity.)
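To illustrate the mechanism described above, here is a minimal sketch of time-stamp-based invalidation. It is not rar2fs's actual code: the `DirCache` class and its `listing` method are hypothetical names, and a plain directory listing stands in for the archive contents that rar2fs really caches. The point is that a changed modification time only says "something" changed, so the whole entry is rebuilt.

```python
import os


class DirCache:
    """Hypothetical directory cache: one entry per directory path."""

    def __init__(self):
        # path -> (directory mtime when cached, list of names)
        self._entries = {}

    def listing(self, path):
        """Return (names, was_cache_hit) for the directory at `path`."""
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1], True
        # The time stamp differs (or there is no entry yet). We cannot tell
        # what changed, so the whole entry is thrown away and rebuilt.
        names = sorted(os.listdir(path))
        self._entries[path] = (mtime, names)
        return names, False
```

Adding or removing a single file updates the directory's modification time, so the next lookup misses and re-reads everything in that directory, which matches the behaviour reported in this thread.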

hasse69 commented 2 years ago

What could possibly be an option here is to check if warmup is enabled and, if so, restart the background task(s). But I need to look at that some other time. Currently I have no time to spare even on the more severe issues, I am afraid.

karibertils commented 2 years ago

I make modifications all the time, so the warmup would never stop if it ran automatically on modifications. But it would be great if it were possible to restart the warmup background task manually by sending a signal to rar2fs.

In my case it's important that changes outside rar2fs are recognized. But after a rar archive has been cached, it might as well be cached permanently with the filename as the key. In my particular case the filenames are always unique strings, but size/timestamp/etc. could be added to make the key more unique. If an archive with the same filename+fields shows up anywhere, the decompressed content should always be the same.

Sounds simple on paper, but maybe hard/impossible in practice. Just thinking out loud.
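The idea above, keying cached archive contents on filename plus size and time stamp instead of on location, could be sketched roughly as follows. This is only an illustration of the proposal, not something rar2fs does. `ArchiveKey`, `archive_key`, and `cached_contents` are hypothetical names, `list_archive` is a stand-in for whatever actually reads an archive, and the sketch relies on the stated assumption that identical name, size, and time stamp imply identical content.

```python
import os
from typing import NamedTuple


class ArchiveKey(NamedTuple):
    """Hypothetical location-independent identity of an archive file."""
    name: str
    size: int
    mtime_ns: int


def archive_key(path):
    """Build a key from the file name, size, and modification time."""
    st = os.stat(path)
    return ArchiveKey(os.path.basename(path), st.st_size, st.st_mtime_ns)


def cached_contents(cache, path, list_archive):
    """Return the archive's contents, reading the archive only on a miss.

    `cache` is a dict keyed by ArchiveKey; `list_archive` is a hypothetical
    callable that does the expensive work of opening the archive.
    """
    key = archive_key(path)
    if key not in cache:
        cache[key] = list_archive(path)
    return cache[key]
```

Because the key does not include the directory, moving an archive to another folder on the same file system would still hit the cache, while any change to its size or time stamp would miss.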

hasse69 commented 2 years ago

I am not so sure about your statement about warmup never completing. It would obviously not start from the top root folder (unless that is what changed) but from the directory that was invalidated.

The cache is as unique as it can be. The problem with external changes is that what has changed is not known. For that to be possible you would need something like inotify, which is not portable and would also consume an enormous amount of resources, since only individual directories can be monitored and not an entire sub-tree.

To be able to tell what in fact changed, you need to compare the cache against reality, and that is basically what warmup would do as well.

A directory cache entry is not a set of sub-entries; it is an entry with a list of nodes. You cannot just remove or add a node to an entry. If an entry is invalidated, so are all its nodes, and thus the list has to be refreshed. I do not see any other option here than to have a warmup doing that for you. The alternative, making individual nodes more dynamic, is a much more complex undertaking.
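The two points above (finding out what changed requires comparing the cache with reality, and an entry's node list is replaced as a unit) can be sketched like this. The names `diff_listing`, `refresh_entry`, and `scan` are hypothetical, and a plain dict of lists stands in for the real cache structure, which is not shown in this thread.

```python
def diff_listing(cached_nodes, actual_nodes):
    """Return (added, removed) names between a cached and a fresh listing."""
    cached, actual = set(cached_nodes), set(actual_nodes)
    return sorted(actual - cached), sorted(cached - actual)


def refresh_entry(cache, path, scan):
    """Replace the whole node list for `path` and report what changed.

    `scan` is a hypothetical callable that re-reads the directory. Note that
    the full re-read happens before the difference can even be computed.
    """
    old = cache.get(path, [])
    new = list(scan(path))   # the expensive part, same work as a warmup
    cache[path] = new        # the entry is swapped out as one unit
    return diff_listing(old, new)
```

Knowing the difference saves nothing in this sketch: by the time `diff_listing` can run, the fresh listing has already been read in full.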

hasse69 commented 2 years ago

Another thing that I guess is not obvious is that the directory cache never caches actual external content. It only caches what is located inside RAR archives. So consider the simple case of an external directory A having an archive containing directories B and C. How would you deal with the case where that archive is changed/replaced and suddenly has only directory B? The only safe way to deal with this is to invalidate everything sitting under A. There are so many corner cases to consider, and that is what makes all this pretty difficult. The more complexity you add, the more corner cases might be overlooked.
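As a rough sketch of "invalidate everything sitting under A", assuming a cache that is simply a dict keyed by path (an assumption made for illustration, not the real rar2fs structure), the defensive approach amounts to dropping every entry at or below the changed directory. `invalidate_subtree` is a hypothetical name.

```python
import posixpath


def invalidate_subtree(cache, root):
    """Drop the entry for `root` and every entry below it.

    Returns the sorted list of paths that were removed.
    """
    root = posixpath.normpath(root)
    # Compare against "root/" so that /A does not also match /AB.
    prefix = root.rstrip("/") + "/"
    doomed = [p for p in cache if p == root or p.startswith(prefix)]
    for p in doomed:
        del cache[p]
    return sorted(doomed)
```

With entries for /A, /A/B and /A/C, a change to the archive in A removes all three, including the B entry that may still be valid, because nothing tells the cache that only C disappeared.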

hasse69 commented 2 years ago

Since more users seem interested in this topic, would a beta patch introducing an option to auto-trigger the warmup have any considerable value?

milesbenson commented 2 years ago

Yes, I'm willing to test, as my mounts get updated frequently.

karibertils commented 2 years ago

Yeah I would like to test it.

milesbenson commented 1 year ago

It's been a while ;-) How can I refresh the cache manually without having to remount? Just by running ls -R on the mount folder?

hasse69 commented 1 year ago

Sadly yes, but unless you know exactly from where in the path your cache got dropped I would say it is faster to remount and rely on the warmup.
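For anyone who wants to script the manual refresh, a recursive walk from a chosen sub-path does the same directory listing as ls -R. This is a minimal sketch and relies on the assumption, implied in this thread, that listing a directory through the mount is what repopulates its cache entry. `touch_tree` is a hypothetical helper name.

```python
import os


def touch_tree(root):
    """List every directory under `root` once; return how many were visited."""
    visited = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        visited += 1
    return visited
```

Calling it on the deepest directory known to have changed (for example a single section folder under the mount point) keeps the walk short; without that knowledge the walk has to start from the mount root.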

hasse69 commented 1 year ago

I guess if you feel adventurous, a quick hack could be added to the code in which the warmup never ends and simply restarts itself over and over.

milesbenson commented 1 year ago

Well, my cache gets dropped when I add new files/folders, so ls -R would be OK. But if you want me to, I can test the other option as well.

hasse69 commented 1 year ago

The deeper in the path the change is made, the smaller the effect should be. But I think I have mentioned this before somewhere.

milesbenson commented 1 year ago

It's a simple story:

ARCHIVE/B/B.MOVIE/bmoviefiles.rar
ARCHIVE/D/D.MOVIE/dmoviefiles.rar

When I add a bunch of movies into the lettered subfolders, it can happen that the cache gets dropped or needs some love with ls -R.