notthebee / infra

IaC for my Linux/Unix machines

Smarter way for uncaching files. #19

Closed elmuz closed 2 years ago

elmuz commented 2 years ago

It seems to me that the uncaching process is quite inefficient. In particular, if I understand it correctly, it runs a disk usage analysis and sorts the results for every single file that is moved. While this might be fine in some scenarios, it can be extremely inefficient when you need to move a bunch of very small files (e.g. Photoprism previews).

I am proposing an approach where the candidates to be moved are computed (and sorted) only once, keeping track of their creation date and size. The target free space is then estimated by subtraction rather than re-measured. While this might be less accurate, it should be sufficient for most cases.

It could definitely be improved. It's a kind of WIP, but it's been running for a few weeks on my server and it works fine.

lowrents commented 2 years ago

Hi @elmuz

Since I spent the last few days debugging this Ansible playbook to get it running on my home server (it was my very first Ansible project, and it took me way more hours than I expected :D but hey, it finally works - smooth af), I am very interested in your way of uncaching. Could you provide some more information on how you implemented your logic?

cheers!

elmuz commented 2 years ago

Hi @lowrents

The script I proposed is just more verbose; it follows mostly the same logic as the original script. The idea is more or less the following: imagine you have two disjoint "disk areas", one fast (usually small, called the 'cache') and one slow (usually bigger). All your services, however, do NOT interact with these directly; instead, they only deal with a convenient disk area which is actually the union of the fast and slow areas. Technically, these three abstractions are all mergerfs mount points.

[- fast -][--------- slow ---------- ]
[------------- storage --------------]

When your services download/store new files, they all point to storage, which internally prioritizes the fast mount point so that it is, in fact, fast. This area tends to fill up quite soon. Hence, you want something that moves some files to the slow area as a background/maintenance task. This is what this "uncaching" script does: on the one hand, you want your files to stay in the cache so they are quickly accessible; on the other, you want some free space so that your write operations are also fast. The bottom line is that it's best to move the files with the lowest chance of being accessed to the slow area.
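
To make that concrete, an fstab entry for such a pool could look roughly like this (the paths and options here are made up for illustration, not taken from the playbook; category.create=ff is the mergerfs create policy that places new files on the first branch listed, i.e. the fast one):

# hypothetical /etc/fstab entry: fast branch listed first, so new writes land there
/mnt/fast:/mnt/slow  /mnt/storage  fuse.mergerfs  allow_other,category.create=ff,moveonenospc=true  0 0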

The pseudocode of the uncaching loop is:

while cache-free-space-percentage is lower than target:
    select the least accessed file
    move that file from fast to slow
    recompute cache-free-space

# execute this every N hours

The inefficiency that I spotted in the original script was that sorting all the files in the cache can be quite expensive, and it was recomputed for every file moved. In some situations that is extremely inefficient (e.g. when the least accessed files are actually thousands of files of a few kB each). My approach computes the sorted list of cache files (along with their sizes) once and consumes that list one file after the other until the free-space target is reached. Nothing special, but it has worked so far.
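
Roughly, this is what the approach looks like as a script (a simplified sketch I'm writing for this comment, not the actual code; the paths, the 30% target and the GNU find/df flags are placeholders):

#!/bin/bash
# Sketch of the single-pass uncaching idea (simplified; paths are made up)
CACHE=/mnt/cache
SLOW=/mnt/slow
TARGET_PCT=30   # desired free space on the cache, in percent

total_kib=$(df -k --output=size "$CACHE" | tail -n 1)
free_kib=$(df -k --output=avail "$CACHE" | tail -n 1)
target_kib=$(( total_kib * TARGET_PCT / 100 ))

# Build the candidate list ONCE: access time, size and path of every file,
# least recently accessed first (GNU find)
find "$CACHE" -type f -printf '%A@ %s %p\n' | sort -n |
while read -r atime size path; do
    (( free_kib >= target_kib )) && break
    rel=${path#"$CACHE/"}
    mkdir -p "$SLOW/$(dirname "$rel")"   # preserve the relative layout
    mv "$path" "$SLOW/$rel"
    # estimate the new free space by subtraction instead of re-running df
    free_kib=$(( free_kib + size / 1024 ))
done

This way df runs only twice per invocation and the expensive find/sort happens exactly once; the loop just consumes the list.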

lowrents commented 2 years ago

Hi @elmuz,

thank you very much for the detailed explanation. I really appreciate that you took the time for this response. I will go through your script and try to understand a bit more of the code base. But before that, I really need to get Nextcloud working in the current setup, which is driving me completely crazy :D Since I didn't expect any support from notthebee, and as we are already "talking" at the moment, perhaps you have an idea, in case you run Nextcloud yourself. It concerns the permission to write on external drives. In theory, {{ mergers_root }}/<my-folder> gets mounted in the container at /data/{{ username }}. I can make this folder available as local external storage in Nextcloud for my own user and read the data within it using the web interface. But only read, not write. I tried nearly every possible combination, but the only way I got it working was to chmod 777 the whole directory.

Do you run a Nextcloud server yourself and use external storage instead of the native var/www/data/<my-user>/files?

I also saw that it is possible to define a PUID and PGID. What's a bit confusing is that the Ansible playbook defines these two values in the env, while the Ansible documentation says there is a user parameter. And I cannot find PUID or PGID as environment variables in the Alpine image description.

Thanks for any help!

lowrents commented 2 years ago

finally I've found what causes the bug: in fstab, the mergers_root gets defined with the mount options uid and gid, both set to 1000, which means an ls -la /mnt/storage showed something like

drwxrwx---  2 1000 1000        4096 Jun 23 11:12 my-folder-1
-rwxr-xr-x  1 1000 1000       18686 May 29 18:15 my-folder-2

Also, from inside the Docker container's CLI, an ls -la on the external drive mounted at /data/{{ username }} shows 1000:1000. The storage_cache, however, was mounted without that option, so I was able to test whether mergerfs/FUSE in general causes issues with Nextcloud without re-mounting the drives... and voilà, there we have a running Nextcloud.
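
To be explicit, the entry I mean looked roughly like this (reconstructed for illustration, not copied verbatim from the playbook):

# hypothetical reconstruction of the offending fstab entry
/mnt/fast:/mnt/slow  /mnt/storage  fuse.mergerfs  allow_other,uid=1000,gid=1000  0 0

With FUSE, the uid=/gid= options override the ownership of every file in the mount, which is exactly the 1000 1000 shown above.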

Doesn't this line cause any issues in your setup? Especially for @notthebee, who definitely runs Nextcloud, since it is his playbook.

I also think that PUID and PGID don't affect Nextcloud, which is different behaviour compared to Photoprism, where I was able to fix the permission issues this way.
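
(As far as I understand, PUID and PGID are a convention of the linuxserver.io images, whose init scripts re-map the internal user at startup; images that don't implement it simply ignore the variables. A hypothetical example of where they do take effect:

docker run -d -e PUID=1000 -e PGID=1000 lscr.io/linuxserver/nextcloud

So if the playbook uses the official Nextcloud image rather than a linuxserver.io one, setting them would be a no-op.)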