kahing / catfs

Cache AnyThing filesystem written in Rust
Apache License 2.0

How to limit number of threads so that catfs can be used in multi-tenant environments? #63

Open HenrikBengtsson opened 2 years ago

HenrikBengtsson commented 2 years ago

From experimenting with the devel version of catfs (commit cbd7ab7), I noticed that it uses multi-threading. Looking at htop, it appears that catfs is using all(?) CPU cores by default. Is that correct?

Also, is there a way, e.g. a command-line option, to limit the number of cores that a specific catfs instance will use?

(I know zero Rust, otherwise I'd try to figure this out myself from the source code.)

HenrikBengtsson commented 2 years ago

From some more ps inspections, it looks like the number of threads is fixed at some large number (107?). Using two catfs mounts as an example, I see 107 threads for each catfs process:

$ for pid in $(pgrep catfs); do echo "$pid:"; pstree "$pid"; done
25800:
catfs───107*[{catfs}]
25962:
catfs───107*[{catfs}]

I see the same numbers on both an 8-core machine, and a 32-core machine.
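For anyone who wants to reproduce the count without pstree, here is a minimal sketch that counts a process's threads by listing /proc/<pid>/task. It assumes a Linux host with /proc mounted, and the helper name thread_count is made up for illustration. The result includes the main thread, so it should be one higher than the number pstree prints in braces.

```rust
use std::fs;
use std::io;

/// Count the threads of a process by listing /proc/<pid>/task (Linux only).
/// `pid` may be a numeric PID such as "25800" or the literal "self".
/// Hypothetical helper for illustration; not part of catfs.
fn thread_count(pid: &str) -> io::Result<usize> {
    let entries = fs::read_dir(format!("/proc/{}/task", pid))?;
    // Each subdirectory is one thread ID; entries that fail to read are skipped.
    Ok(entries.filter_map(Result::ok).count())
}
```

Calling thread_count("self") returns the thread count of the calling process; passing one of the PIDs printed by pgrep catfs inspects that catfs instance instead.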

gaul commented 2 years ago

Can you limit CPU usage via cgroups?

https://stackoverflow.com/questions/28814002/using-cgroups-to-limit-cpu-usage

HenrikBengtsson commented 2 years ago

Thanks for the quick reply.

Unfortunately, cgroups are out of reach. They require sysadmins to step in and configure the hosts, which might not be possible, or even an option, for all catfs users. The beauty of catfs itself is that it requires no privileges.

On some systems, a non-privileged user might be able to cap the CPU usage at, say, 150% (one and a half cores), using:

systemd-run --user --scope -p CPUQuota=150% catfs ...

However, that is not supported on all systems by default.

Either way, I think using cgroups to limit the behavior of catfs would be a bit like running a Formula 1 car at full throttle all the time and controlling its speed by braking it from behind with a heavy tractor. I'd imagine running 100+ threads with limited CPU affinity would be inefficient for catfs itself.

More importantly, software that assumes, and behaves as if, it has exclusive access to all CPU resources on a machine is often frowned upon by sysadmins, especially on multi-tenant systems like HPC environments, because such software tends to clog up machines. It's not unheard of for end-users to get caught in the crossfire without even knowing they're using the system in a bad way.

Looking at the code, I found two instances of ThreadPool::new(<nthreads>), and both specify a hard-coded number of threads:

https://github.com/kahing/catfs/blob/cbd7ab72ea1ec7a5b0bd87a009900b1a250c6117/src/catfs/mod.rs#L126

https://github.com/kahing/catfs/blob/cbd7ab72ea1ec7a5b0bd87a009900b1a250c6117/src/pcatfs/mod.rs#L33

I don't understand what those two thread pools are doing, but if I replace (5,100) with, say, (3,10) and recompile, I get:

$ for pid in $(pgrep catfs); do echo "$pid:"; pstree "$pid"; done
20355:
catfs───15*[{catfs}]
32486:
catfs───15*[{catfs}]

so I'm pretty certain those lines of code control the number of threads used by catfs.

My wish/feature request would be to make it possible, at a minimum, to control those values via a command-line option, e.g. --threads=3,10.
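To make the request concrete, here is a rough sketch of how the value of such an option could be parsed and validated using only the Rust standard library. Everything in it is hypothetical: catfs has no --threads option today, the names ThreadLimits and parse_threads are invented, and it assumes the two numbers are a minimum and a maximum pool size, which I have not confirmed from the source.

```rust
/// Hypothetical pool-size setting; not part of catfs.
/// Assumes the two numbers mean (minimum, maximum) threads.
#[derive(Debug, PartialEq)]
struct ThreadLimits {
    min: usize,
    max: usize,
}

/// Parse the value of a hypothetical `--threads=MIN,MAX` option, e.g. "3,10".
fn parse_threads(value: &str) -> Result<ThreadLimits, String> {
    // Split on the first comma; a missing comma is an error.
    let (min_s, max_s) = value
        .split_once(',')
        .ok_or_else(|| format!("expected MIN,MAX but got '{}'", value))?;

    let min: usize = min_s
        .trim()
        .parse()
        .map_err(|e| format!("invalid MIN '{}': {}", min_s, e))?;
    let max: usize = max_s
        .trim()
        .parse()
        .map_err(|e| format!("invalid MAX '{}': {}", max_s, e))?;

    // Assumption: a pool needs at least one thread and MAX must not be below MIN.
    if min == 0 || max < min {
        return Err(format!("need 1 <= MIN <= MAX, got {},{}", min, max));
    }
    Ok(ThreadLimits { min, max })
}
```

With this, parse_threads("3,10") yields ThreadLimits { min: 3, max: 10 }, while inputs such as "10,3", "3", or "a,b" are rejected with an error message. The parsed values could then replace the hard-coded arguments at the two ThreadPool::new call sites, with (5,100) kept as the default.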