stanford-rc / fuse-migratefs

Filesystem overlay for transparent, distributed migration of active data across separate storage systems.
GNU General Public License v3.0

Enable multi-threading #6

Closed: kcgthb closed this issue 5 years ago

kcgthb commented 5 years ago

Right now, there's a single pipeline of operations / requests for migratefs on any given host. So if one of those operations blocks and takes some time (like copying up a large file), all subsequent operations are stuck behind it. On hosts with a lot of user concurrency, this means a single copy-up can make the whole filesystem unresponsive for everybody else.

One way to mitigate this would be to make copy-up operations faster (with sendfile(), for instance, as described in #4), but a more definitive fix would be to make migratefs multi-threaded, so that it can keep processing incoming interactive requests while expensive operations are running.
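A minimal sketch of the sendfile() idea for the data part of a copy-up, assuming Linux (where sendfile(2) accepts a regular file as its destination since 2.6.33). The helper name `copyup_data_sendfile()` is hypothetical, not migratefs code; it loops because a single sendfile() call may transfer fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the whole content of src_fd into dst_fd with
 * sendfile(2), so the data moves inside the kernel instead of through a
 * user-space buffer. Returns 0 on success, -errno on failure. */
static int copyup_data_sendfile(int src_fd, int dst_fd)
{
    struct stat st;
    if (fstat(src_fd, &st) == -1)
        return -errno;

    off_t offset = 0; /* read position in src_fd; sendfile() advances it */
    while (offset < st.st_size) {
        ssize_t n = sendfile(dst_fd, src_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n == -1) {
            if (errno == EINTR)
                continue;      /* interrupted: retry from the same offset */
            return -errno;
        }
        if (n == 0)
            break;             /* source got shorter while we were copying */
    }
    return 0;
}
```

It would be called with one descriptor open on the file in the old storage system and one on the freshly created file in the new one; creating that file and copying ownership and timestamps are left out of the sketch.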

This requires careful locking of critical shared structures, but libfuse already has some support for multi-threading (e.g. its multi-threaded event loop, fuse_loop_mt()).
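As an illustration of the kind of locking involved, here is a sketch of one hypothetical shared structure: a table of paths currently being copied up, so that two threads never copy up the same file at once while copy-ups of different files still run in parallel. The names `copyup_begin()`, `copyup_end()` and the fixed-size table are assumptions made for this example, assuming POSIX threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define MAX_INFLIGHT 64
#define PATH_MAX_LEN 256

/* Hypothetical shared table of paths being copied up. Every worker thread
 * reads and writes it, so all accesses hold inflight_lock. */
static char inflight[MAX_INFLIGHT][PATH_MAX_LEN];
static int inflight_count;
static pthread_mutex_t inflight_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t inflight_done = PTHREAD_COND_INITIALIZER;

/* Index of path in the table, or -1. Caller must hold inflight_lock. */
static int find_locked(const char *path)
{
    for (int i = 0; i < inflight_count; i++)
        if (strcmp(inflight[i], path) == 0)
            return i;
    return -1;
}

/* Wait until no other thread is copying up `path`, then claim it.
 * Returns false (without claiming) if the table is full or the path is too long. */
static bool copyup_begin(const char *path)
{
    pthread_mutex_lock(&inflight_lock);
    while (find_locked(path) >= 0)
        pthread_cond_wait(&inflight_done, &inflight_lock);
    bool ok = inflight_count < MAX_INFLIGHT && strlen(path) < PATH_MAX_LEN;
    if (ok)
        strcpy(inflight[inflight_count++], path);
    pthread_mutex_unlock(&inflight_lock);
    return ok;
}

/* Release `path` and wake up threads waiting in copyup_begin(). */
static void copyup_end(const char *path)
{
    pthread_mutex_lock(&inflight_lock);
    int i = find_locked(path);
    if (i >= 0) {
        inflight_count--;
        if (i != inflight_count) /* move the last entry into the freed slot */
            memcpy(inflight[i], inflight[inflight_count], PATH_MAX_LEN);
    }
    pthread_cond_broadcast(&inflight_done);
    pthread_mutex_unlock(&inflight_lock);
}
```

The lock is only held while the table is inspected or updated, never during the copy itself, so a slow copy-up does not stall threads serving other files.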

If all goes well, one could even envision specialized threads dedicated to specific tasks: service threads would listen for incoming requests and, in order to always stay available, immediately dispatch those requests to other threads that would either run a specific kind of operation (copyup) or run tasks for a specific user.
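The dispatching idea can be sketched as a classic producer/consumer queue, assuming POSIX threads: the service thread only enqueues a request and returns immediately, and a pool of worker threads runs the (possibly slow) operations outside the lock. All names here are hypothetical; a split per operation type or per user would use one such queue per class of work.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical unit of work, e.g. a copy-up request. */
struct task {
    void (*run)(void *arg);
    void *arg;
    struct task *next;
};

/* FIFO shared between the service thread and the worker pool. */
struct task_queue {
    struct task *head, *tail;
    bool shutting_down;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
};

static void queue_init(struct task_queue *q)
{
    q->head = q->tail = NULL;
    q->shutting_down = false;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Service thread side: enqueue and return right away. */
static int queue_push(struct task_queue *q, void (*run)(void *), void *arg)
{
    struct task *t = malloc(sizeof *t);
    if (!t)
        return -1;
    t->run = run;
    t->arg = arg;
    t->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Worker side: sleep until a task arrives, then run it without the lock. */
static void *worker_main(void *arg)
{
    struct task_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (!q->head && !q->shutting_down)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (!q->head) { /* shutting down and nothing left to do */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        struct task *t = q->head;
        q->head = t->next;
        if (!q->head)
            q->tail = NULL;
        pthread_mutex_unlock(&q->lock);
        t->run(t->arg); /* a slow copy-up only blocks this worker */
        free(t);
    }
}

/* Ask the workers to exit once the queue has been drained. */
static void queue_shutdown(struct task_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->shutting_down = true;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}
```

Workers are started with `pthread_create(&tid, NULL, worker_main, &queue)`; the service thread then calls `queue_push()` for each incoming request.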

thiell commented 5 years ago

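As of version 0.16, the caller's umask is obtained via fuse_ctx, and we then set the process (migratefs) umask in FUSE_ENTER(). This is likely to break in MT mode, because the umask is a per-process attribute shared by all threads: concurrent requests from callers with different umasks would overwrite each other's value.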
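To see why installing the caller's umask per request breaks with threads, this sketch (assuming POSIX threads on Linux) has one thread set a caller's umask, the way FUSE_ENTER() is described as doing, and then reads the umask back from a different thread. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for the single-threaded FUSE_ENTER() behaviour:
 * install the requesting user's umask as the daemon's umask. */
static void *enter_as_caller(void *arg)
{
    mode_t caller_umask = *(mode_t *)arg;
    umask(caller_umask); /* affects every thread of the process */
    return NULL;
}

/* Let another thread run enter_as_caller(caller_umask), then return the
 * umask as seen from the calling thread ((mode_t)-1 if the thread fails). */
static mode_t umask_seen_after_other_thread(mode_t caller_umask)
{
    pthread_t t;
    if (pthread_create(&t, NULL, enter_as_caller, &caller_umask) != 0)
        return (mode_t)-1;
    pthread_join(t, NULL);
    mode_t current = umask(0); /* umask() can only be read by setting it... */
    umask(current);            /* ...so put the value back immediately */
    return current;
}
```

Single-threaded, this is harmless because requests are serialized. With concurrent workers, a second request can replace the umask between the first request's FUSE_ENTER() and its create call.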

But we have the option to run with FUSE_CAP_DONT_MASK (which we currently do), and the umask can then be applied to the mode passed to openat/mknodat/mkdirat, so hopefully that will still work.

thiell commented 5 years ago

umask() is not used anymore in MT mode; we apply the mask at open/create instead. The only drawback is that the umask now takes precedence over default POSIX ACLs. This is usually the case with FUSE implementations; even fstest mentions it. As there is no thread-safe umask() call, I don't know how to do it otherwise.
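A minimal sketch of applying the mask at create time, under two assumptions: the caller's umask comes from libfuse's `fuse_get_context()->umask`, and the daemon called `umask(0)` once at startup, before spawning any thread, so that the kernel adds no mask of its own to the daemon's openat()/mkdirat() calls. `create_masked()` and `mkdir_masked()` are hypothetical names, not migratefs functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `name` under dirfd with the caller's umask removed from mode.
 * Assumes the process umask is 0, so this is the only masking applied. */
static int create_masked(int dirfd, const char *name, mode_t mode,
                         mode_t caller_umask)
{
    return openat(dirfd, name, O_CREAT | O_EXCL | O_WRONLY | O_CLOEXEC,
                  mode & ~caller_umask);
}

/* Same idea for directories. */
static int mkdir_masked(int dirfd, const char *name, mode_t mode,
                        mode_t caller_umask)
{
    return mkdirat(dirfd, name, mode & ~caller_umask);
}
```

Because the bits are already cleared from mode before the call, a default POSIX ACL on the parent directory cannot grant them back, which is the drawback described above.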

thiell commented 5 years ago

The master branch now supports multi-threading. The old single-threaded version is available under legacy-singlethreaded but won't be updated anymore. It's way too slow and not usable in production anyway.