gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0

Future ideas for filesystem and multi-process synchronization #584

Open pwmarcz opened 2 years ago

pwmarcz commented 2 years ago

This is a collection of notes about what I've learned working on Gramine's FS code: I'm leaving active Gramine development, so hopefully this will be useful for others.

Goals

Some plausible scenarios that I'm assuming we might have for synchronization:

~Sync engine~

Before (see https://github.com/gramineproject/graphene/issues/2158), I proposed and started to implement a "sync engine", a module based on the idea of synchronizing arbitrary data between processes. My thinking was that we could optimize the uncontested case (i.e. a single process doing most of the work) by keeping track of which process has the latest version at the moment.

I no longer think this is a good idea: the implementation ended up extremely over-engineered, with complicated flow of messages being passed around, even before I got to more advanced features like exchanging non-trivial data, or more complicated wait conditions, or interruptible waits.

I believe that good solutions for Gramine will be:

The idea that I think is worth keeping is relying on the process leader as the "server" that keeps all data.

Remember less data

We used to have a problem: when a (host) file was added or removed by another process, Gramine did not notice. That was because we kept the files in the dentry cache and trusted that cached data.

The (easy!) solution turned out to be: do not rely on the cache so much, but refresh the data every time. For instance, each listdir operation actually calls the host to list the directory again. If a new file appeared, we fill in a dentry; if a file disappeared, we clear its dentry.

This might be applicable in other situations as well: when in doubt, load the data from the host.
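A minimal sketch of the "re-list from host on every listdir" idea (the names `dcache_*` and the fixed-size table are illustrative, not Gramine's actual data structures): the dentry cache is treated as only a cache, and each listing reconciles it against a fresh host directory listing instead of trusting stale entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Toy dentry cache: a "negative" entry records that a file is known
 * to be absent on the host. */
struct dcache {
    char   names[MAX_ENTRIES][32];
    bool   negative[MAX_ENTRIES];
    size_t count;
};

static int dcache_find(struct dcache* c, const char* name) {
    for (size_t i = 0; i < c->count; i++)
        if (strcmp(c->names[i], name) == 0)
            return (int)i;
    return -1;
}

static void dcache_put(struct dcache* c, const char* name, bool negative) {
    int i = dcache_find(c, name);
    if (i < 0) {
        i = (int)c->count++;
        strncpy(c->names[i], name, sizeof c->names[i] - 1);
        c->names[i][sizeof c->names[i] - 1] = '\0';
    }
    c->negative[i] = negative;
}

/* Called on every listdir: `host` is the directory listing freshly
 * obtained from the host (in Gramine this would be a PAL call). */
static void dcache_reconcile(struct dcache* c, const char* const* host,
                             size_t n) {
    /* Anything the host no longer reports becomes a negative dentry. */
    for (size_t i = 0; i < c->count; i++) {
        bool found = false;
        for (size_t j = 0; j < n; j++)
            if (strcmp(c->names[i], host[j]) == 0)
                found = true;
        c->negative[i] = !found;
    }
    /* Anything new on the host gets a (positive) dentry. */
    for (size_t j = 0; j < n; j++)
        dcache_put(c, host[j], /*negative=*/false);
}
```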

Use Linux sources for inspiration

Actually, the easy solution described above was made possible by introducing inodes (#279). Before, we couldn't just clear a dentry so easily, because it represented a possibly open file.
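A highly simplified sketch of why the dentry/inode split makes this possible (illustrative names only, not the definitions from #279): once an open handle pins a separately refcounted inode instead of the dentry, the dentry can be cleared when the file vanishes on the host while the handle keeps the inode alive.

```c
#include <stddef.h>
#include <stdlib.h>

struct inode { int refcount; };

struct dentry { struct inode* inode; };   /* NULL = negative dentry */
struct handle { struct inode* inode; };   /* pins the inode, not the dentry */

static struct inode* inode_get(struct inode* i) { i->refcount++; return i; }
static void inode_put(struct inode* i) { if (--i->refcount == 0) free(i); }

/* Opening a file takes the handle's own reference on the inode... */
static void handle_open(struct handle* h, struct dentry* d) {
    h->inode = inode_get(d->inode);
}

/* ...so the dentry can be cleared (e.g. the file disappeared on the
 * host) without invalidating open handles. Before the split, the
 * dentry itself represented the possibly-open file, so clearing it
 * was not safe. */
static void dentry_clear(struct dentry* d) {
    inode_put(d->inode);
    d->inode = NULL;
}
```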

More generally, I learned a lot by studying the actual filesystem code in the Linux sources: how dentries and inodes work, what kinds of mutexes it uses and in what order, how fcntl locks are implemented, and what callbacks it uses for the filesystem (e.g. position-independent read).

(I also looked at older, simpler versions of Linux, and at FreeBSD.)

I'm not saying we should blindly follow Linux: Gramine solves a different problem and can implement many things in a simpler way. But Linux is a good starting point: things are done there a certain way for good reasons.

Support append mode on host?

Is writing to a (non-encrypted) host file a common use case? For instance, multiple processes logging to a file, probably opened with O_APPEND.

If so, then I think the best course of action is to implement real append mode in PAL, i.e. to allow opening files in append mode. We haven't done that so far, I think, because stateless operations (writes at an explicit offset) are more "pure" and deterministic. However, this is a good place to compromise on that principle: append mode is a much simpler and better solution than any kind of synchronization between processes.
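This is the semantics that real append mode would buy us, demonstrated with plain POSIX calls (the file path in the usage is arbitrary): with `O_APPEND` the kernel atomically positions each write at EOF, so interleaved writers, such as two Gramine processes logging to one file, never overwrite each other and need no explicit synchronization.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Two descriptors simulate two independent writers. With O_APPEND,
 * each write lands at the current EOF regardless of any file offset,
 * so the writes concatenate instead of clobbering each other. */
static long append_demo(const char* path) {
    int fd1 = open(path, O_WRONLY | O_CREAT | O_APPEND | O_TRUNC, 0600);
    int fd2 = open(path, O_WRONLY | O_APPEND);
    if (fd1 < 0 || fd2 < 0)
        return -1;

    if (write(fd1, "aaaa", 4) != 4) return -1;
    if (write(fd2, "bbbb", 4) != 4) return -1;   /* goes to EOF, not offset 0 */
    if (write(fd1, "cccc", 4) != 4) return -1;

    close(fd1);
    close(fd2);

    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_size;   /* all 12 bytes survive */
}
```

With stateless write-at-offset emulation, the second writer would have overwritten the first writer's bytes unless the processes synchronized their idea of the file size.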

Serve files from process leader?

For shared encrypted files, or shared tmpfs files, I think it's worth investigating a client-server model: the "server", i.e. the process leader, would make these files available to other processes over IPC.

I admit I haven't thought this through in detail; it's possible that this is also too complicated to consider. I would probably start by examining prior art: NFS, FUSE, and the 9P protocol, which promises to be simple.
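To make the client-server idea concrete, here is a hypothetical, 9P-flavoured message format and leader-side handler (all names and fields are invented for illustration): children never touch the bytes directly, they only send read/write requests, so the leader holds the single authoritative copy of each shared file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum msg_type { MSG_READ, MSG_WRITE };

/* One IPC request; `fid` would be a leader-assigned file ID that is
 * stable across processes (unlike host fds or dentry pointers). */
struct file_req {
    enum msg_type type;
    uint32_t fid;
    uint64_t offset;
    uint32_t count;
    char     data[64];   /* payload for MSG_WRITE */
};

/* Leader-side handler: `file` is the leader's in-memory copy (for
 * tmpfs) or decrypted view (for encrypted files). Returns bytes
 * processed, or -1 on an out-of-range request. */
static int leader_handle(char* file, size_t filesz,
                         struct file_req* req, char* out) {
    if (req->offset + req->count > filesz)
        return -1;
    if (req->type == MSG_READ)
        memcpy(out, file + req->offset, req->count);
    else
        memcpy(file + req->offset, req->data, req->count);
    return (int)req->count;
}
```

A real protocol would also need open/close (fid allocation), size changes, and error reporting; 9P's message set is a reasonable checklist of what such a protocol must cover.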

fcntl locks

I implemented fcntl locks (https://github.com/gramineproject/graphene/pull/2481) in this client-server model: the process leader keeps information about the locks, and other processes use IPC for locking and unlocking. I think this might be a good starting point for further work on synchronization, but some problems came up.
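A toy version of the leader-side model (not the code from that PR; real fcntl locks additionally need read vs. write lock types, lock splitting and merging, `F_UNLCK`, and queues of blocked waiters): the leader owns the lock table, and each child's lock request arrives as an IPC message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LOCKS 8

struct lock {
    bool     used;
    uint64_t file_id;      /* a cross-process stable file identifier */
    uint64_t start, end;   /* inclusive byte range */
    uint32_t pid;          /* owning process (the IPC client) */
};

/* The table lives only in the process leader; children never see it. */
static struct lock g_locks[MAX_LOCKS];

static bool overlaps(const struct lock* l, uint64_t file_id,
                     uint64_t start, uint64_t end) {
    return l->used && l->file_id == file_id
        && l->start <= end && start <= l->end;
}

/* IPC request handler: try to take [start, end] on file_id for pid.
 * Returns false if another process holds a conflicting lock (the
 * real implementation would block or return EAGAIN per F_SETLKW /
 * F_SETLK semantics). */
static bool leader_set_lock(uint64_t file_id, uint64_t start, uint64_t end,
                            uint32_t pid) {
    for (size_t i = 0; i < MAX_LOCKS; i++)
        if (overlaps(&g_locks[i], file_id, start, end) && g_locks[i].pid != pid)
            return false;
    for (size_t i = 0; i < MAX_LOCKS; i++)
        if (!g_locks[i].used) {
            g_locks[i] = (struct lock){ true, file_id, start, end, pid };
            return true;
        }
    return false;   /* table full */
}
```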

dimakuv commented 6 months ago

Update from May 2024

I was looking at the possibility of removing the libos_handle::dentry field. Unfortunately, this is still far from possible.

There are two Gramine problems that make removing dentry from the handle object (and using inodes instead) complex:

  1. Legacy: Gramine/Graphene was initially designed with dentry and inode objects being fused into one. This is being solved piece by piece, by moving dentry fields into inode fields, and side-stepping handle->dentry in favor of handle->inode.
  2. Design: Gramine is decentralized and mostly doesn't use/rely on host information. This leads to synchronization problems, e.g. when P1 updates the size/position in a file and P2 doesn't see these updates. It also leads to not-really-correct implementations of IPC mechanisms like POSIX locks: locks must be associated with an inode, but since there is no universal inode ID shared by P1 and P2, we had to fall back to dentries (more specifically, to the absolute paths stored in dentries).
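The path-based fallback can be pictured as deriving the lock key by hashing the dentry's absolute path (FNV-1a here purely for illustration; this is not Gramine's actual key derivation). The sketch also shows why this is only an approximation of inode identity.

```c
#include <stdint.h>

/* Derive a cross-process lock key from an absolute path. Any two
 * processes computing the key for the same path agree on it without
 * IPC, which is why the fallback works at all.
 *
 * The flaw: the key tracks the *name*, not the inode. A rename while
 * a lock is held, or a second hard link to the same file, yields a
 * different key for the same underlying inode, so lock identity can
 * silently diverge from what POSIX requires. */
static uint64_t lock_key_from_path(const char* abs_path) {
    uint64_t h = 1469598103934665603ULL;         /* FNV offset basis */
    for (const char* p = abs_path; *p; p++) {
        h ^= (unsigned char)*p;
        h *= 1099511628211ULL;                   /* FNV prime */
    }
    return h;
}
```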

The design problem is hard to fix, as Pawel explained in this issue. Fixing it would also carry a high performance overhead if child processes must constantly check for updates from the main process (or, vice versa, if the main process broadcasts updates to the children).

On the good side, I think the only problematic places in Gramine currently are: