sahlberg / fuse-nfs

A FUSE module for NFSv3/4
GNU General Public License v3.0
168 stars 41 forks

lock() & flock() methods not implemented #20

Open · julian-hj opened this issue 7 years ago

julian-hj commented 7 years ago

One of our users pointed out to us that flock applied on fuse-nfs mounts doesn't enforce locking between different clients. After confirming that 1) it's true, and 2) kernel NFS mounts do enforce flock locks between clients, we took a look at the fuse-nfs implementation and noticed that fuse-nfs just treats lock() and flock() as no-ops.

So, we are wondering if you left the locking implementation out because you didn't get to it, or if you left it out because NFS locking is flaky and inconsistent between servers and could cause stuff to hang or fail, or is otherwise a bad idea.

If it is the former, are you open to a PR to include a locking implementation? And if so do you have an opinion about where the synchronous implementation of rpc_nlm4_lock_async() should live? (presumably either in libnfs or in fuse-nfs)

sahlberg commented 7 years ago

A little bit of both. NFS and file locking have traditionally been "fragile", so many/most applications used to recommend not using locking at all over NFS.

That said, I am not against adding basic locking to the fuse module if you want to add that. I am happy to review a patch for this.

Unfortunately, due to the way NFS locking works (NLM and NSM), I think there would have to be a few compromises, as some things may be difficult to implement in libnfs and/or fuse.

Some of the compromises I think you might need to make:

1. From fuse, only use the non-blocking version of the NLM lock request, and have a sleep loop in fuse to retry. I recall (from memory) that blocking NLM locks require that you set up a local NLM server on the client side to listen for when the server does the async grant of the lock by sending an NLM_GRANTED request back to the client. That would involve setting up an RPC server for NLM inside fuse and registering it with the client's portmapper, and it would not work as it would interfere with the native lock manager that Linux itself is running/depending on :-(

2. In NFS locking, lock recovery is driven by the NSM service. Similar to the above, when the server reboots and thus requires the clients to recover their locks, the server does an RPC back to the client's NSM (rpc.statd) NOTIFY procedure to tell the client about the need for lock recovery. Again, to do this correctly in fuse would require that you run your own NSM service and register it with the portmapper. Again, it would interfere with the system's own NSM service and thus is not viable.

The main consequence of 2 is that I think you will not be able to do lock recovery across server restarts. :-( As server restarts are reasonably rare, and applications using locking even more so, maybe you can live with the fact that locking will break across every server restart.

3. Lock recovery also depends on the client sending an NSM NOTIFY RPC back to the server after a client restart/reboot. That will likely also be difficult to do from fuse: who will be sending these requests if the user just shuts down fuse and never restarts it? Again, not the end of the world, as this will just result in permanently orphaned locks on the server; something that is quite common and happens normally in most NFS systems anyway.

You can probably live with these compromises. Just be aware that it will not be "perfect" but then again nothing really is when it comes to nfs and locking :-(

I am open to a PR. Maybe it should be a nice wrapper in libnfs.c, nfs_flock_async(), and a sync variant in sync.c?

julian-hj commented 7 years ago

That all makes sense. Thanks very much for the quick and thoughtful feedback. We will have a think about how we want to proceed.

sahlberg commented 7 years ago

Thinking about it even more, I think you can solve item 3 in my list above without too much effort.

To track peer state, the NSM protocol (rpc.statd) is used to send notifications indicating when the service is stopped or started. This is all signalled by a monotonically increasing integer describing the state.

An odd state number means "client is up" and an even state number means "client is down". In both cases the peer should discard any and all locks that were registered by this client.

If you go this path, I suggest you add local state on the client to track the state number. Store it persistently in the filesystem or any other persistent store. Every time you start the fuse module, you bump it to the next higher odd number and send an NSM notify. Every time the fuse module is unloaded cleanly, you do the same but send an even number for the state.

That should make it reasonably robust against leaking orphaned locks on the server.

=== NLM / NSM docs. Below is pretty much all the public documentation that exists for these protocols. It explains why locking is so fragile in NFSv3.

Both NLM and NSM are very poorly documented. Pretty much the only thing that exists is an interface specification; no real protocol specification exists :-( (This makes it so that folks who really do understand how these protocols are supposed to work will never be unemployed :-) )

See the comments in /usr/include/rpcsvc/sm_inter.x

And also the PDU interface description: http://archive.opengroup.org/publications/archive/CDROM/c702.pdf

julian-hj commented 7 years ago

Interesting... so, how does the server know what "this client" means? In our use case, we may have numerous fuse-nfs processes starting and stopping on the same host, potentially mounting the same NFS share with different options.

Also, I noticed in the packet trace that libnfs just sends "libnfs" as the client machine name in the credentials block. Could that result in the nfs server getting confused when different fuse processes go up and down?

Thanks!

sahlberg commented 7 years ago

The client name that NSM uses should be the string that you provide in the NLM lock request as NLM4_LOCKargs.lock.caller_name.

The credentials block is part of the ONC-RPC header and should not affect (or even be visible to) the NLM layer, but it may be a good idea to change this from "libnfs" to something more descriptive, if for no other reason than making it easier to analyze and filter network traces.

As few things really look at or depend on the cred->host field, having "libnfs" as the default might be ok-ish. Possibly this should be changed to something more descriptive such as libnfs- or something. In your case you probably want something more specific to your use case, so you can easily distinguish between different instances of the fuse module.

You can actually override and replace cred->host with whatever you want using the nfs_set_auth and libnfs_authunix_create calls. Maybe fuse-nfs.c should use these functions and set the credentials explicitly to "fuse-nfs-" or something? Even better might be to be able to set the hostname string from the command line via the NFS URL. The NFS URL already allows you to set uid and gid; maybe just add an argument to also set the hostname.

sahlberg commented 7 years ago

FYI, NFSv4 support in libnfs is coming along and should be reasonably complete, to the point where one can use it for, for example, fuse-nfs. This is likely going to be ready towards the end of the year.

At that stage, when we can run libnfs in NFSv4 mode, it should be fairly straightforward to do the plumbing to add locking, as we no longer need to rely on NLM or NSM.

julian-hj commented 7 years ago

That's excellent to hear! NFSv4 and file locking have both been on our roadmap for a while. If we can add support for v4, then I think support for locking in v3 might be quite a bit less critical.

sahlberg commented 6 years ago

libnfs has support for both lockf() and fcntl() locking now, but only for NFSv4.

With this, it should be fairly straightforward to add locking support to fuse-nfs.