Handling potential lease break deadlocks

chucklever commented 8 months ago

Ben Coddington reports:

I ran into this old problem again recently, last discussed here: https://lore.kernel.org/linux-nfs/F738DB31-B527-4048-94A7-DBF00533F8C1@redhat.com/#t

Problem is that clients can easily issue enough WRITEs that end up in

__break_lease xfs_break_leased_layouts ... nfsd_vfs_write ... svc_process svc_recv nfsd

.. so that all the knfsds are there, and nothing can process the LAYOUTRETURN.

I'm pretty sure we can make the linux client a bit smarter about it (I saw one LAYOUTGET and one conflicting WRITE in the same TCP segment, in that order).

But what can knfsd do to defend itself?

Actually, I can't imagine any block/SCSI/NVMe server implementation that would be fine with a client writing to the device at the same time the server does, and so why shouldn't the client preemptively return the layout?

chucklever commented 8 months ago

Jeff Layton says:

That may be, but I don't think we can rely on that from the server-side. I know that Neil and Chuck were discussing spinning up more nfsd threads dynamically as required. That may be the best fix for this sort of problem.

Until then, the guidance should probably be to start a lot of threads if you're running a pNFS SCSI enabled nfsd server.

chucklever commented 8 months ago

But what can knfsd do to defend itself?

If nfsd threads are waiting indefinitely, that's a potential DoS vector. Ideally the thread should preserve the waiting request somehow (or return NFS4ERR_DELAY, maybe?). At some later point when the lease conflict is resolved, the requests can be reprocessed.

That's my naive 800,000 ft view.

chucklever commented 8 months ago

Jeff Layton says: One thing that might help would be make the nfsd threadpool more dynamic. If it could just spin up another thread, we'd be OK.

Maybe the server could keep an emergency thread around that is just for processing LAYOUTRETURN/DELEGRETURN (maybe also CLOSE, etc.) calls? Sometimes these ops are mixed into compounds with other sorts of calls though, so we'd need to deal with that somehow.

If nfsd threads are waiting indefinitely, that's a potential DoS vector. Ideally the thread should preserve the waiting request somehow (or return NFS4ERR_DELAY, maybe?). At some later point when the lease conflict is resolved, the requests can be reprocessed.

That's my naive 800,000 ft view.

Yeah, that would probably be ideal. xfs_break_leased_layouts is pretty heavy handed right now. It currently does this:

       while ((error = break_layout(inode, false)) == -EWOULDBLOCK) {
               xfs_iunlock(ip, *iolock);                              
               *did_unlock = true;                                    
               error = break_layout(inode, true);                     
               *iolock &= ~XFS_IOLOCK_SHARED;                         
               *iolock |= XFS_IOLOCK_EXCL;                            
               xfs_ilock(ip, *iolock);                                
       }

So you can always get stuck in that inner break_layout call. We could try to use IOCB_NOWAIT for I/Os coming from nfsd, which would prevent that, but that seems like it could change the I/O behavior in other ways we don't want. It's not clear to me how that would work alongside IOCB_SYNC anyway.

It's not immediately clear to me how we'd do this with the existing IOCB_* flags, so we might need a new one (IOCB_NFSD?). Then, we could just make sure that break_layout call is always nonblocking if that flag is set.

chucklever commented 8 months ago

Christoph Hellwig says:

On Thu, Nov 16, 2023 at 03:32:34PM -0500, Jeff Layton wrote: One thing that might help would be make the nfsd threadpool more dynamic. If it could just spin up another thread, we'd be OK.

Maybe the server could keep an emergency thread around that is just for processing LAYOUTRETURN/DELEGRETURN (maybe also CLOSE, etc.) calls? Sometimes these ops are mixed into compounds with other sorts of calls though, so we'd need to deal with that somehow.

I think having an extra thread just for LAYOUTRETURN/DELEGRETURN is fundamentally the right thing to do, as they need to be processed to allow everyone else to progress.

If nfsd threads are waiting indefinitely, that's a potential DoS vector. Ideally the thread should preserve the waiting request somehow (or return NFS4ERR_DELAY, maybe?). At some later point when the lease conflict is resolved, the requests can be reprocessed.

That's my naive 800,000 ft view.

That is probably a good idea on top of the above.

So you can always get stuck in that inner break_layout call. We could try to use IOCB_NOWAIT for I/Os coming from nfsd, which would prevent that, but that seems like it could change the I/O behavior in other ways we don't want. It's not clear to me how that would work alongside IOCB_SYNC anyway.

So I definitively think using IOCB_NOWAIT from nfsd and not block the thread for trivial I/O is a good thing. This might even enable not offloading I/O to threads until the IOCB_NOWAIT failed. But any IOCB_NOWAIT that returned -EAGAIN needs to eventually fall back to doing blocking I/O as there is no progress guarantee for IOCB_NOWAIT, and some I/O simply is entirely impossible with IOCB_NOWAIT.

It's not immediately clear to me how we'd do this with the existing IOCB_* flags, so we might need a new one (IOCB_NFSD?). Then, we could just make sure that break_layout call is always nonblocking if that flag is set.

I'd name it about the behavior that it controls and not the callers, but otherwise this too seems like a good idea.

chucklever commented 8 months ago

Another thought from Jeff: When we get down to one unblocked nfsd (or past a high water mark), only allow compound operations that release state. Everything else gets NFS4ERR_DELAY.

chucklever commented 2 months ago

This is a duplicate of #71, which is more specific, so close this one.

linux-nfs / nfsd

Handling potential lease break deadlocks #63