nfs-ganesha / nfs-ganesha

NFS-Ganesha is an NFSv3,v4,v4.1 fileserver that runs in user mode on most UNIX/Linux systems
1.48k stars 513 forks source link

nfs4_op_readdir seems inefficient and possibly buggy in some cases #1034

Closed drieber closed 7 months ago

drieber commented 9 months ago

I am looking at how nfs-ganesha implements NFSv4 READDIR (and v4.1, I haven't looked at other protocols), specifically how ATTR_CHANGE of the dirents is requested, collected and returned.

I am seeing in wireshark and my own debug logging that the client is doing READDIR, requesting a bunch of attributes including ATTR_CHANGE, but my fsal's readdir function gets an attrmask that does NOT have ATTR_CHANGE. However, I see in wireshark the READDIR response has directory entries with all attributes, including ATTR_CHANGE. After debugging some more I found that nfs-ganesha is doing an extra getattrs call for each dirent inside a test_access call within nfs4_readdir_callback.

I am surprised things work this way. It seems wasteful (my fsal's readdir implementation already produces all required attributes). In my use-case I don't think there is any need to do anything special in test_access. In fact, I'm considering turning my FSAL's test_access function into a noop.

So I tried that (turning test_access into a noop), and with that I see in wireshark that now the READDIR response entries do NOT have ATTR_CHANGE.

At this point it is worth noting that nfs4_op_readdir does NOT pass the attrmask from the user's READDIR request to fsal_readdir. Instead it passes ATTRS_NFS3 plus possibly ATTR_ACL and/or ATTR4_SEC_LABEL. See: https://github.com/nfs-ganesha/nfs-ganesha/blob/3113f970511289ae8928078cf392d17c0df81b38/src/Protocols/NFS/nfs4_op_readdir.c#L668 ... This seems wrong.

Some additional notes I collected, including more details about my usecase:

I know the call path goes more or less like this: nfs4_op_readdir -> fsal_readdir -> readdir where that last readdir is really mdcache_readdir. There is a fork in the road at this point: if avl_chunk == 0 then we go through mdcache_readdir_uncached which essentially just calls our FSAL's readdir hook. if avl_chunk != 0 then we would go through mdcache_readdir_chunked which may call mdcache_populate_dir_chunk if there are cache misses. mdcache_populate_dir_chunk calls the sub-fsal's readdir with an attrmark that includes every supported attribute.

In my use-case we configure are export with Dir_Chunks = 0, so in my use-case we go through mdcache_readdir_uncached.

The mdcache_readdir_uncached path passes attrmask unchanged. Conclusion: my FSAL's readdir function gets the same attrmask value as what fsal_readdir received.

In summary:

Should nfs4_op_readdir pass the full attrmark from the request to fsal_readdir? Is it ok to turn test_access into a noop? What are the things to keep in mind when making that decision?

Thanks!

ffilz commented 9 months ago

OK, so first, mdcache_readdir_uncached should ask for all the attributes. Only asking for the requested ones is liable to result in attributes marked as valid in an mdcache entry that are missing some attributes.

Second, nfs4_op_readdir should either ask for all the supported attributes or ask for the ones actually requested, probably the ones actually requested as that will avoid causing extra attribute fetch for things like ACL if not requested and will adapt to any future change in mdcache of attribute validity granularity (currently only ACL and security label are separately managed).

drieber commented 9 months ago

What about the test_access calls? If readdir returns all attrs, why make a test_access call? Is it ok to override test_access and turn it into a noop?

ffilz commented 9 months ago

FSALs can't override test_access as it only ever makes it to mdcache_test_access. It doesn't get passed down to the FSAL.

Also, the test_access call is making a permission check, it's not trying to cause attrs to be fetched.

If we do the right attr requests, where the critical bit is that uncached readdir isn't asking for all the attributes, then the test_access should not result in additional attribute fetch.

Note that we strongly discourage uncached readdir so fixing the bug in it is lower priority for us. If you want to submit a patch that would be great.

Also, it would be good for nfs4_op_readdir to ask for the right attributes as a second patch.

drieber commented 9 months ago

We configure nfs-ganesha like this:

MDCACHE {
  Dir_Chunk = 0;
}
...
EXPORT {
  Attr_Expiration_Time = 0;
  ...
  FSAL {
    name = DEVTOOLS_VFS;
  }
}

DEVTOOLS_VFS is our own fsal. With this configuration, is it still true that test_access will not result in additional attribute fetch?

I would love to experiment with MDCACHE caching enabled, but for that we need to be able to do invalidation synchronously.

ffilz commented 9 months ago

Why would you need to do invalidation synchronously?

A 2nd client fetching attributes will always race with a 1st client that is doing something that changes the attributes.

drieber commented 9 months ago

At Google we have a virtual filesystem for accessing our very large source code repository to developer machines, build machines and so on. We are using NFS Ganesha and our custom FSAL for providing access to this virtual file system locally, over the loopback address, on macOS (because Apple is moving towards deprecating kernel extensions, and hence our fuse-based solution will stop working at some point).

Some things to point out:

When developers or tools do a version control checkout, they expect the state of the filesystem to be synchronously updated by the time the checkout returns.

Google’s internal version control tools are tightly integrated with the virtual filesystem to optimize performance.

ffilz commented 9 months ago

Could you jump on IRC? What time-zone are you in? I think we might benefit from a live conversation. We could also do a Google Meet. Or you could join the community call Tuesday 7:00 AM Pacific Standard Time.

drieber commented 9 months ago

I sent you an invite to discuss this at 1pm.

I also sent https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/1173507 which should fix the core issue.

ffilz commented 9 months ago

I was reviewing the mdcache upcalls after some conversation with @dang. Invalidate and update are synchronous. I think invalidate would be safe to call in the context of a thread that had made a FSAL call as long as you don't use FSAL_UP_INVALIDATE_CLOSE that triggers a file close. Update would be troublesome to call from a thread that made a FSAL call because it takes the attr_lock which may be held.

drieber commented 7 months ago

I believe this can be closed now.

ffilz commented 7 months ago

Patch submitted https://github.com/nfs-ganesha/nfs-ganesha/commit/dc63fefb4a5928d5ee05ff24aea491e8fe337fde