leil-io / saunafs

SaunaFS is a free and open-source, distributed POSIX file system inspired by the Google File System.
https://saunafs.com
GNU General Public License v3.0

Samba FUSE Performance Issues #157

Open anon314159 opened 3 months ago

anon314159 commented 3 months ago

Reference: https://github.com/moosefs/moosefs/issues/573

This issue is reproducible in LizardFS, MooseFS and the latest version of SaunaFS built from source. I have built several testbed clusters and all of them exhibit the same read performance issues when re-exporting FUSE mounts via SAMBA.

uristdwarf commented 2 months ago

I'll look into this over the weekend, but be warned that we don't currently intend to support Samba, and such support is not planned. I'm looking into it anyway because, at first glance, the issue seems to be related to the FUSE client, not Samba specifically. @aNeutrino also mentioned that LizardFS worked fine with Samba a few years ago.

In the meantime, can you share how you configured the re-exporting of the FUSE mounts via Samba? I'm not terribly familiar with Samba.

anon314159 commented 2 months ago

Hello @uristdwarf,

Thanks for the quick turnaround time and response.

Below is a very rough diagram of the setup: [diagram attachment]

Essentially, it's an 8-node cluster consisting of six chunk servers, one master, and one high-performance Samba gateway. The purpose of the gateway is to act as a FUSE bridge: it mounts various "Linux-native" file systems (e.g. GlusterFS, CephFS, BeeGFS, Lustre, MooseFS, etc.) and then re-exports these FUSE mounts via SMB, providing interoperability for various Windows clients (ideally our organization will one day realize how awful this platform is, but I digress). In principle, any file system that supports locks and extended attributes (pesky NT ACLs) and is POSIX compliant should support this configuration, and I have tested a variety of distributed systems this way with pretty decent results.

The scope of my problem with MooseFS/LizardFS/SaunaFS seems to be tied to very poor read performance for any uncached files accessed via this configuration (i.e. FUSE mount -> smbd -> Linux/Windows clients). Other FUSE-based distributed file systems do not exhibit this behavior. Oddly enough, if I prime the file system cache on the smb_proxy server by stating or dd'ing test files and then read the very same data from an SMB client, the problem goes away entirely, as long as that data resides in the server's cache.

Regarding the FUSE mounts, I am not using any special flags or options, just the standard sfsmount command targeting the master server's export(s) and a designated local mountpoint. While I am aware this adds a layer of indirection, all of my testbed equipment resides on the same local network switches, and every node is connected via a 40GbE network adapter.
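For reference, the share definition on the gateway is nothing exotic. A minimal sketch of what such an smb.conf entry might look like (the share name, path, and option choices here are illustrative, not my exact production config):

```ini
# Hypothetical share re-exporting a SaunaFS FUSE mount
# (the mount itself is a plain sfsmount of the master's export at /mnt/saunafs)
[saunafs]
   path = /mnt/saunafs
   read only = no
   browseable = yes
   # store NT ACLs as extended attributes on the POSIX file system
   vfs objects = acl_xattr
   ea support = yes
   # map SMB byte-range locks onto POSIX locks on the FUSE mount
   strict locking = auto
```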

uristdwarf commented 2 months ago

I checked last weekend, and the CRC calculations performed on reads appear to be the main culprit in terms of CPU usage. I'm not sure yet why (@aNeutrino floated the idea that the CRC is being recalculated every time, even for chunks already in the cache, but I need to investigate this); before I go further, though, I want to make sure it is related to the issue you're seeing.
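To get a rough feel for what CRC work adds to a read path, you can compare a plain read of a file against a read that also computes a CRC over the same bytes. This is purely an illustration of CRC CPU cost using the standard `cksum` tool, not SaunaFS's actual checksum code:

```shell
#!/bin/bash
# Illustration only: cost of reading data vs. reading + computing a CRC.
# 256 MiB of random data stands in for uncached chunk reads.
f=$(mktemp)
dd if=/dev/urandom of="$f" bs=1M count=256 2>/dev/null

# plain read: data is only copied out
time cat "$f" > /dev/null

# read + CRC: same data, plus a CRC-32 pass over every byte
time cksum "$f" > /dev/null

rm -f "$f"
```

On most machines the second timing is noticeably higher in user CPU time, which is the same shape of overhead a per-read chunk CRC would add.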

[profiler output attachment]

If you have the time and are still running SaunaFS on the test cluster, could you please try this patch out? This isn't a solution and I don't recommend running it other than for testing, but it would help me confirm whether the culprit is CRC or not.

no-crc.txt

(For some arbitrary reason, GitHub doesn't allow uploading .patch files even though it says it does. This is a patch file, not a text file; apply it with `git apply no-crc.txt`.)

NOTE: If you are building with either the package.sh or the create-deb-package.sh script, you need to commit the change (preferably on another branch). This is because the scripts do a fresh git clone, which does not preserve any modifications that haven't been committed.
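The reason the scripts lose uncommitted changes can be demonstrated in a throwaway repository (all paths and names below are scratch examples, not the SaunaFS tree):

```shell
#!/bin/bash
set -e
# Demonstrate why uncommitted changes don't survive a fresh git clone.
src=$(mktemp -d); dst=$(mktemp -d)
cd "$src"
git init -q
git -c user.email=t@t -c user.name=t commit -q --allow-empty -m init
echo "patched" > file.c          # an uncommitted modification

# what the packaging scripts effectively do: a fresh clone of the repo
git clone -q "$src" "$dst/clone"
ls "$dst/clone"                  # file.c is absent: the change was lost

# committing on a side branch makes the change visible to a clone
git checkout -q -b no-crc-test
git add file.c
git -c user.email=t@t -c user.name=t commit -q -m "apply no-crc patch"
git clone -q -b no-crc-test "$src" "$dst/clone2"
cat "$dst/clone2/file.c"         # prints "patched"

rm -rf "$src" "$dst"
```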

anon314159 commented 2 months ago

Awesome, like finding a needle in a haystack. Sure thing, I can deploy your patch, conduct additional benchmarking, and evaluate the difference sometime mid next week. Thanks again for looking into this issue.
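For the read benchmarks, the cold-vs-warm difference I described can be captured with something like the following sketch (the `/mnt/saunafs` path is a placeholder for the FUSE mount; dropping the page cache requires root, and a local temp file is used as a stand-in when no path is given):

```shell
#!/bin/bash
# Sketch: compare cold-cache vs. warm-cache reads of the same file.
# Pass a file on the FUSE mount, e.g.:  ./bench.sh /mnt/saunafs/testfile
F="${1:-$(mktemp)}"
[ -s "$F" ] || dd if=/dev/urandom of="$F" bs=1M count=256 2>/dev/null

# cold read: flush the page cache first (needs root; skipped otherwise)
sync
[ "$(id -u)" -eq 0 ] && echo 3 > /proc/sys/vm/drop_caches

time dd if="$F" of=/dev/null bs=1M 2>/dev/null   # cold (if cache was dropped)
time dd if="$F" of=/dev/null bs=1M 2>/dev/null   # warm: served from page cache
```

If the CRC hypothesis is right, the cold read through the patched client should close most of the gap to the warm read.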