anon314159 opened this issue 3 months ago
I'll look into this over the weekend, but be warned that we currently don't intend to support Samba and it's not on the roadmap. However, I'm looking into it because, at first glance, the issue seems to be related to the FUSE client rather than Samba specifically. @aNeutrino also mentioned that LizardFS worked fine with Samba a few years ago.
In the meantime, can you share how you configured the re-exporting of the FUSE mounts via Samba? I'm not terribly familiar with Samba.
Hello @uristdwarf,
Thanks for the quick turnaround time and response.
Below is a very rough example of the setup:
Essentially, it's an 8-node cluster consisting of six chunk servers, one master, and one high-performance Samba gateway. The gateway acts as a FUSE bridge: it mounts various "Linux native" file systems (e.g. GlusterFS, CephFS, BeeGFS, Lustre, MooseFS, etc.) and then re-exports these FUSE mounts via SMB, thereby providing interoperability for various Windows clients (ideally our organization will one day realize how awful this platform is, but I digress). In principle, any file system that supports locks and extended attributes (pesky NT ACLs) and is POSIX compliant should support this configuration, and I have tested a variety of distributed file systems this way with pretty decent results.

The scope of my problems with MooseFS/LizardFS/SaunaFS seems to be tied to very poor read performance for any uncached files accessed via this configuration (i.e. FUSE mount -> smbd -> Linux/Windows clients). Other FUSE-based distributed file systems do not exhibit this behavior. Oddly enough, if I prime the file system cache on the SMB proxy server by stat'ing or dd'ing test files and then read the very same data from an SMB client, the problem goes away entirely, as long as that data stays in the server's cache.

Regarding the FUSE mounts, I am not using any special flags or options, just the standard sfsmount command targeting the master server's export(s) and a designated local mountpoint. While I am aware this adds a layer of indirection, all of my testbed equipment resides on the same local network switches, and each node is connected via a 40GbE network adapter.
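For reference, here is a minimal sketch of what the gateway setup looks like. The master host name, mount point, and share name are illustrative placeholders rather than my actual values, and the exact sfsmount option spelling may differ between versions; the Samba parameters are standard smb.conf settings.

```sh
# Hypothetical gateway setup (host names, paths, and share name are placeholders).

# 1) Mount SaunaFS with the stock FUSE client, no special options
#    (adjust the master host / option names to your installation):
sfsmount -H saunafs-master /mnt/saunafs

# 2) Re-export the FUSE mount via Samba (excerpt appended to /etc/samba/smb.conf):
cat >> /etc/samba/smb.conf <<'EOF'
[saunafs]
    path = /mnt/saunafs
    read only = no
    ea support = yes
    vfs objects = acl_xattr
EOF

# 3) Reload Samba so the new share is picked up:
smbcontrol smbd reload-config
```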
I checked last weekend, and it seems to point to the CRC calculation during reads as the main culprit in terms of CPU usage. I'm not sure yet why (@aNeutrino had floated the idea that the CRC is recalculated every time for chunks already in the cache, but I need to investigate this), but before I go further, I want to make sure that it is related to the issue you have.
If you have the time and are still running SaunaFS on the test cluster, could you please try this patch out? This isn't a solution and I don't recommend running it other than for testing, but it would help me confirm whether the culprit is CRC or not.
(For some arbitrary reason, GitHub doesn't allow uploading patch files even when it says it does. This is a patch file, not a text file. Apply it with git apply no-crc.txt.)

NOTE: If you are using either the package.sh or the create-deb-package.sh script to build, you need to commit the change (probably on another branch). This is because the scripts do a git clone, which does not preserve any modified changes that haven't been committed.
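In practice, something like the following should work; the branch name and commit message are only placeholders, not part of the patch:

```sh
# Create a throwaway branch, apply the patch, and commit it so the package
# scripts (which do a fresh `git clone`) actually pick up the change.
git checkout -b no-crc-test                        # placeholder branch name
git apply no-crc.txt                               # the attached patch, renamed to .txt for GitHub
git commit -am "Disable CRC verification for testing"   # placeholder commit message

# Then build as usual, from wherever the script lives in the repo:
./package.sh           # or ./create-deb-package.sh
```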
Awesome, like finding a needle in a haystack. Sure thing, I can deploy your patches, conduct additional benchmarking and evaluate the difference sometime mid next week. Thanks again for looking into this issue.
Reference: https://github.com/moosefs/moosefs/issues/573
This issue is reproducible in LizardFS, MooseFS, and the latest version of SaunaFS built from source. I have built several testbed clusters, and all of them exhibit the same read performance issues when re-exporting FUSE mounts via Samba. The cached-vs-uncached check I used is sketched below.
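A minimal sketch of how to observe the cached vs. uncached difference on the gateway; the file path is a hypothetical placeholder, and any large test file on the SaunaFS mount will do:

```sh
# Placeholder path for a large test file on the SaunaFS FUSE mount.
TESTFILE=/mnt/saunafs/testfile.bin

# 1) Cold state: make sure the gateway's page cache does not hold the file.
sync && echo 3 > /proc/sys/vm/drop_caches
# Reading this file from an SMB client now shows the poor, uncached throughput.

# 2) Prime the gateway's cache by reading the file locally through the FUSE mount.
dd if="$TESTFILE" of=/dev/null bs=1M
# Re-reading the same file from an SMB client is now fast, as long as the data
# stays in the gateway's page cache.
```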