LizardFS is an Open Source Distributed File System licensed under GPLv3.
http://lizardfs.com

Parallel reads does not perform well, parallel write hangs #293

Closed fredrikwidlund closed 9 years ago

fredrikwidlund commented 9 years ago

CentOS 7 / LizardFS 2.6.0
E5-2609 / 32 GB RAM
LSI 9271-8i / 32x4TB NL SAS
mfsmaster/mfschunkserver on the same node, for testing only
default configuration, with the devices using XFS mapped in mfshdd.cfg
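
For reference, mfshdd.cfg simply lists one chunk-storage directory per line, so a setup like this would have 32 entries along these lines (paths are hypothetical):

/mnt/chunk01
/mnt/chunk02
# ... one entry per drive, up to /mnt/chunk32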

Just a couple of parallel sequential writes will hang LizardFS.

dd if=/dev/zero of=/mnt/a bs=1048576 count=10000 &
dd if=/dev/zero of=/mnt/b bs=1048576 count=10000 &
dd if=/dev/zero of=/mnt/c bs=1048576 count=10000 &
dd if=/dev/zero of=/mnt/d bs=1048576 count=10000 &
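
The same four writes as a minimal script, with a wait at the end so a hang is obvious (a sketch; file names and mount point as above):

#!/bin/sh
# four parallel 10 GB sequential writes into the LizardFS mount
for f in a b c d; do
    dd if=/dev/zero of=/mnt/$f bs=1048576 count=10000 &
done
wait    # never returns if the parallel writes hang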

alexcrow commented 9 years ago

Hi,

I can't reproduce this on my 4-chunkserver setup (master is a separate VM on another host).

Cheers

Alex

blink69 commented 9 years ago

@fredrikwidlund did you have any errors from the client, chunkserver, or master in your syslog?

fredrikwidlund commented 9 years ago

Tested the latest git version now, with the master on a separate host.

No errors on the master or chunkserver. The client repeatedly reports problems connecting to the chunkserver.

mfsmount[12171]: can't connect to (****:9422): EADDRNOTAVAIL (Cannot assign requested address) (i.e. client to chunkserver, running on the same host)

Running both master and chunkserver in foreground mode doesn't show anything suspicious.

It starts off writing OK, then gradually slows to a trickle that "never" completes. Writes are then done in bursts with long pauses in between. However, it now recovers if you stop the writes, and doesn't hang.
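
One way to watch what happens during the stalls (a sketch; XXX stands for the FUSE connection ID of the mount under /sys/fs/fuse/connections/):

iostat -x 1                                # per-disk throughput and utilization
cat /sys/fs/fuse/connections/XXX/waiting   # FUSE requests currently queued on the mount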

fredrikwidlund commented 9 years ago

Tested the same setup with MooseFS 3.0.34 without problems; there, the chunkserver logs this additional information:

mfschunkserver[1841]: workers: 10+
mfschunkserver[1841]: workers: 20+

psarna commented 9 years ago

Have you tried to analyze the network traffic? EADDRNOTAVAIL happens when you cannot bind to an address (e.g. the system is out of free port numbers), which is rather unusual.
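
A quick way to check for port exhaustion while the test is running (a sketch; 9422 is the default chunkserver port):

cat /proc/sys/net/ipv4/ip_local_port_range             # ports available for outgoing connections
ss -tn state time-wait '( dport = :9422 )' | wc -l     # sockets stuck in TIME_WAIT towards the chunkserver

Tens of thousands of short-lived connections can exhaust the default local port range, which would produce exactly this EADDRNOTAVAIL.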

mfsmount can be run in the foreground as well; maybe that would provide some more debug information.
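
For example, something like this (a sketch, assuming mfsmount accepts the standard FUSE-style flags):

mfsmount -f /mnt    # stay in the foreground; -d additionally enables debug output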

Also, I assume that you run your installation on RAID, which would probably need some configuration adjustments to be optimal. Patches that optimize this use case (RAID) are under development. Can you try to test this issue on a regular disk? Maybe it is RAID + bad config that generates the bottleneck.

fredrikwidlund commented 9 years ago

The client and chunkserver are running on the same server, and normally have no problems with connectivity. For example, writing files sequentially doesn't result in any issues. The warnings start appearing when doing multiple operations in parallel.

The underlying filesystem is 32 drives running XFS in a JBOD configuration.

fredrikwidlund commented 9 years ago

The main part of this issue is not read performance, but that parallel writes break.

psarna commented 9 years ago

Can you verify whether the problem still occurs when your client is not on the same machine as the chunkserver? It would help to find the right direction for debugging. You could also check if changing the value of /sys/fs/fuse/connections/XXX/max_background helps; the default maximum number of background FUSE requests is 12.
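
For example (a sketch; XXX is the FUSE connection ID of the LizardFS mount, and writing the value needs root):

ls /sys/fs/fuse/connections/                           # one directory per mounted FUSE filesystem
cat /sys/fs/fuse/connections/XXX/max_background        # default is 12
echo 64 > /sys/fs/fuse/connections/XXX/max_background  # raise it, then re-run the parallel writes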

fredrikwidlund commented 9 years ago

I'm afraid the time I have to look further into this is limited right now. If you have tried and can't reproduce it, feel free to close the issue on my behalf.

psarna commented 9 years ago

I'm closing the issue since I was unable to reproduce it at all. If you find more time to deal with it, please reopen.

fredrikwidlund commented 9 years ago

Just curious. Did you try a clean CentOS 7 installation, default LizardFS on the same host, and the "dd" write commands above?

psarna commented 9 years ago

Yes, I've tried a fresh 2.6.0 on a clean CentOS 7; it didn't hang. I haven't used an underlying filesystem with 32 drives, though.

If you don't mind, we can return to this topic after the patches that allow flexible chunkserver configuration get published and merged (it will happen in the near future). I think that tuning the network and HDD usage parameters could solve your problems.