
[BUG] Longhorn is very, very slow on HDD (e.g. 3x slower even with one local replica) #3896

fzyzcjy opened this issue 2 years ago

fzyzcjy commented 2 years ago

Hi, thanks for the storage solution! However, it seems very, very slow. For example, I ran sysbench on a folder that is mounted via hostPath:

root@benchmark-0:/test/pvc# cd /test/host-data-disk    
root@benchmark-0:/test/host-data-disk# sysbench fileio --file-total-size=15G --file-num=1 --file-test-mode=rndrw --time=300 --max-requests=0 prepare ; sysbench fileio --file-total-size=15G --file-num=1 --file-test-mode=rndrw --time=300 --max-requests=0 run
sysbench 1.0.17 (using bundled LuaJIT 2.1.0-beta2)

1 files, 15728640Kb each, 15360Mb total
Creating files for the test...
Extra file open flags: (none)
Creating file test_file.0
16106127360 bytes written in 133.00 seconds (115.49 MiB/sec).
sysbench 1.0.17 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Extra file open flags: (none)
1 files, 15GiB each
15GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!

File operations:
    reads/s:                      2307.86
    writes/s:                     1538.57
    fsyncs/s:                     38.47

Throughput:
    read, MiB/s:                  36.06
    written, MiB/s:               24.04

General statistics:
    total time:                          300.0051s
    total number of events:              1165492

Latency (ms):
         min:                                    0.00
         avg:                                    0.26
         max:                                  704.32
         95th percentile:                        0.83
         sum:                               299310.16

Threads fairness:
    events (avg/stddev):           1165492.0000/0.00
    execution time (avg/stddev):   299.3102/0.00
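
For reference, the hostPath side is mounted into the benchmark pod roughly like this (a minimal sketch; the image and the node path /mnt/data-disk are assumptions):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: benchmark-0
spec:
  containers:
  - name: bench
    image: ubuntu:20.04            # assumed image; anything with sysbench works
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: host-data-disk
      mountPath: /test/host-data-disk
  volumes:
  - name: host-data-disk
    hostPath:
      path: /mnt/data-disk         # assumed path on the node
      type: Directory
EOF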

Then I ran the same thing, this time on a PersistentVolume whose PVC uses a storage class handled by Longhorn:

root@benchmark-0:/test/pvc# sysbench fileio --file-total-size=15G --file-num=1 --file-test-mode=rndrw --time=300 --max-requests=0 prepare
sysbench 1.0.17 (using bundled LuaJIT 2.1.0-beta2)

1 files, 15728640Kb each, 15360Mb total
Creating files for the test...
Extra file open flags: (none)
Creating file test_file.0
16106127360 bytes written in 136.71 seconds (112.36 MiB/sec).
root@benchmark-0:/test/pvc# sysbench fileio --file-total-size=15G --file-num=1 --file-test-mode=rndrw --time=300 --max-requests=0 run
sysbench 1.0.17 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Extra file open flags: (none)
1 files, 15GiB each
15GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!

File operations:
    reads/s:                      628.59
    writes/s:                     419.06
    fsyncs/s:                     10.48

Throughput:
    read, MiB/s:                  9.82
    written, MiB/s:               6.55

General statistics:
    total time:                          300.0087s
    total number of events:              317448

Latency (ms):
         min:                                    0.00
         avg:                                    0.94
         max:                                  676.27
         95th percentile:                        1.32
         sum:                               299753.51

Threads fairness:
    events (avg/stddev):           317448.0000/0.00
    execution time (avg/stddev):   299.7535/0.00

As you can see, it is much, much slower.

I wonder what I am doing wrong?
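
For completeness, the Longhorn side was provisioned roughly like this (a sketch, not my exact manifests; the names, the 20Gi size, and the single-replica setting are assumptions):

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-bench
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"        # one local replica, as in the title
  staleReplicaTimeout: "2880"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: benchmark-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn-bench
  resources:
    requests:
      storage: 20Gi
EOF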

derekbit commented 2 years ago

@fzyzcjy Thanks for the benchmarking. The latency between the engine and the replicas results in the slow performance. We are still working on performance improvements. Currently, Longhorn is better suited to SSD storage.
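
The per-I/O cost is easiest to see with a queue-depth-1 random-write test, where every request pays the full engine-to-replica round trip. A minimal sketch with fio (the mount points are assumptions):

# Baseline: directory on the raw local disk (assumed mount point)
fio --name=lat-host --directory=/mnt/host-disk --size=1G \
    --rw=randwrite --bs=4k --direct=1 --iodepth=1 --numjobs=1 \
    --time_based --runtime=60

# Same test inside the Longhorn-backed mount (assumed mount point)
fio --name=lat-longhorn --directory=/test/pvc --size=1G \
    --rw=randwrite --bs=4k --direct=1 --iodepth=1 --numjobs=1 \
    --time_based --runtime=60

Comparing the completion-latency (clat) averages of the two runs shows how much of the slowdown comes from the replication path rather than the disk itself.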

fzyzcjy commented 2 years ago

@derekbit Hi, thanks for the reply. So may I know when it will be faster? It seems to be the most lightweight storage provisioner and is quite promising - except for the speed.

voarsh2 commented 2 years ago

Argh. It drives me insane how slow Longhorn is on HDDs. It makes my £5,000 setup feel like a tiny investment, and then I'm told I need to fork out several hundred for a few SSDs... I feel like 3x 1TB 7200 RPM disks in RAID 0 doesn't help much at all. The way Longhorn is designed, COMPLETELY rebuilding the data any time there's a problem, makes me long for Ceph; why can't it checksum and reuse the actual data bits!? But the monitor instability, needing 3/5 monitors, the fact that it blew up on me, and that it only offers mirroring to an external Ceph cluster made me settle for Longhorn, begrudgingly.

TL;DR: Ceph works fine on HDDs and 1Gb networking, no problems, except for the three-monitor issue that made me lose data when the monitors lost quorum.

fzyzcjy commented 2 years ago

@voarsh2 Ah... So how did you solve it?

voarsh2 commented 2 years ago

> @voarsh2 Ah... So how did you solve it?

I wish I did... The only thing that has made me stick with Longhorn is the S3-compatible backups... and there's no quorum... otherwise I'd be rooting for Ceph.

I will review SSDs in the future, but it simply isn't in my budget, especially since I have several TBs of data. Even 2-10GB volumes take forever to rebuild. The only workaround I have at the moment is to split volume data into "sub" volumes and mount volumes within volumes (sketched below), so that the data per volume is smaller, which helps with restoring and rebuilding. If you're doing a full restore you just need to remove the bad volume, and the workload can (depending on your app) continue with that data gone... Databases (especially MariaDB Galera) are especially painful, as Longhorn doesn't support TRIM, so when they drop and recreate tables the size just grows and grows. I find myself frequently deleting volumes and recreating them to reclaim space and keep rebuild times down...
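
Roughly what I mean by splitting (a sketch; the names and sizes are placeholders): mount several small Longhorn PVCs side by side, so a rebuild or restore only ever touches a fraction of the data:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp:latest            # placeholder image
    volumeMounts:
    - name: data-a
      mountPath: /data/a
    - name: data-b
      mountPath: /data/b
  volumes:
  - name: data-a
    persistentVolumeClaim:
      claimName: data-a            # small Longhorn PVC, e.g. 10Gi
  - name: data-b
    persistentVolumeClaim:
      claimName: data-b            # another small Longhorn PVC
EOF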

fzyzcjy commented 2 years ago

That sounds like a painful experience... :(

gp187 commented 2 years ago

> @fzyzcjy Thanks for the benchmarking. The latency between the engine and the replicas results in the slow performance. We are still working on performance improvements. Currently, Longhorn is better suited to SSD storage.

No, it's not. It's insanely slow on SSD as well. I just benchmarked; rsync results:

sent 57,683 bytes  received 337 bytes  1,172.12 bytes/sec
sent 53,839 bytes  received 337 bytes  976.14 bytes/sec

No, it is not the network speed. No, it is not an SSD problem.

Without Longhorn I get 200+ MB/sec transfer rates.
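
For what it's worth, those rsync runs each move under 60 KB, so the bytes/sec figure includes per-file and session overhead; a single large write is a cleaner throughput check (a sketch; the path is an assumption):

# Sequential write straight into the Longhorn-backed mount (assumed path)
dd if=/dev/zero of=/test/pvc/big.bin bs=1M count=4096 oflag=direct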

voarsh2 commented 2 years ago

This pretty much sums up my issue here; I'm getting some strange results when I run tests on the host disk and on the LVM volume backing Longhorn (which is on the same physical disk): https://github.com/longhorn/longhorn/discussions/4186
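
To separate the LVM layer from the Longhorn layer, the same short sysbench run can target each mount in turn (a sketch; the mount points are assumptions):

for dir in /mnt/raw-disk /mnt/longhorn-lvm /test/pvc; do   # assumed mount points
  cd "$dir" || continue
  sysbench fileio --file-total-size=2G --file-num=1 --file-test-mode=rndrw prepare
  sysbench fileio --file-total-size=2G --file-num=1 --file-test-mode=rndrw --time=60 run
  sysbench fileio --file-total-size=2G --file-num=1 cleanup
done

If the raw-disk and LVM numbers match but the PVC numbers drop off, the overhead is in Longhorn's data path rather than in LVM.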