openvstorage / alba

Open vStorage ALBA (ALternative BAckend) creates a replicated or flexibly network-RAID'ed object storage backend out of Seagate Kinetic drives and local disks, supporting compression and encryption.

huge drop in read performance after playing around with asd-set-slowness #826

Closed jeroenmaelbrancke closed 5 years ago

jeroenmaelbrancke commented 6 years ago

When all the ASDs of the cache backend are online we can read at 4530 KiB/s. From the moment we introduce latency (asd-set-slowness) on the cache ASDs, the read performance drops to 187 KiB/s.
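To put a number on the drop, the two reported bandwidths correspond to roughly a 24x slowdown (a trivial check of the figures above):

```python
# Bandwidths as reported in this issue (fio randread, bs=4k, iodepth=1).
healthy_kib_s = 4530  # all cache ASDs online
slowed_kib_s = 187    # after asd-set-slowness on the cache ASDs

slowdown = healthy_kib_s / slowed_kib_s
print(round(slowdown, 1))  # ~24.2x slower
```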

I'm not sure if I configured something wrong, but I don't think this is expected behaviour.

The environment is still online and the proxies are set to debug level. IPs: 10.100.189.31-33

packages:

ii  alba-ee                              1.5.17                                     amd64        the ALternative BAckend
ii  arakoon                              1.9.22                                     amd64        Simple consistent distributed key/value store
ii  libovsvolumedriver-ee                6.15.2-0                                   amd64        VolumeDriver client library
ii  volumedriver-ee-base                 6.16.6-0                                   amd64        VolumeDriver common infrastructure
ii  volumedriver-ee-server               6.16.6-0                                   amd64        VolumeDriver daemon and helpers

setup: 1 vpool 1 cachebackend 1 hddbackend

cachebackend: policy 1,0,1,1, fragment size: 16 MiB, no compression, no encryption

hddbackend: policy 1,2,2,2, fragment size: 2 MiB, no compression, encryption: aes-ctr-256

vpool: 2 proxies, write buffer: 1 GiB, SCO size: 4 MiB, cluster size: 4 KiB, volume write buffer: 512 MiB, DTL: sync, transport: TCP, fragment cache: read/write, quota: 10 GiB, no block cache

Added the following to the volumedriver config: backend_interface_partial_read_retry_interval_msecs: 10, backend_interface_partial_read_timeout_msecs: 250
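For reference, the two tweaked keys as they would sit in the volumedriver JSON config. The surrounding structure is abbreviated and the section name is an assumption; only the two key names and values come from the report above:

```json
{
    "backend_interface": {
        "backend_interface_partial_read_retry_interval_msecs": 10,
        "backend_interface_partial_read_timeout_msecs": 250
    }
}
```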

fio test:

[global]
thread
ioengine=openvstorage
direct=1
bs=4k
size=2G
rw=randread
#rwmixread=75
group_reporting=1
time_based=1
randrepeat=1
username=edgeuser01
password=wo8baiKohcuaK0Ma

[test]
iodepth=1
volumename=test_jeroen
protocol=tcp
hostname=10.100.189.32
port=26203

without slowness:

fio: time_based requires a runtime/timeout setting
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=openvstorage, iodepth=1
fio-2.19-61-gc789-dirty
Starting 1 thread
Jobs: 1 (f=1): [r(1)][99.8%][r=5056KiB/s,w=0KiB/s][r=1264,w=0 IOPS][eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=127841: Mon Nov 27 14:50:40 2017
   read: IOPS=1132, BW=4530KiB/s (4639kB/s)(2048MiB/462906msec)
    slat (usec): min=2, max=124, avg= 9.86, stdev= 5.69
    clat (usec): min=260, max=16846, avg=870.22, stdev=485.57
     lat (usec): min=266, max=16851, avg=880.08, stdev=485.91
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  446], 10.00th=[  462], 20.00th=[  466],
     | 30.00th=[  466], 40.00th=[  482], 50.00th=[  620], 60.00th=[  932],
     | 70.00th=[ 1240], 80.00th=[ 1384], 90.00th=[ 1544], 95.00th=[ 1704],
     | 99.00th=[ 2008], 99.50th=[ 2064], 99.90th=[ 2864], 99.95th=[ 4048],
     | 99.99th=[ 6944]
    lat (usec) : 500=41.05%, 750=14.67%, 1000=5.30%
    lat (msec) : 2=37.81%, 4=1.13%, 10=0.05%, 20=0.01%
  cpu          : usr=1.99%, sys=2.17%, ctx=2934701, majf=0, minf=35
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=524288,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=4530KiB/s (4639kB/s), 4530KiB/s-4530KiB/s (4639kB/s-4639kB/s), io=2048MiB (2147MB), run=462906-462906msec

with slowness set to 0.5,0.5:

test: (groupid=0, jobs=1): err= 0: pid=129924: Mon Nov 27 15:29:34 2017
   read: IOPS=46, BW=187KiB/s (191kB/s)(383MiB/2094742msec)
    slat (usec): min=3, max=93, avg=12.61, stdev= 7.41
    clat (usec): min=220, max=1984.6k, avg=21375.62, stdev=112392.48
     lat (usec): min=268, max=1984.7k, avg=21388.24, stdev=112392.71
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  310], 10.00th=[  330], 20.00th=[  466],
     | 30.00th=[  466], 40.00th=[  474], 50.00th=[  620], 60.00th=[ 3088],
     | 70.00th=[ 8896], 80.00th=[12480], 90.00th=[16064], 95.00th=[18304],
     | 99.00th=[774144], 99.50th=[897024], 99.90th=[987136], 99.95th=[995328],
     | 99.99th=[1036288]
    lat (usec) : 250=0.01%, 500=47.99%, 750=5.07%, 1000=0.47%
    lat (msec) : 2=0.24%, 4=8.67%, 10=9.75%, 20=25.13%, 50=0.49%
    lat (msec) : 100=0.03%, 250=0.01%, 750=1.07%, 1000=1.05%, 2000=0.04%
  cpu          : usr=0.54%, sys=1.88%, ctx=13533366, majf=0, minf=35
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=97921,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=187KiB/s (191kB/s), 187KiB/s-187KiB/s (191kB/s-191kB/s), io=383MiB (401MB), run=2094742-2094742msec
domsj commented 6 years ago

I think this is exactly the expected performance:

Fragments on the HDDs are 2 MB; this means that, optimistically, we can fetch 50 fragments/s => 20 ms latency per fragment.

According to fio the avg latency is 21388.24 us, which is only slightly more than 20 ms.
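The estimate above can be sketched numerically. The 50 fragments/s figure is the optimistic rate quoted in the comment (seek-limited random HDD reads, an assumption, not a measurement); since every 4 KiB read forces a full-fragment fetch, it also bounds the read bandwidth:

```python
# Back-of-envelope check of the 20 ms / fragment estimate.
fetches_per_sec = 50                  # assumed optimistic random fragment fetches/s
latency_ms = 1000 / fetches_per_sec   # per-fragment fetch latency

# Each 4 KiB random read costs one full-fragment fetch, so bandwidth
# is capped at one 4 KiB block per fetch:
block_kib = 4                         # fio bs=4k
bw_kib_s = fetches_per_sec * block_kib

print(latency_ms, bw_kib_s)  # 20.0 ms and 200 KiB/s
```

Both numbers line up with the measured run: 20 ms vs the fio avg of ~21.4 ms, and 200 KiB/s vs the measured 187 KiB/s.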

FWIW, in this case performance would be better if no fragment cache were configured ... then the proxy would do partial reads from the HDDs too, whereas now it always fetches full fragments.
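The cost of always fetching full fragments can be expressed as read amplification, using the fragment size and block size from this setup:

```python
# With the fragment cache configured, a single 4 KiB read pulls in a
# whole 2 MiB fragment from the hddbackend; a partial read would fetch
# only the requested range.
fragment_bytes = 2 * 1024 * 1024  # hddbackend fragment_size: 2 MiB
read_bytes = 4 * 1024             # fio bs=4k

amplification = fragment_bytes // read_bytes
print(amplification)  # 512x more data moved per 4 KiB read
```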

wimpers commented 6 years ago

@jeroenmaelbrancke isn't the partial read from HDD exactly the path you wanted to test by setting the asd-slowness?