qinzuoyan opened 8 years ago
what is the original number (local rocksdb)?
@imzhenyu, the original numbers are provided as above.
_Deployment_
_Machine_ c3-hadoop-tst-pegasus-ssd4-st01.bj (24 core * 2.10GHz, 64GB memory, 4 ssd)
_Build_ $ ./run.sh build -t release
_Test (run in rocksdb/ dir)_
$ ./run.sh start_onebox
$ ./dsn.ddlclient config.ini create_app -name test -type rrdb -pc 3 -rc 3
$ ./run.sh bench --app_name test --key_size 16 --value_size 1000 -n 100000
slog_dir = /home/work/pegasus/rocksdb/onebox
data_dirs = /home/work/pegasus/rocksdb/onebox
[rrdb set] result
fillseq_rrdb : 333.439 micros/op 2999 ops/sec; 2.9 MB/s
Microseconds per op:
Count: 100000 Average: 333.4382 StdDev: 70.28
Min: 257.0000 Median: 329.1232 Max: 11683.0000
Percentiles: P50: 329.12 P75: 345.92 P99: 425.10 P99.9: 784.00 P99.99: 3250.00
[rrdb get] result
readrandom_rrdb : 121.927 micros/op 8201 ops/sec; 7.9 MB/s (100000 of 100000 found)
Microseconds per op:
Count: 100000 Average: 121.9269 StdDev: 20.91
Min: 85.0000 Median: 116.2685 Max: 1107.0000
Percentiles: P50: 116.27 P75: 132.59 P99: 191.23 P99.9: 243.52 P99.99: 350.00
slog_dir = /home/work/ssd1/pegasus
data_dirs = /home/work/ssd1/pegasus
[rrdb set] result
fillseq_rrdb : 328.305 micros/op 3045 ops/sec; 3.0 MB/s
Microseconds per op:
Count: 100000 Average: 328.3046 StdDev: 94.50
Min: 259.0000 Median: 326.1517 Max: 18048.0000
Percentiles: P50: 326.15 P75: 342.50 P99: 398.96 P99.9: 695.45 P99.99: 2666.67
[rrdb get] result
readrandom_rrdb : 120.011 micros/op 8332 ops/sec; 8.1 MB/s (100000 of 100000 found)
Microseconds per op:
Count: 100000 Average: 120.0117 StdDev: 20.70
Min: 84.0000 Median: 114.6198 Max: 1138.0000
Percentiles: P50: 114.62 P75: 129.21 P99: 187.43 P99.9: 241.73 P99.99: 620.00
slog_dir = /home/work/ssd1/pegasus
data_dirs = /home/work/ssd2/pegasus,/home/work/ssd3/pegasus,/home/work/ssd4/pegasus
[rrdb set] result
fillseq_rrdb : 327.245 micros/op 3055 ops/sec; 3.0 MB/s
Microseconds per op:
Count: 100000 Average: 327.2444 StdDev: 131.86
Min: 260.0000 Median: 325.3862 Max: 26098.0000
Percentiles: P50: 325.39 P75: 341.53 P99: 398.33 P99.9: 562.86 P99.99: 3000.00
[rrdb get] result
readrandom_rrdb : 123.428 micros/op 8101 ops/sec; 7.9 MB/s (100000 of 100000 found)
Microseconds per op:
Count: 100000 Average: 123.4279 StdDev: 34.85
Min: 81.0000 Median: 116.9138 Max: 8099.0000
Percentiles: P50: 116.91 P75: 134.37 P99: 195.20 P99.9: 248.49 P99.99: 760.00
Device | Set (P99 in us) | Get (P99 in us) |
---|---|---|
same hdd | 425.10 | 191.23 |
same ssd | 398.96 | 187.43 |
diff ssd | 398.33 | 195.20 |
It seems that we do not gain better performance from SSD. But I think the reason is that the QPS is too low (IO is not the bottleneck), and the data scale is also too small (the data may be served from memory for reads). We'd better increase the QPS and the data scale for another test.
_Deployment_
slog_dir = /home/work/ssd1/pegasus
data_dirs = /home/work/ssd2/pegasus, /home/work/ssd3/pegasus, /home/work/ssd4/pegasus, /home/work/ssd5/pegasus, /home/work/ssd6/pegasus, /home/work/ssd7/pegasus, /home/work/ssd8/pegasus
_Machine_ Basic:
Network:
SSD I/O:
_Build_ $ ./run.sh build -t release
_Result_ [rrdb set]
./run.sh bench --app_name perftest --key_size 18 --value_size 1000 --thread_num 200 -t fillseq_rrdb -n 100000
fillseq_rrdb : 29.120 micros/op 34340 ops/sec; 33.3 MB/s
Microseconds per op:
Count: 20000000 Average: 5807.4305 StdDev: 5330.17
Min: 478.0000 Median: 4871.1415 Max: 558289.0000
Percentiles: P50: 4871.14 P75: 6998.56 P99: 21620.03 P99.9: 75095.64 P99.99: 116688.93
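As a sanity check (not part of the original report; numbers copied from the output above), the headline throughput and the latency histogram are consistent with each other: in a closed-loop benchmark with 200 client threads, throughput should be roughly threads / average latency, and the headline "29.120 micros/op" is wall-clock time per op aggregated across all threads:

```python
# Sanity check on the fillseq_rrdb numbers reported above.
threads = 200
avg_latency_us = 5807.4305       # "Average" from the histogram
reported_ops_per_sec = 34340     # headline ops/sec

# Closed-loop model: each thread completes one op per avg_latency.
expected_ops_per_sec = threads / (avg_latency_us / 1_000_000)
print(f"expected: {expected_ops_per_sec:.0f} ops/sec")   # ~34439, close to 34340

# The headline micros/op is aggregate wall-clock time per op, not per-thread latency.
wall_clock_us_per_op = 1_000_000 / reported_ops_per_sec
print(f"wall-clock per op: {wall_clock_us_per_op:.3f} us")  # ~29.121
```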
CPU Usage (typical value on primary-replica-server):
$ top
top - 13:03:14 up 12 days, 13:56, 1 user, load average: 16.19, 6.02, 2.26
Tasks: 379 total, 1 running, 378 sleeping, 0 stopped, 0 zombie
%Cpu0 : 30.0 us, 17.9 sy, 0.0 ni, 25.3 id, 0.4 wa, 0.0 hi, 26.5 si, 0.0 st
%Cpu1 : 26.9 us, 20.3 sy, 0.0 ni, 39.9 id, 10.8 wa, 0.0 hi, 2.1 si, 0.0 st
%Cpu2 : 34.5 us, 16.6 sy, 0.0 ni, 48.3 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 28.3 us, 18.9 sy, 0.0 ni, 40.6 id, 10.8 wa, 0.0 hi, 1.4 si, 0.0 st
%Cpu4 : 33.8 us, 16.9 sy, 0.0 ni, 48.6 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 28.7 us, 18.2 sy, 0.0 ni, 40.6 id, 10.8 wa, 0.0 hi, 1.7 si, 0.0 st
%Cpu6 : 34.0 us, 16.2 sy, 0.0 ni, 49.1 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 26.9 us, 18.7 sy, 0.0 ni, 41.7 id, 11.0 wa, 0.0 hi, 1.8 si, 0.0 st
%Cpu8 : 33.8 us, 15.2 sy, 0.0 ni, 49.7 id, 1.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu9 : 29.4 us, 18.3 sy, 0.0 ni, 39.4 id, 11.1 wa, 0.0 hi, 1.7 si, 0.0 st
%Cpu10 : 33.4 us, 15.7 sy, 0.0 ni, 49.5 id, 1.4 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 28.4 us, 18.2 sy, 0.0 ni, 40.0 id, 11.6 wa, 0.0 hi, 1.8 si, 0.0 st
%Cpu12 : 27.6 us, 13.8 sy, 0.0 ni, 57.9 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 25.2 us, 13.1 sy, 0.0 ni, 55.9 id, 5.2 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu14 : 28.5 us, 14.1 sy, 0.0 ni, 57.0 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 24.2 us, 14.5 sy, 0.0 ni, 55.0 id, 5.9 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu16 : 28.4 us, 13.5 sy, 0.0 ni, 57.8 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 26.9 us, 12.9 sy, 0.0 ni, 54.4 id, 5.1 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu18 : 29.1 us, 13.0 sy, 0.0 ni, 57.5 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 24.7 us, 13.2 sy, 0.0 ni, 56.2 id, 5.2 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu20 : 27.8 us, 13.4 sy, 0.0 ni, 58.4 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 25.1 us, 13.9 sy, 0.0 ni, 54.6 id, 5.8 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu22 : 28.1 us, 13.6 sy, 0.0 ni, 57.6 id, 0.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu23 : 26.0 us, 12.1 sy, 0.0 ni, 56.4 id, 4.8 wa, 0.0 hi, 0.7 si, 0.0 st
KiB Mem: 65760540 total, 28245740 used, 37514800 free, 1553888 buffers
KiB Swap: 12582908 total, 0 used, 12582908 free. 20458832 cached Mem
Because set() runs in THREAD_POOL_REPLICATION and we set the work_count of THREAD_POOL_REPLICATION to 23, the CPU usage is spread roughly uniformly across cores.
Network Usage (typical value on primary-replica-server):
$ sar -n DEV 1 100
Linux 3.10.0-123.el7.x86_64 (c3-hadoop-tst-pegasus-ssd8-st06.bj) 01/30/2016 _x86_64_ (24 CPU)
01:04:37 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
01:04:38 PM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:04:38 PM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:04:38 PM eth2 59696.00 97563.00 64540.71 113804.75 0.00 0.00 0.00
01:04:38 PM eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:04:38 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Because a primary replica receives each request from the client and then sends two prepare requests to the secondaries, the sending bandwidth is about twice the receiving bandwidth.
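A quick check of that ratio (not part of the original report; kB/s values copied from the eth2 row of the sar output above) — the measured ratio is a bit below 2x, plausibly because of response and acknowledgement traffic on top of the prepare messages:

```python
# tx/rx ratio on eth2 during the set benchmark (from the sar output above).
rx_kbps = 64540.71
tx_kbps = 113804.75
print(f"tx/rx = {tx_kbps / rx_kbps:.2f}")  # ~1.76, roughly the expected 2x
```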
SSD Usage (typical value on primary-replica-server):
$ iostat -x 1 -m
Linux 3.10.0-123.el7.x86_64 (c3-hadoop-tst-pegasus-ssd8-st06.bj) 01/30/2016 _x86_64_ (24 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
30.05 0.00 18.42 3.89 0.00 47.64
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 4923.00 0.00 3725.00 0.00 39.53 21.73 0.22 0.06 0.00 0.06 0.06 21.40
sdi 0.00 4773.00 0.00 3604.00 0.00 38.27 21.75 0.24 0.07 0.00 0.07 0.06 23.20
sdk 0.00 5944.00 0.00 4499.00 0.00 47.88 21.80 0.29 0.06 0.00 0.06 0.06 27.20
sdl 0.00 5906.00 0.00 4471.00 0.00 47.55 21.78 0.27 0.06 0.00 0.06 0.06 26.70
sdm 0.00 5956.00 0.00 4523.00 0.00 48.02 21.74 0.27 0.06 0.00 0.06 0.06 25.20
sdg 0.00 5988.00 0.00 4494.00 0.00 48.05 21.90 0.29 0.06 0.00 0.06 0.06 27.30
sdj 0.00 4711.00 0.00 3562.00 0.00 37.85 21.76 0.23 0.07 0.00 0.07 0.06 22.10
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[rrdb get]
./run.sh bench --app_name perftest --key_size 18 --value_size 1000 --thread_num 200 -t readrandom_rrdb -n 100000
readrandom_rrdb : 6.873 micros/op 145488 ops/sec; 141.2 MB/s (100000 of 100000 found)
Microseconds per op:
Count: 20000000 Average: 1372.2929 StdDev: 569.42
Min: 126.0000 Median: 1307.0105 Max: 58490.0000
Percentiles: P50: 1307.01 P75: 1458.11 P99: 3237.55 P99.9: 6266.89 P99.99: 13596.15
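The same closed-loop sanity check applies to the read numbers (not part of the original report; values copied from the histogram above):

```python
# Sanity check on the readrandom_rrdb numbers reported above.
threads = 200
avg_latency_us = 1372.2929       # "Average" from the histogram

# Closed-loop model: throughput ~ threads / average latency.
expected_ops_per_sec = threads / (avg_latency_us / 1_000_000)
print(f"expected: {expected_ops_per_sec:.0f} ops/sec")   # ~145742, close to the reported 145488
```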
CPU Usage (typical value on primary-replica-server):
$ top
top - 13:21:36 up 12 days, 14:14, 1 user, load average: 5.55, 12.18, 10.54
Tasks: 380 total, 1 running, 379 sleeping, 0 stopped, 0 zombie
%Cpu0 : 28.8 us, 15.6 sy, 0.0 ni, 52.4 id, 0.0 wa, 0.0 hi, 3.1 si, 0.0 st
%Cpu1 : 52.0 us, 7.4 sy, 0.0 ni, 40.3 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 11.2 us, 12.6 sy, 0.0 ni, 76.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 51.4 us, 8.4 sy, 0.0 ni, 40.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 10.2 us, 10.2 sy, 0.0 ni, 79.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 9.8 us, 9.1 sy, 0.0 ni, 81.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 2.3 us, 0.7 sy, 0.0 ni, 97.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 52.2 us, 7.4 sy, 0.0 ni, 40.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 48.5 us, 6.7 sy, 0.0 ni, 44.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 65760540 total, 19556020 used, 46204520 free, 1556540 buffers
KiB Swap: 12582908 total, 0 used, 12582908 free. 11450888 cached Mem
Because get() runs in THREAD_POOL_LOCAL_APP and we set the work_count of THREAD_POOL_LOCAL_APP to 4, we can see 4 CPUs under high load.
Network Usage (typical value on primary-replica-server):
$ sar -n DEV 1 100
Linux 3.10.0-123.el7.x86_64 (c3-hadoop-tst-pegasus-ssd8-st06.bj) 01/30/2016 _x86_64_ (24 CPU)
01:32:09 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
01:32:10 PM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:32:10 PM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:32:10 PM eth2 15211.00 116880.00 26055.45 167684.33 0.00 0.00 0.00
01:32:10 PM eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:32:10 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00
SSD Usage (typical value on primary-replica-server):
avg-cpu: %user %nice %system %iowait %steal %idle
10.83 0.00 3.33 0.00 0.00 85.84
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 1.00 0.00 2.00 0.00 0.01 12.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdk 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdl 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdm 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdj 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
There are no reads from SSD; I think the reason is cache hits, which also explains why the QPS is so high.
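A rough working-set estimate supports the cache-hit explanation (my own back-of-the-envelope calculation, not from the original report; it ignores rocksdb storage overhead and compression). The histogram count above shows 20M ops were written, with 18-byte keys and 1000-byte values, and top reports about 64 GB of RAM:

```python
# Rough estimate: does the dataset fit in RAM / page cache?
rows = 20_000_000                # histogram "Count" from the fillseq run above
bytes_per_row = 18 + 1000        # --key_size 18 --value_size 1000
dataset_gib = rows * bytes_per_row / 2**30

ram_gib = 65760540 * 1024 / 2**30   # "KiB Mem: 65760540 total" from top above
print(f"dataset ~ {dataset_gib:.1f} GiB, RAM ~ {ram_gib:.1f} GiB")  # ~19.0 vs ~62.7
```

Since the whole dataset (~19 GiB) comfortably fits in ~63 GiB of RAM, random reads can be served entirely from memory.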
We have adapted rocksdb's db_bench to rrdb_bench (refer to https://github.com/imzhenyu/rocksdb/pull/8), and ran some tests on our servers.
server: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz * 24, 128G memory, with hard disk
network: GbE, ping 0.03~0.05ms
rocksdb test
[rocksdb set] command
critical config: disable WAL, disable compression, sync=false
[rocksdb set] result
[rocksdb get] command
[rocksdb get] result
rrdb test
table: 1 table, 1 partition, 3 replicas
deploy: 3 replica-servers in total (1 primary and 2 secondaries), deployed on 3 servers
client: 2 rrdb_bench clients run the "fillseq_rrdb" benchmark concurrently, while at the same time 1 rrdb_bench client runs the "readrandom_rrdb" benchmark
key-size: 64 bytes
value-size: 1000 bytes
server-config: release mode (-O2), tool=nativerun, logging_level=DEBUG
[rrdb set] result
[rrdb get] result