cyb70289 closed this issue 3 years ago.
@cyb70289 Sorry for the late reply. Can you tell me a bit more about the dataset you used, e.g. how many files you had? I am guessing you used the 128 MB file I sent you.
Yes, I'm using the 128MB file. I generated 40 files under cephfs.
FYI, I built binaries from your Arrow PR, not from this repo. Build type is Release.
@cyb70289 These are some results on my setup with 4 OSDs (bare-metal, 16 logical cores, 64 GB DRAM each), 3 MONs (bare-metal, 16 logical cores, 64 GB DRAM each), and a single client. The network is 10Gb/s and every OSD is on an NVMe drive. As for the Ceph configuration, my cephfs_data pool has 128 PGs with a replication factor of 3 and PG autoscaling turned off.
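For reference, a pool with that layout could be created roughly as follows. This is only a sketch based on the description above (pool name and PG count taken from it), not necessarily the exact commands used on this cluster:

    ceph osd pool create cephfs_data 128 128              # 128 placement groups
    ceph osd pool set cephfs_data size 3                   # replication factor 3
    ceph osd pool set cephfs_data pg_autoscale_mode off    # disable PG autoscaling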
rados-parquet
root@node0:/users/noobjc# python3 bench.py rpq 1 /mnt/cephfs/dataset 16 result_rpq.json
10.858497619628906
11.692570924758911
11.904498815536499
9.314198017120361
8.072686433792114
5.368509531021118
4.973011493682861
4.362470388412476
parquet
root@node0:/users/noobjc# python3 bench.py pq 1 /mnt/cephfs/dataset 16 result_pq.json
9.049651622772217
11.662115573883057
11.579400539398193
10.566514492034912
10.72178316116333
9.71735692024231
9.325489521026611
9.544288635253906
Can you please check the number of PGs present in your Ceph cluster? The number of PGs should follow the formula

    Total PGs = (OSDs * 100) / pool size

or else performance is severely hampered (especially if there are fewer PGs than the formula suggests).
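For example, assuming the 4-OSD setup above with a pool size (replica count) of 3, the formula gives (4 * 100) / 3 ≈ 133, which is conventionally rounded to the nearest power of two, i.e. 128 PGs.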
FYI, I built binaries from your Arrow PR, not from this repo. Build type is Release.
That should be fine; no major changes have been introduced. But for sanity, you can still try an Arrow build in Release mode from arrow-master.
PGs look good on my cluster (3 OSDs, 3 replicas, 128 PGs, autoscaling off). I will try 4 OSDs to match your configuration.
A quick question: should osd op threads = 16 in /etc/ceph/ceph.conf match your CPU cores?
A quick question: should osd op threads = 16 in /etc/ceph/ceph.conf match your CPU cores?
Yes, that's right.
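For completeness, the relevant stanza in /etc/ceph/ceph.conf would look roughly like this; the section placement is an assumption, and the value 16 simply mirrors the 16 logical cores on the OSD nodes described above:

    [osd]
    osd op threads = 16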
It looks like it's related to network bandwidth. After limiting the virtual network to a lower bandwidth, I see better results from rpq than from pq for selections <= 25%.
@cyb70289 That sounds great, thanks a lot for sharing! Out of interest, to what bandwidth did you limit the network?
@cyb70289 That sounds great, thanks a lot for sharing! Out of interest, to what bandwidth did you limit the network?
I just tried to simulate a 10Gb network: I limited the client NIC to 10Gb/s and each of the 4 OSD NICs to 2.5Gb/s.
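As a rough sketch of how such a cap can be applied on a VM NIC with a token-bucket filter via tc; the interface name eth0 and the burst/latency values are assumptions, not necessarily what was actually used here:

    # limit the interface to roughly 2.5 Gb/s
    tc qdisc add dev eth0 root tbf rate 2500mbit burst 1mbit latency 50ms
    # remove the limit again
    tc qdisc del dev eth0 root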
Hi @JayjeetAtGithub,
I'm evaluating rados-parquet performance and tested it on 4 virtual machines. This is not an ideal setup for benchmarking, but I did find some issues I cannot explain and would like to hear your comments.
The host server has 64 Xeon(R) Gold 5218 cores and 128 GB of RAM. I launched 4 VMs with 8 cores and 16 GB of RAM each: one VM as the client and the other 3 VMs as Ceph OSD+Monitor nodes.
Running bench.py [1], I see that rados-parquet performance is much worse than parquet for all selection ratios. But I didn't see any resource bottleneck on the test VMs or the host: CPU, memory, disk, and network usage are all low. It's a bit strange why rados-parquet is running slow in this test setup. What possible tuning strategies can I try?
Test logs: rados-parquet, parquet
[1] https://github.com/JayjeetAtGithub/skyhook-nsdi/blob/master/bench.py
Yibo