rposudnevskiy / RBDSR

RBDSR - XenServer/XCP-ng Storage Manager plugin for CEPH
GNU Lesser General Public License v2.1

sparse_dd is stuck running at 1MB/s #95

Closed smlawrence closed 5 years ago

smlawrence commented 5 years ago

Plugin version 2.0 (a similar issue was observed with v3-nbd, so I rolled back to v2).

mode: kernel

XCP-ng and Ceph are colocated on 3 nodes. Each node has 3x 6 TB OSDs, i.e. 9x 6 TB across the cluster.

ceph pool benchmarking shows 180 MB/s
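
For reference, a raw pool figure like that can be reproduced with a RADOS-level write benchmark; a rough wrapper sketch is below (it assumes the `rados` CLI is available in dom0, and the pool name and duration are placeholders, not values from this setup):

```python
#!/usr/bin/env python
"""Reproduce a raw RADOS write-bandwidth figure for a pool.

Assumes the `rados` CLI is on PATH; the pool name and duration below are
placeholders, not values taken from this issue.
"""
import re
import subprocess

POOL = "bench-test"   # hypothetical pool to benchmark against
SECONDS = 30          # benchmark duration in seconds

def rados_write_bandwidth(pool, seconds):
    """Run `rados bench ... write` and return the summary bandwidth in MB/s."""
    out = subprocess.check_output(
        ["rados", "bench", "-p", pool, str(seconds), "write", "--no-cleanup"],
        universal_newlines=True,
    )
    # The summary includes a line like "Bandwidth (MB/sec):   180.123"
    match = re.search(r"Bandwidth \(MB/sec\):\s+([0-9.]+)", out)
    if match is None:
        raise RuntimeError("could not find bandwidth in rados bench output")
    return float(match.group(1))

if __name__ == "__main__":
    # Objects left behind by --no-cleanup can be removed afterwards with
    # `rados -p <pool> cleanup`.
    print("pool %s: %.1f MB/s raw write" % (POOL, rados_write_bandwidth(POOL, SECONDS)))
```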

vm-copy from local storage to the Ceph pool attached with the v2 plugin maxes out at around 1 MB/s; a similar problem was observed with NBD.

However, the v2 plugin has verbose logging in SMlog showing that it is sparse_dd that is running so slowly.

sparse_dd bounces between 0% and 10% CPU, all cores are mostly idle, and 12 GB (of 16 GB) of RAM stays free throughout.

First of all, how can we check whether this is read-limited or write-limited, and is there any known possible cause at this point? (With the v3 plugin it was impossible to launch migrated or fresh VMs off the pool; we haven't gotten to that point with the v2 plugin yet.)
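
For reference, one way to see sparse_dd's actual read and write rates is to sample its `/proc/<pid>/io` counters while a copy is running; a rough sketch follows (the PID is supplied by hand, and nothing in it is taken from this issue's logs):

```python
#!/usr/bin/env python
"""Sample /proc/<pid>/io for a running sparse_dd to see read vs. write rates.

read_bytes/write_bytes are the standard /proc/<pid>/io counters; the PID is
passed on the command line, everything else here is a placeholder.
"""
import sys
import time

def io_counters(pid):
    """Return (read_bytes, write_bytes) the process has pushed to the block layer."""
    counters = {}
    with open("/proc/%s/io" % pid) as f:
        for line in f:
            key, value = line.split(":")
            counters[key.strip()] = int(value)
    return counters["read_bytes"], counters["write_bytes"]

def main(pid, interval=5):
    prev_r, prev_w = io_counters(pid)
    while True:
        time.sleep(interval)
        cur_r, cur_w = io_counters(pid)
        print("read %6.1f MB/s   write %6.1f MB/s" % (
            (cur_r - prev_r) / float(interval) / 2 ** 20,
            (cur_w - prev_w) / float(interval) / 2 ** 20,
        ))
        prev_r, prev_w = cur_r, cur_w

if __name__ == "__main__":
    main(sys.argv[1])   # usage: ./iosample.py $(pgrep -f sparse_dd)
```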

smlawrence commented 5 years ago

Launching a VM and installing Ubuntu succeeds with the v2 plugin in kernel mode, but the installer maxed out at 3 MB/s. I'm unsure how to identify the bottleneck.
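
One option is to poll the cluster's own view of client I/O on the SR's pool while the install is running; a minimal sketch, assuming the `ceph` CLI is reachable from dom0 or a monitor node and using a placeholder pool name:

```python
#!/usr/bin/env python
"""Poll the cluster's view of client I/O on the SR's pool during a slow install.

Assumes the `ceph` CLI works from wherever this runs; the pool name below is a
placeholder, not the real pool backing the SR.
"""
import subprocess
import time

POOL = "RBD_XenStorage-placeholder"   # hypothetical pool backing the SR

while True:
    out = subprocess.check_output(
        ["ceph", "osd", "pool", "stats", POOL],
        universal_newlines=True,
    )
    # `ceph osd pool stats` reports the client io rate and op/s for the pool,
    # which shows whether the cluster is even seeing the expected traffic.
    print(out.strip())
    print("-" * 40)
    time.sleep(10)
```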

smlawrence commented 5 years ago

Switched to an SSD-only SR (the earlier comments were for a spinning-platter-only SR) and sparse_dd has settled at around 5-6 MB/s. The ratio between the two SRs is roughly the same as in my mounted block image benchmarks, except both are about 80x slower.

northbear commented 5 years ago

Yeah... I have the same performance issue with Ceph [Luminous|Mimic]. Performance degraded noticeably after moving from FileStore to BlueStore, and the VMs' block-device I/O subsystem adds further to the performance loss.

smlawrence commented 5 years ago

I was trying to run both XCP-ng and Ceph in dom0. A fresh build of XCP-ng 7.6 with Ceph running in VMs shows much better performance.