Result on the Windows 2012 R2 VMs, using the Samsung 850 EVO:

- `ext` file based SR
- `SMAPIv3` filebased

The issue here had nothing to do with how the SR was managed, but was purely a bug in qemu, since fixed.
Great! Can you point me to where I can find the fix? I'd like to re-run some tests with the fix then :+1:
You can't, it will be released in a future version of XenServer.
What do you mean? There is no public repo where the fix is? :fearful:
Correct, it's still awaiting review by the upstream qemu maintainers.
Is there a public PR against upstream Qemu then?
I'm trying to search the qemu-devel mailing list, but if you have a clue about the subject line or the email that posted the patch, that would be really helpful :+1:
Thanks!
@MarkSymsCtx I'm struggling to find the relevant patches on qemu-devel. I have some potential matches, but it's hard to tell, mainly because I don't know which file was modified or who posted the patch to the mailing list.

Here is the list of Citrix people who have posted on this list since April:

I think I've read all those patches/descriptions without spotting anything related to this disk speed problem.
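For what it's worth, if the patch eventually gets merged, a search like the one below in a local qemu clone would surface Citrix-authored commits; the author filter, the date, and the `block/` path are guesses on my part, not details from this thread:

```bash
# In a local clone of the upstream qemu repository:
# list recent commits from citrix.com addresses touching the block layer
git log --since="2018-04-01" --author="citrix.com" --oneline -- block/
```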
It has probably been handled internally, as I believe the upstream maintainer for that part is also an employee, or it's still undergoing internal review; all I know for sure is that it's fixed.
Thanks for your answer Mark :+1: Let me try to recap the situation, and tell me if it's correct or if I missed something:

- the fix is in `qemu`, but it's not public

Correct
Okay, sorry to bother you, but a few last questions :wink:
This also ties into the major question: why is the development code not available? I mean, for an open source project, it's a bit weird, and it doesn't really help people to contribute. We'd like to bring you some resources to improve it a bit, but we don't understand the "how".
up @MarkSymsCtx :wink:
So I did new tests, and it's really good (for file level SR):

Also tested `nfs-ng`; sadly, performance is catastrophic on this side. Is it expected?

Note: on local SSD, the `qemu-dp` process hits 100% of one CPU, so I assume there is still room for improvement, which is impressive!
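If anyone wants to reproduce the CPU observation, something along these lines on the host should show it; the `qemu-dp` process name match is an assumption about how the datapath process appears in the process list:

```bash
# Watch per-second CPU usage of the datapath process (assumes it matches "qemu-dp")
pidstat -p "$(pgrep -f qemu-dp | head -n 1)" 1
```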
`nfs-ng` is not supported or maintained and should not have been shipped; it was rough prototype code for validating the API operations. I'm surprised it still works at all, at any level of performance.
Well, in fact, a file based SR on top of an NFS mount shows the exact same perf issue (and no problem with a local disk). So I suppose it's somehow related to how `qemu-dp` works, and network latency isn't really good for it. But that's just an assumption based on the numbers I got. Is there anything we could tweak (cache or something like that) that could help on this?
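To help narrow that down, the effective NFS mount options (rsize/wsize, sync vs async, protocol version) and the raw latency of the share are worth checking; the mount path below is a placeholder, and `ioping` has to be installed separately:

```bash
# Show the options actually negotiated for each NFS mount on the host
nfsstat -m

# Measure round-trip latency of the SR mount point (path is a placeholder)
ioping -c 10 /run/sr-mount/<sr-uuid>
```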
As suggested by @kc284, I'm posting here for a better/more fluid discussion regarding the storage stack :+1: (thank god, I prefer GitHub to Jira!)
Since `GFS2` started to use `SMAPIv3` and the `qcow2` file format, we decided to do some performance tests. In order to keep it as simple as possible, only the file-level based SR is tested:
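For anyone wanting to reproduce the setup, creating a `filebased` SMAPIv3 SR should look roughly like this; the `device-config:file-uri` key and the local path are assumptions from memory and may differ between builds:

```bash
# Create a SMAPIv3 file-level SR backed by a local directory
# (device-config key and URI format are assumptions; adjust to your build)
xe sr-create name-label="filebased-test" type=filebased \
   device-config:file-uri="file:///mnt/ssd/sr"
```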
Few minor issues: name-label and name-description aren't pushed correctly to XAPI; the SR is named "SR NAME" with a "FILEBASED SR" description. It's only a small glitch, but at least it's reported. Otherwise, I can confirm the disk file is created and is a valid `qcow2` file.

I did a benchmark on a Samsung 850 EVO SSD, on the same VM. Before benching, I created a local `ext` SR on the same disk, still with the same VM, so I could compare.
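Two small checks that go with the above, as a sketch: setting the labels by hand works around the naming glitch, and `qemu-img` can confirm the disk file really is a `qcow2` image; the UUIDs and path are placeholders:

```bash
# Work around the name-label / name-description glitch by setting them directly
xe sr-param-set uuid=<sr-uuid> name-label="Local filebased SSD"
xe sr-param-set uuid=<sr-uuid> name-description="SMAPIv3 filebased SR on the 850 EVO"

# Confirm the VDI really is a valid qcow2 file (path is a placeholder)
qemu-img info /mnt/ssd/sr/<vdi-uuid>.qcow2
```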
Here are the results:

- `SMAPIv3` is 3 times slower than the `ext` SR. Also note that with SMAPIv3, the `tapdisk` process seems to be at 100%.
- `SMAPIv3` is 150 times slower than the `ext` SR.
- `SMAPIv3` is 95 times slower than the `ext` SR.

If you want the detailed numbers, let me know.
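The exact workloads behind each of those ratios aren't recorded here, but this kind of comparison can be reproduced with fio from inside a Linux VM; the device name, block sizes, queue depths and runtimes below are assumptions, not the original test parameters:

```bash
# Sequential read throughput (device name is a placeholder for the benchmark disk)
fio --name=seq-read --filename=/dev/xvdb --rw=read --bs=1M \
    --ioengine=libaio --direct=1 --iodepth=16 --runtime=60 --time_based

# 4K random read/write, closer to a worst case for the storage datapath
fio --name=rand-rw --filename=/dev/xvdb --rw=randrw --bs=4k \
    --ioengine=libaio --direct=1 --iodepth=32 --runtime=60 --time_based
```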
Did I miss something during the SR creation? Since `GFS2` is basically file-level + the `GFS2` FS + clustering on top, I suppose I should expect roughly the same results.
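For comparison, a `GFS2` SR over iSCSI is created along these lines on the same SMAPIv3 stack; the device-config keys are from the documented iSCSI case as I remember it, and every value is a placeholder:

```bash
# GFS2 SR on a shared iSCSI LUN (all values are placeholders)
xe sr-create type=gfs2 name-label="gfs2-test" shared=true \
   device-config:provider=iscsi \
   device-config:target=<portal-ip> \
   device-config:targetIQN=<target-iqn> \
   device-config:SCSIid=<scsi-id>
```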