vitalif / vitastor

Simplified distributed block and file storage with strong consistency, like in Ceph (repository mirror)
https://vitastor.io

[qemu] qemu data error #35

Closed · mirrorll closed this issue 2 years ago

mirrorll commented 2 years ago

Hi @vitalif, I use Windows 10 as the guest OS and found that when it starts, some entries of the QEMU iov array point to the same address within a single I/O request. For example, in a request with iovcnt = 5, iov[0] and iov[3] use the same address. When the client splits the request into two part_ops, part_op[0] uses iov[0]-iov[2] and is sent to one OSD, while part_op[1] uses iov[3]-iov[4] and is sent to another OSD. If part_op[1] returns earlier than part_op[0], the shared address of iov[0] and iov[3] ends up holding the data of iov[0]; this result differs from using a qcow2 file as the VM disk.

So, when the request opcode is read, I malloc a separate part_op iov buffer to hold the data read from each OSD, and memcpy the data into the QEMU iov after all part_ops are done. But when I calculate CRC32 over the part_op iov and the QEMU iov, sometimes they differ. I also tried mprotect()'ing the QEMU iov to PROT_READ before all part_ops are done, then mprotect()'ing it back to PROT_READ|PROT_WRITE, but this causes a QEMU error: kvm run failed Bad address.
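For reference, here is a minimal C sketch of the bounce-buffer workaround described above. The `part_op_t` layout and the `part_op_prepare` / `request_complete` names are hypothetical, not the actual Vitastor client code; the sketch only illustrates why copying into the guest iov in iov order, after all parts have completed, makes overlapping iov entries deterministic regardless of which OSD answers first:

```c
/* Hedged sketch of the bounce-buffer workaround for reads whose guest
 * iovecs overlap. part_op_t, part_op_prepare and request_complete are
 * hypothetical names; the real Vitastor client is structured differently. */
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

typedef struct part_op
{
    struct iovec *iov;   /* slice of the guest iov array */
    int iov_cnt;         /* number of entries in that slice */
    size_t len;          /* total bytes covered by this slice */
    void *bounce;        /* private buffer the OSD reply is read into */
} part_op_t;

/* Allocate a private buffer so the OSD reply never touches guest memory
 * until every part of the request has completed. */
static void part_op_prepare(part_op_t *op)
{
    op->bounce = malloc(op->len);
}

/* Called once ALL parts are done, in iov order (part 0 first). Because the
 * copies happen sequentially here, an address shared by iov[0] and iov[3]
 * deterministically ends up holding the later copy, independent of OSD
 * completion order - matching what a single sequential qcow2 read does. */
static void request_complete(part_op_t *parts, int n_parts)
{
    for (int p = 0; p < n_parts; p++)
    {
        size_t off = 0;
        for (int i = 0; i < parts[p].iov_cnt; i++)
        {
            memcpy(parts[p].iov[i].iov_base,
                (char*)parts[p].bounce + off, parts[p].iov[i].iov_len);
            off += parts[p].iov[i].iov_len;
        }
        free(parts[p].bounce);
    }
}
```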

vitalif commented 2 years ago

Hi, that's something interesting. Are these read or write requests? What's the idea behind sharing the same address? I.e., as I understand it, if the guest requests to read 2 different parts of data into the same buffer, it can't rely on anything? Real hardware also doesn't guarantee any order, as I understand it. Or are these write requests? But as I understand it, if these shared requests were write requests, they would be executed correctly?..

mirrorll commented 2 years ago

I only printed read requests. Maybe QEMU wants to save memory and the data that shares the same buffer is treated as invalid. In my tests, reading from a qcow2 file is always correct. Also, a Windows 10 guest VM crashes after about 20 restarts because its disk data becomes corrupted, but with a qcow2 disk it is OK.
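As an illustration of the kind of check that exposes this pattern, here is a hedged sketch: `iov_has_overlap` is a hypothetical helper (not QEMU or Vitastor code) that scans a request's iov array for entries whose byte ranges overlap, which is exactly the condition under which the OSD completion order becomes visible to the guest:

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/uio.h>

/* Return true if any two entries of the iov array cover overlapping
 * byte ranges (including the exact-same-address case reported above). */
static bool iov_has_overlap(const struct iovec *iov, int iovcnt)
{
    for (int i = 0; i < iovcnt; i++)
    {
        const char *a = iov[i].iov_base;
        for (int j = i + 1; j < iovcnt; j++)
        {
            const char *b = iov[j].iov_base;
            /* Ranges [a, a+la) and [b, b+lb) intersect. */
            if (a < b + iov[j].iov_len && b < a + iov[i].iov_len)
            {
                fprintf(stderr, "iov[%d] and iov[%d] overlap\n", i, j);
                return true;
            }
        }
    }
    return false;
}

int main(void)
{
    char buf[4096];
    /* iov[0] and iov[1] deliberately share the same address, mimicking
     * the iovcnt = 5 pattern described in this issue. */
    struct iovec iov[2] = {
        { .iov_base = buf, .iov_len = 512 },
        { .iov_base = buf, .iov_len = 512 },
    };
    return iov_has_overlap(iov, 2) ? 0 : 1;
}
```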

vitalif commented 2 years ago

I suspect this bug was actually caused by #36 ?

mirrorll commented 2 years ago

> I suspect this bug was actually caused by #36 ?

After fixing #36, the probability goes down, but it still exists.

vitalif commented 2 years ago

How do you reproduce it? Just run a Win10 guest VM and restart it a lot of times and it'll crash? Does it still reproduce with #36 fixed, if you try it with a clean Win10 installed/uploaded after the fix?

mirrorll commented 2 years ago

I fixed #36 and uploaded a clean Win10, then snapshotted it; restarting the snapshot VM about 20 times reproduced it (Win10 crashed).

vitalif commented 2 years ago

Do you test it with the virtio-win driver? I don't have a prepared working image with Windows + virtio-win, and when I try to test it on a snapshotted Windows 7 VM with QEMU AHCI emulation, it works fine. I rebooted it many times :)

mirrorll commented 2 years ago

With #38, it no longer appears.