aesteve-rh closed this 1 year ago
/ost
I am currently trying to verify the fix with a stress test, but I am posting the PR already to refine it early, so that we can integrate it as soon as I manage to reproduce the issue (if I do).
/ost
/ost
/ost
I tried to reproduce the issue, both manually and with ovirt-stress, but I could not see the "No space left on device" error in the libvirtd logs on a full setup with two hosts and a running VM. Unfortunately, I cannot spend more time on it.
So I would like to finish the review of this PR and integrate it. Hopefully QE has better luck reproducing it.
I tried to reproduce the issue, both manually and with ovirt-stress, but I could not see the "No space left on device" error in the libvirtd logs on a full setup with two hosts and a running VM.
How did you try to reproduce?
When we start a live merge we measure the top volume to get the required size of base; this measures the data and bitmaps in base. https://github.com/oVirt/vdsm/blob/6f67a73dac7c02009c0125a59e002590f9cc2beb/lib/vdsm/virt/livemerge.py#L405
If the base size is not smaller than the measured size, we don't extend base. https://github.com/oVirt/vdsm/blob/6f67a73dac7c02009c0125a59e002590f9cc2beb/lib/vdsm/virt/livemerge.py#L457
So we need to find how much data we must have in top so that the measured size is exactly the base size, and then add enough stale bitmaps to base (we needed 6 bitmaps in the tests to reproduce this).
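For reference, essentially the same measurement can be done by hand with qemu-img measure on the top volume (a sketch; the device path is a placeholder, and the volume should not be locked by a running qemu):
# How much space would committing/converting this image to qcow2 need?
qemu-img measure -f qcow2 -O qcow2 /dev/<vg_name>/<top_lv_name>
# The output includes at least "required size" and "fully allocated size".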
You can do the measurements with the reproducer script, using 2.5g images (instead of 128m). When you find the right size, do:
dd if=/dev/urandom bs=1M count=xxx of=/dev/vdb oflag=direct conv=fsync
Writing zeros would be faster, but when committing the data qemu detects zeros and converts them to zero clusters, which prevents reproducing this issue.
Then add the stale bitmaps to the base volume with qemu-img bitmap add.
I think we need to reproduce this first before we modify the code. If you don't have time, maybe someone from QE has time to reproduce.
That is exactly what I tried.
First I tried with a 3GiB disk, but it was not fully provisioned: the image Vdsm created was 2.5GiB while the virtual size was 3GiB, so I changed it to 2.5GiB (2560MiB). I didn't get it either, and I even tried with 128MiB, which is what we used for cold merges. No luck.
I tried to write directly in the VM (to be fair, I only tried with /dev/zero, so today I retried with /dev/urandom), or with qemu-io on the host directly on the top volume:
qemu-io -f qcow2 {top_path} -c "write 0 2000M"
qemu-io -f qcow2 {top_path} -c "write 2000M 558M"
I also tried with more snapshots, removing the first or the second snapshot, both automatically and manually (these tests actually helped me understand the process better).
None of these really worked, at least for my goal: I didn't get the qemu error in the journalctl output in any case.
I will document the steps and hand this over to QE; I really cannot put much more time into trying. I wanted to move on completely today, but I still dedicated the morning to trying the /dev/urandom proposal, with no luck either.
My feeling is that the patch should work, the same way we fixed cold merges: it basically runs the same operation as for cold merges, and the test proves that it runs when required. If the OST runs pass (i.e. it does not cause a regression), I'm good with it.
That is exactly what I tried.
First I tried with a 3GiB disk, but it was not fully provisioned: the image Vdsm created was 2.5GiB while the virtual size was 3GiB, so I changed it to 2.5GiB (2560MiB). I didn't get it either, and I even tried with 128MiB, which is what we used for cold merges. No luck.
I tried to write directly in the VM (to be fair, I only tried with /dev/zero,
This cannot work: qemu detects the zeros during the merge and does not write any data, so the merge cannot fail.
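The zero detection is easy to see in general (this is not the live-commit path itself, just an illustration on scratch files with made-up paths): an image filled with literal zero bytes ends up with almost no data once qemu processes it.
qemu-img create -f qcow2 /tmp/zeros.qcow2 1G
qemu-io -f qcow2 /tmp/zeros.qcow2 -c "write -P 0 0 512M"    # allocate clusters holding only zero bytes
qemu-img convert -f qcow2 -O qcow2 /tmp/zeros.qcow2 /tmp/out.qcow2
qemu-img info /tmp/out.qcow2                                # disk size stays tiny: the zeros were detected and dropped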
...
None of these really worked, at least for my goal: I didn't get the qemu error in the journalctl output in any case.
Getting the error is not a reliable way to detect the issue. The issue is detected by checking the merge logs: when the merge completes after an error, the volume is not removed from the chain, and the engine reports that the snapshot is illegal.
When this happens, trying to delete the snapshot again is expected to fail forever.
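A more direct check, for example, is to look at the backing chain of the active volume after the merge job ends (a sketch; the path is a placeholder for the active volume under /rhev/data-center/..., and -U lets it work while the image is in use by the VM): if the volume that should have been merged away is still listed, the merge did not really complete.
qemu-img info -U --backing-chain /rhev/data-center/mnt/blockSD/<sd_uuid>/images/<img_uuid>/<active_vol_uuid>
# the merged volume must no longer appear anywhere in the reported chain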
I have finally been able to consistently reproduce the issue with 2560MiB disks.
With chain A (base) < B (active):
dd if=/dev/urandom bs=1M count=2555 of=/dev/vda oflag=direct conv=fsync
With chain A (base) < B (top) < C (active):
for i in $(seq 70); do
qemu-img bitmap --add /dev/<vg_name>/<lv_name> stale-bitmap-$i
done
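To sanity-check that the bitmaps actually landed in the base volume (same placeholder device path as above; -U so it also works if the volume is in use as a backing file):
qemu-img info -U /dev/<vg_name>/<lv_name>
# the "bitmaps" entry under "Format specific information" should list stale-bitmap-1 ... stale-bitmap-70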
Results in:
Top image (snap2) measure:
required size: 2679767040
fully allocated size: 2685009920
bitmaps size: 0
Base image (snap1) measure:
required size: 655360
fully allocated size: 2685009920
bitmaps size: 9240576
Vdsm logs:
2022-12-05 18:21:39,245+0100 ERROR (libvirt/events) [virt.vm] (vmId='bb1ff201-a159-4436-a23f-8dad4cced1e9') Block job 89667ef9-0b0f-4b63-8ed9-9a720e0d4845 type COMMIT for drive vdb has failed (vm:5955)
Resulting base measure:
required size: 2675048448
fully allocated size: 2685009920
bitmaps size: 9240576
Engine UI: (screenshot omitted)
If we skip adding the bitmaps, the merge succeeds, so the error is caused by the stale bitmaps.
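As a rough sanity check of the numbers above (assuming base ends up with room for roughly the measured required size of top, which does not account for the stale bitmaps already stored in base):
echo $(( 2679767040 + 9240576 ))   # data to commit + stale bitmaps = 2689007616
# 2689007616 > 2685009920 (fully allocated size), so the commit has no room left for everything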
Vdsm logs:
After START merge and before measuring the relevant disk, we see the prune_bitmaps logs:
2022-12-07 09:49:47,864+0100 WARN (jsonrpc/2) [storage.bitmaps] Prune stale bitmaps ['stale-bitmap-1', 'stale-bitmap-2', 'stale-bitmap-3', 'stale-bitmap-4', 'stale-bitmap-5', 'stale-bitmap-6', 'stale-bitmap-7', 'stale-bitmap-8', 'stale-bitmap-9', 'stale-bitmap-10', 'stale-bitmap-11', 'stale-bitmap-12', 'stale-bitmap-13', 'stale-bitmap-14', 'stale-bitmap-15', 'stale-bitmap-16', 'stale-bitmap-17', 'stale-bitmap-18', 'stale-bitmap-19', 'stale-bitmap-20', 'stale-bitmap-21', 'stale-bitmap-22', 'stale-bitmap-23', 'stale-bitmap-24', 'stale-bitmap-25', 'stale-bitmap-26', 'stale-bitmap-27', 'stale-bitmap-28', 'stale-bitmap-29', 'stale-bitmap-30', 'stale-bitmap-31', 'stale-bitmap-32', 'stale-bitmap-33', 'stale-bitmap-34', 'stale-bitmap-35', 'stale-bitmap-36', 'stale-bitmap-37', 'stale-bitmap-38', 'stale-bitmap-39', 'stale-bitmap-40', 'stale-bitmap-41', 'stale-bitmap-42', 'stale-bitmap-43', 'stale-bitmap-44', 'stale-bitmap-45', 'stale-bitmap-46', 'stale-bitmap-47', 'stale-bitmap-48', 'stale-bitmap-49', 'stale-bitmap-50', 'stale-bitmap-51', 'stale-bitmap-52', 'stale-bitmap-53', 'stale-bitmap-54', 'stale-bitmap-55', 'stale-bitmap-56', 'stale-bitmap-57', 'stale-bitmap-58', 'stale-bitmap-59', 'stale-bitmap-60', 'stale-bitmap-61', 'stale-bitmap-62', 'stale-bitmap-63', 'stale-bitmap-64', 'stale-bitmap-65', 'stale-bitmap-66', 'stale-bitmap-67', 'stale-bitmap-68', 'stale-bitmap-69', 'stale-bitmap-70'] from '/rhev/data-center/mnt/blockSD/a2ad7a23-96c5-4c23-9007-c0d1892ea07d/images/d4e763f5-67d8-4a73-9c3b-79da330032e5/0033c0b7-69f3-43fa-b36c-8c0a518aea84' (bitmaps:144)
2022-12-07 09:49:47,878+0100 INFO (jsonrpc/2) [storage.bitmaps] Remove bitmap stale-bitmap-1 from '/rhev/data-center/mnt/blockSD/a2ad7a23-96c5-4c23-9007-c0d1892ea07d/images/d4e763f5-67d8-4a73-9c3b-79da330032e5/0033c0b7-69f3-43fa-b36c-8c0a518aea84' (bitmaps:117)
...
Vdsm measures expected data:
FINISH measure return={'result': {'bitmaps': 0, 'required': 2679767040, 'fully-allocated': 2685009920}} from=::ffff:192.168.122.96,38934, flow_id=13305ae3-0084-4eed-9f5a-ec31a43424ff, task_id=b77b670e-79ae-4cad-8d64-71b6259b83b8 (api:37)
And the merge finishes correctly, with no ERRORs in the logs.
Other disks also have a START prune_bitmaps, but without the bitmap-removal WARN logs:
INFO (jsonrpc/3) [vdsm.api] START prune_bitmaps(sdUUID='a2ad7a23-96c5-4c23-9007-c0d1892ea07d', imgUUID='c7b5c2f6-87ca-4ba6-a498-e12025f0ab12', volUUID='285874a4-0fc6-4bcf-9afc-696f40dd12bc', baseUUID='7f2f1e6e-d32d-4883-941e-836b77c4dde3') from=::ffff:192.168.122.96,38934, flow_id=13305ae3-0084-4eed-9f5a-ec31a43424ff, task_id=55136c39-aacb-4c02-9d8f-17d93a763b5f (api:31)
...
INFO (jsonrpc/3) [vdsm.api] FINISH prune_bitmaps return=None from=::ffff:192.168.122.96,38934, flow_id=13305ae3-0084-4eed-9f5a-ec31a43424ff, task_id=55136c39-aacb-4c02-9d8f-17d93a763b5f (api:37)
Resulting base measure (bitmaps removed):
required size: 2679767040
fully allocated size: 2685009920
bitmaps size: 0
Engine logs:
INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-32) [13305ae3-0084-4eed-9f5a-ec31a43424ff] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_SUCCESS(356), Snapshot 'snap 1' deletion for VM 'internal-livemerge-vm' has been completed.
The snapshot also disappeared in the engine UI, and the task finished successfully.
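For the record, the manual equivalent of what prune_bitmaps does here would look roughly like this (a sketch mirroring the reproducer above; placeholder device path, and the volume must not be in use when running it):
for i in $(seq 70); do
    qemu-img bitmap --remove /dev/<vg_name>/<lv_name> stale-bitmap-$i
done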
/ost
/ost
/ost
/ost
/ost
/ost
Prune stale bitmaps before a live merge to avoid failing with ENOSPC.
Fixes: #352
Signed-off-by: Albert Esteve <aesteve@redhat.com>