Closed aesteve-rh closed 1 year ago
/ost
/ost
Measure also the base volume when prepare for a merge. We need to consider base volume bitmaps size, since measuring only top may result in untracked bitmaps even with backing enabled.
This is not very clear, maybe explain that measuring only top does not consider the size required for the bitmaps in the base image, which may result in failing to finish the commit or to copy bitmaps from the top volume.
This can happen if the bitmaps were added to base after the top volume was created.
This cannot happen in the current system. I think the reason is leftover bitmaps that were not deleted when a checkpoint was deleted - known issue in engine.
Considering both base and top volume bitmaps may result in allocating more than needed, but will avoid errors when commiting the image due to 'No space left on device'.
Or when copying bitmaps from top to base after the commit finish.
The expected cases are:
The case we see in the related bug is bitmap that existing in base but not in top. These bitmaps are broken and can never be used for backup, but they still take space in the image and can cause live or cold merge to fail.
Understood. I will update the comments and try to explain it better.
I think the reason is leftover bitmaps that were not deleted when a checkpoint was deleted - known issue in engine.
ouch, yes, that could lead to this issue.
These bitmaps are broken and can never be used for backup, but they still take space in the image and can cause live or cold merge to fail.
More so in the cold merge, as the live merge adds an entire extra chunk to avoid pausing if the guest writes data, which will be enough to "hide" the extra size of the untracked bitmaps.
It is not ideal that we potentially over extend unnecessarily the volume (as in the common case we potentially double the space required for bitmaps), but is better than having many reports due to this buggy scenario occurring.
Testing actual merge code is very complicated and requires too much mocking and faking and is very hard to work with. I tried to reproduce the failure with the tests but could not reproduce it.
I wrote a simple reproducer using qemu-img commands that reproduces the issue, but it proves that the computation of require size using top and base bitmaps is not safe, so this is not the right fix.
Running the reproducer with 5 stale bitmaps the required size is less than the lv size, the merge succeeds, all bitmaps are valid, merging the bitmaps from top succeeds, and the good bitmap is valid.
Running the reproducer with 6 stale bitmaps the required size is less than the lv size, the merge succeeds, merging the bitmaps from top succeeds, but the good bitmap is invalid.
Commit top into base
(100.00/100%)
Image committed.
Add good bitmap to base
Merge good bitmap from top to base
qcow2_free_clusters failed: No space left on device
qemu-img: Lost persistent bitmaps during inactivation of node '#block159': Failed to write bitmap 'good-bitmap' to file: No space left on device
qemu-img: Failed to flush the refcount block cache: No space left on device
Info base after merge
{
"virtual-size": 1073741824,
"filename": "/dev/mapper/storage-base.qcow2",
"cluster-size": 65536,
"format": "qcow2",
"actual-size": 0,
"format-specific": {
"type": "qcow2",
"data": {
"compat": "1.1",
"compression-type": "zlib",
"lazy-refcounts": false,
"bitmaps": [
...
{
"flags": [
"in-use",
"auto"
],
"name": "good-bitmap",
"granularity": 65536
}
],
"refcount-bits": 16,
"corrupt": false,
"extended-l2": false
}
},
"dirty-flag": false
}
Running the reproducer with 11 stale bitmaps, the required size is exactly the lv size and the merge succeeds, merging the good bitmap success, and the good bitmap from top is valid.
Commit top into base
(100.00/100%)
qcow2_free_clusters failed: No space left on device
qemu-img: Lost persistent bitmaps during inactivation of node '#block308': Failed to write bitmap 'stale-bitmap-002' to file: No space left on device
qemu-img: Failed to flush the refcount block cache: No space left on device
Image committed.
Add good bitmap to base
Merge good bitmap from top to base
Info base after merge
{
"virtual-size": 1073741824,
"filename": "/dev/mapper/storage-base.qcow2",
"cluster-size": 65536,
"format": "qcow2",
"actual-size": 0,
"format-specific": {
"type": "qcow2",
"data": {
"compat": "1.1",
"compression-type": "zlib",
"lazy-refcounts": false,
"bitmaps": [
...
{
"flags": [
"auto"
],
"name": "good-bitmap",
"granularity": 65536
}
],
Running the reproducer with 13 bitmaps, the required size is more than the lv size, but the merge succeeds. The good bitmap is invalid.
Commit top into base
(100.00/100%)
qcow2_free_clusters failed: No space left on device
qemu-img: Lost persistent bitmaps during inactivation of node '#block326': Failed to write bitmap 'stale-bitmap-000' to file: No space left on device
qemu-img: Failed to flush the refcount block cache: No space left on device
Image committed.
Add good bitmap to base
Merge good bitmap from top to base
Info base after merge
{
"virtual-size": 1073741824,
"filename": "/dev/mapper/storage-base.qcow2",
"cluster-size": 65536,
"format": "qcow2",
"actual-size": 0,
"format-specific": {
"type": "qcow2",
"data": {
"compat": "1.1",
"compression-type": "zlib",
"lazy-refcounts": false,
"bitmaps": [
...
{
"flags": [
"auto"
],
"name": "good-bitmap",
"granularity": 65536
}
],
Running the reproducer with 14 bitmaps the merge fails with ENOSPC.
This shows that we need more size than the size than top and base bitmaps and top required, but we don't have a good way to compute this value.
Since the issue is stale bitmaps that we cannot use for anything, we can can fix the issue by deleting all bitmaps in base that do not appear in top before the merge before the merge.
See how we find bitmaps in bitmaps._query_bitmaps(). We can add
bitmaps.prune_bitmaps(top, base)
that delete invalid bitmaps from base, and will be used before we measure base in cold and live merge, and maybe later in a new repair_bitmaps vdsm command.
Regardless of the vdsm fix this reproducer reveals issues in qemu-img commit
and
qemu-img bitmap --merge
. The commands must fail if they cannot write bitmaps
to storage and they leave them with the "in-use" bit. This is a major failure for
qemu-img bitamap --merge
since its purpose is to manipulate bitmaps. You can use
the reproducer to report qemu-img bug.
# Reproduce ENOSPC error when merging into base image with lot of bitmaps.
import json
import subprocess
IMG_SIZE = 1 << 30
LV_SIZE = 128 << 20
# Testing shows that we can merge successfully with 13 bitmaps, and require
# size calculation is more strict, allowing up to 11 bitamps.
STALE_BITMAPS = 11
def run(*cmd):
subprocess.run(cmd, check=True)
def output(cmd):
cp = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
return json.loads(cp.stdout)
def info(img):
return output(["qemu-img", "info", "--output", "json", img])
def measure(img):
return output(["qemu-img", "measure", "-f", "qcow2", "-O", "qcow2", "--output", "json", img])
def check(img):
cmd = ["qemu-img", "check", "-f", "qcow2", "--output", "json", img]
cp = subprocess.run(cmd, stdout=subprocess.PIPE)
if cp.returncode not in (0, 3):
raise RuntimeError(f"Check failed")
return json.loads(cp.stdout)
def indent(info):
return json.dumps(info, indent=2)
# Configuration - you need to create these lvs and chown them to the user
# running this script.
# sudo lvcreate --name base.qcow2 --size 128m storage
# sudo chown $USER:$USER /dev/mappe/stoarge-base.qcow2
# sudo lvcreate --name top.qcow2 --size 128m storage
# sudo chown $USER:$USER /dev/mappe/stoarge-top.qcow2
base = "/dev/mapper/storage-base.qcow2"
top = "/dev/mapper/storage-top.qcow2"
# Start with clean lvs by discarding current data.
run("blkdiscard", base)
run("blkdiscard", top)
print("Creating base")
run("qemu-img", "create", "-f", "qcow2", base, str(IMG_SIZE))
# Simulate stale bitmaps - missing in top.
for i in range(STALE_BITMAPS):
bitmap = f"stale-bitmap-{i:03d}"
print(f"Creating stale bitmap {bitmap}")
run("qemu-img", "bitmap", "--add", base, bitmap)
print("Info base before merge")
base_info = info(base)
print(indent(base_info))
print("Check base before merge")
base_check = check(base)
print(indent(base_check))
print("Measure base before merge")
base_measure = measure(base)
print(indent(base_measure))
print("Creating top")
run("qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "qcow2", top)
print("Adding good bitmap to top")
run("qemu-img", "bitmap", "--add", top, "good-bitmap")
print("Writing data to top")
cmd = f"write -P {ord('B')} 0 126m"
run("qemu-io", "-f", "qcow2", "-c", cmd, top)
print("Info top before merge")
top_info = info(top)
print(indent(top_info))
print("Check top before merge")
top_check = check(top)
print(indent(top_check))
print("Measure top before merge")
top_measure = measure(top)
print(indent(top_measure))
print("Commit top into base")
run("qemu-img", "commit", "-f", "qcow2", "-t", "none", "-b", base, "-d", "-p", top)
print("Add good bitmap to base")
run("qemu-img", "bitmap", "--add", base, "good-bitmap")
print("Merge good bitmap from top to base")
run("qemu-img", "bitmap", "--merge", "good-bitmap", "-F", "qcow2", "-b", top, base, "good-bitmap")
print("Info base after merge")
print(indent(info(base)))
print("Check base after merge")
print(indent(check(base)))
print("Measure base after merge")
print(indent(measure(base)))
print("Checking that required size for merge is correct")
required_size = top_measure["required"] + top_measure["bitmaps"] + base_measure["bitmaps"]
print(f"required_size: {required_size} lv_size: {LV_SIZE}")
assert required_size <= LV_SIZE
POC of a fix, using:
After the merge, base should have only the good bitmaps. If we add a merge test we can simulate this case in the test.
# Reproduce ENOSPC error when merging into base image with lot of bitmaps.
import json
import subprocess
IMG_SIZE = 1 << 30
LV_SIZE = 128 << 20
STALE_BITMAPS = 32
def run(*cmd):
subprocess.run(cmd, check=True)
def output(cmd):
cp = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
return json.loads(cp.stdout)
def info(img):
return output(["qemu-img", "info", "--output", "json", img])
def measure(img):
return output(["qemu-img", "measure", "-f", "qcow2", "-O", "qcow2", "--output", "json", img])
def check(img):
cmd = ["qemu-img", "check", "-f", "qcow2", "--output", "json", img]
cp = subprocess.run(cmd, stdout=subprocess.PIPE)
if cp.returncode not in (0, 3):
raise RuntimeError(f"Check failed")
return json.loads(cp.stdout)
def bitmaps(img):
img_info = info(img)
return {b["name"] for b in img_info["format-specific"]["data"].get("bitmaps", [])}
def indent(info):
return json.dumps(info, indent=2)
# Configuration - you need to create these lvs and chown them to the user
# running this script.
# sudo lvcreate --name base.qcow2 --size 128m storage
# sudo chown $USER:$USER /dev/mappe/stoarge-base.qcow2
# sudo lvcreate --name top.qcow2 --size 128m storage
# sudo chown $USER:$USER /dev/mappe/stoarge-top.qcow2
base = "/dev/mapper/storage-base.qcow2"
top = "/dev/mapper/storage-top.qcow2"
# Start with clean lvs by discarding current data.
run("blkdiscard", base)
run("blkdiscard", top)
print("Creating base")
run("qemu-img", "create", "-f", "qcow2", base, str(IMG_SIZE))
# Simulate stale bitmaps - missing in top.
for i in range(STALE_BITMAPS):
bitmap = f"stale-bitmap-{i:03d}"
print(f"Creating stale bitmap {bitmap}")
run("qemu-img", "bitmap", "--add", base, bitmap)
print("Creating top")
run("qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "qcow2", top)
# Add good bitmaps that will be in both base and top.
for i in range(2):
bitmap = f"good-bitmap-{i:03d}"
print(f"Adding good bitmap {bitmap} to base and top")
run("qemu-img", "bitmap", "--add", base, bitmap)
run("qemu-img", "bitmap", "--add", top, bitmap)
# Add bitmpas that exists only in top - will be added to based after the merge.
print(f"Adding top only bitmap to top")
run("qemu-img", "bitmap", "--add", top, "top-only")
print("Writing data to top")
cmd = f"write -P {ord('B')} 0 126m"
run("qemu-io", "-f", "qcow2", "-c", cmd, top)
print("Info base before merge")
base_info = info(base)
print(indent(base_info))
print("Info top before merge")
top_info = info(top)
print(indent(top_info))
print("Measure top before merge")
top_measure = measure(top)
print(indent(top_measure))
for b in sorted(bitmaps(base) - bitmaps(top)):
print(f"Removing stale bitmap {b} from base")
run("qemu-img", "bitmap", "--remove", base, b)
print("Commit top into base")
run("qemu-img", "commit", "-f", "qcow2", "-t", "none", "-b", base, "-d", "-p", top)
# Merging bitmaps only in top to base.
for b in sorted(bitmaps(top) - bitmaps(base)):
print("Add top only bitmap to base")
run("qemu-img", "bitmap", "--add", base, b)
print("Merge top only bitmap from top to base")
run("qemu-img", "bitmap", "--merge", b, "-F", "qcow2", "-b", top, base, b)
print("Info base after merge")
print(indent(info(base)))
print("Check base after merge")
print(indent(check(base)))
print("Measure base after merge")
print(indent(measure(base)))
print("Checking that required size for merge is correct")
required_size = top_measure["required"] + top_measure["bitmaps"]
print(f"required_size: {required_size} lv_size: {LV_SIZE}")
assert required_size <= LV_SIZE
/ost
/ost
/ost
/ost
/ost
/ost
/ost
/ost
/ost
/ost
Prune the stale base volume bitmaps during the prepare step on a merge operation.
These stale bitmaps can cause the merge operation to fail due to 'No space left on device'. In this case, qemu does not end with error, so the failure goes unnoticed.
As there is not a reliable way to measure the size of these stale bitmaps, and they are invalid and can never be used for incremental backup, it is better to prune them to avoid the error.
Related: https://github.com/oVirt/vdsm/issues/352 Signed-off-by: Albert Esteve aesteve@redhat.com