oVirt / vdsm

The Virtual Desktop Server Manager
GNU General Public License v2.0
160 stars 201 forks source link

cold merge: prune stale bitmaps in base volume #354

Closed aesteve-rh closed 1 year ago

aesteve-rh commented 1 year ago

Prune the stale base volume bitmaps during the prepare step on a merge operation.

These stale bitmaps can cause the merge operation to fail due to 'No space left on device'. In this case, qemu does not end with error, so the failure goes unnoticed.

As there is not a reliable way to measure the size of these stale bitmaps, and they are invalid and can never be used for incremental backup, it is better to prune them to avoid the error.

Related: https://github.com/oVirt/vdsm/issues/352 Signed-off-by: Albert Esteve aesteve@redhat.com

michalskrivanek commented 1 year ago

/ost

aesteve-rh commented 1 year ago

/ost

nirs commented 1 year ago

Measure also the base volume when prepare for a merge. We need to consider base volume bitmaps size, since measuring only top may result in untracked bitmaps even with backing enabled.

This is not very clear, maybe explain that measuring only top does not consider the size required for the bitmaps in the base image, which may result in failing to finish the commit or to copy bitmaps from the top volume.

This can happen if the bitmaps were added to base after the top volume was created.

This cannot happen in the current system. I think the reason is leftover bitmaps that were not deleted when a checkpoint was deleted - known issue in engine.

Considering both base and top volume bitmaps may result in allocating more than needed, but will avoid errors when commiting the image due to 'No space left on device'.

Or when copying bitmaps from top to base after the commit finish.

The expected cases are:

  1. Bitmap is in both top and base - measuring only top is enough
  2. Bitmap is only in top - measuring only top is good enough

The case we see in the related bug is bitmap that existing in base but not in top. These bitmaps are broken and can never be used for backup, but they still take space in the image and can cause live or cold merge to fail.

aesteve-rh commented 1 year ago

Understood. I will update the comments and try to explain it better.

I think the reason is leftover bitmaps that were not deleted when a checkpoint was deleted - known issue in engine.

ouch, yes, that could lead to this issue.

These bitmaps are broken and can never be used for backup, but they still take space in the image and can cause live or cold merge to fail.

More so in the cold merge, as the live merge adds an entire extra chunk to avoid pausing if the guest writes data, which will be enough to "hide" the extra size of the untracked bitmaps.

It is not ideal that we potentially over extend unnecessarily the volume (as in the common case we potentially double the space required for bitmaps), but is better than having many reports due to this buggy scenario occurring.

nirs commented 1 year ago

Testing actual merge code is very complicated and requires too much mocking and faking and is very hard to work with. I tried to reproduce the failure with the tests but could not reproduce it.

I wrote a simple reproducer using qemu-img commands that reproduces the issue, but it proves that the computation of require size using top and base bitmaps is not safe, so this is not the right fix.

This shows that we need more size than the size than top and base bitmaps and top required, but we don't have a good way to compute this value.

Since the issue is stale bitmaps that we cannot use for anything, we can can fix the issue by deleting all bitmaps in base that do not appear in top before the merge before the merge.

See how we find bitmaps in bitmaps._query_bitmaps(). We can add

bitmaps.prune_bitmaps(top, base)

that delete invalid bitmaps from base, and will be used before we measure base in cold and live merge, and maybe later in a new repair_bitmaps vdsm command.

Regardless of the vdsm fix this reproducer reveals issues in qemu-img commit and qemu-img bitmap --merge. The commands must fail if they cannot write bitmaps to storage and they leave them with the "in-use" bit. This is a major failure for qemu-img bitamap --merge since its purpose is to manipulate bitmaps. You can use the reproducer to report qemu-img bug.

Reproducer

# Reproduce ENOSPC error when merging into base image with lot of bitmaps.

import json
import subprocess

IMG_SIZE = 1 << 30
LV_SIZE = 128 << 20

# Testing shows that we can merge successfully with 13 bitmaps, and require
# size calculation is more strict, allowing up to 11 bitamps.
STALE_BITMAPS = 11

def run(*cmd):
    subprocess.run(cmd, check=True)

def output(cmd):
    cp = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return json.loads(cp.stdout)

def info(img):
    return output(["qemu-img", "info", "--output", "json", img])

def measure(img):
    return output(["qemu-img", "measure", "-f", "qcow2", "-O", "qcow2", "--output", "json", img])

def check(img):
    cmd = ["qemu-img", "check", "-f", "qcow2", "--output", "json", img]
    cp = subprocess.run(cmd, stdout=subprocess.PIPE)
    if cp.returncode not in (0, 3):
        raise RuntimeError(f"Check failed")

    return json.loads(cp.stdout)

def indent(info):
    return json.dumps(info, indent=2)

# Configuration - you need to create these lvs and chown them to the user
# running this script.
#   sudo lvcreate --name base.qcow2 --size 128m storage
#   sudo chown $USER:$USER /dev/mappe/stoarge-base.qcow2
#   sudo lvcreate --name top.qcow2 --size 128m storage
#   sudo chown $USER:$USER /dev/mappe/stoarge-top.qcow2
base = "/dev/mapper/storage-base.qcow2"
top = "/dev/mapper/storage-top.qcow2"

# Start with clean lvs by discarding current data.
run("blkdiscard", base)
run("blkdiscard", top)

print("Creating base")
run("qemu-img", "create", "-f", "qcow2", base, str(IMG_SIZE))

# Simulate stale bitmaps - missing in top.
for i in range(STALE_BITMAPS):
    bitmap = f"stale-bitmap-{i:03d}"
    print(f"Creating stale bitmap {bitmap}")
    run("qemu-img", "bitmap", "--add", base, bitmap)

print("Info base before merge")
base_info = info(base)
print(indent(base_info))

print("Check base before merge")
base_check = check(base)
print(indent(base_check))

print("Measure base before merge")
base_measure = measure(base)
print(indent(base_measure))

print("Creating top")
run("qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "qcow2", top)

print("Adding good bitmap to top")
run("qemu-img", "bitmap", "--add", top, "good-bitmap")

print("Writing data to top")
cmd = f"write -P {ord('B')} 0 126m"
run("qemu-io", "-f", "qcow2", "-c", cmd, top)

print("Info top before merge")
top_info = info(top)
print(indent(top_info))

print("Check top before merge")
top_check = check(top)
print(indent(top_check))

print("Measure top before merge")
top_measure = measure(top)
print(indent(top_measure))

print("Commit top into base")
run("qemu-img", "commit", "-f", "qcow2", "-t", "none", "-b", base, "-d", "-p", top)

print("Add good bitmap to base")
run("qemu-img", "bitmap", "--add", base, "good-bitmap")

print("Merge good bitmap from top to base")
run("qemu-img", "bitmap", "--merge", "good-bitmap", "-F", "qcow2", "-b", top,  base, "good-bitmap")

print("Info base after merge")
print(indent(info(base)))

print("Check base after merge")
print(indent(check(base)))

print("Measure base after merge")
print(indent(measure(base)))

print("Checking that required size for merge is correct")
required_size = top_measure["required"] + top_measure["bitmaps"] + base_measure["bitmaps"]
print(f"required_size: {required_size} lv_size: {LV_SIZE}")
assert required_size <= LV_SIZE
nirs commented 1 year ago

POC of a fix, using:

After the merge, base should have only the good bitmaps. If we add a merge test we can simulate this case in the test.

# Reproduce ENOSPC error when merging into base image with lot of bitmaps.

import json
import subprocess

IMG_SIZE = 1 << 30
LV_SIZE = 128 << 20

STALE_BITMAPS = 32

def run(*cmd):
    subprocess.run(cmd, check=True)

def output(cmd):
    cp = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return json.loads(cp.stdout)

def info(img):
    return output(["qemu-img", "info", "--output", "json", img])

def measure(img):
    return output(["qemu-img", "measure", "-f", "qcow2", "-O", "qcow2", "--output", "json", img])

def check(img):
    cmd = ["qemu-img", "check", "-f", "qcow2", "--output", "json", img]
    cp = subprocess.run(cmd, stdout=subprocess.PIPE)
    if cp.returncode not in (0, 3):
        raise RuntimeError(f"Check failed")

    return json.loads(cp.stdout)

def bitmaps(img):
    img_info = info(img)
    return {b["name"] for b in img_info["format-specific"]["data"].get("bitmaps", [])}

def indent(info):
    return json.dumps(info, indent=2)

# Configuration - you need to create these lvs and chown them to the user
# running this script.
#   sudo lvcreate --name base.qcow2 --size 128m storage
#   sudo chown $USER:$USER /dev/mappe/stoarge-base.qcow2
#   sudo lvcreate --name top.qcow2 --size 128m storage
#   sudo chown $USER:$USER /dev/mappe/stoarge-top.qcow2
base = "/dev/mapper/storage-base.qcow2"
top = "/dev/mapper/storage-top.qcow2"

# Start with clean lvs by discarding current data.
run("blkdiscard", base)
run("blkdiscard", top)

print("Creating base")
run("qemu-img", "create", "-f", "qcow2", base, str(IMG_SIZE))

# Simulate stale bitmaps - missing in top.
for i in range(STALE_BITMAPS):
    bitmap = f"stale-bitmap-{i:03d}"
    print(f"Creating stale bitmap {bitmap}")
    run("qemu-img", "bitmap", "--add", base, bitmap)

print("Creating top")
run("qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "qcow2", top)

# Add good bitmaps that will be in both base and top.
for i in range(2):
    bitmap = f"good-bitmap-{i:03d}"
    print(f"Adding good bitmap {bitmap} to base and top")
    run("qemu-img", "bitmap", "--add", base, bitmap)
    run("qemu-img", "bitmap", "--add", top, bitmap)

# Add bitmpas that exists only in top - will be added to based after the merge.
print(f"Adding top only bitmap to top")
run("qemu-img", "bitmap", "--add", top, "top-only")

print("Writing data to top")
cmd = f"write -P {ord('B')} 0 126m"
run("qemu-io", "-f", "qcow2", "-c", cmd, top)

print("Info base before merge")
base_info = info(base)
print(indent(base_info))

print("Info top before merge")
top_info = info(top)
print(indent(top_info))

print("Measure top before merge")
top_measure = measure(top)
print(indent(top_measure))

for b in sorted(bitmaps(base) - bitmaps(top)):
    print(f"Removing stale bitmap {b} from base")
    run("qemu-img", "bitmap", "--remove", base, b)

print("Commit top into base")
run("qemu-img", "commit", "-f", "qcow2", "-t", "none", "-b", base, "-d", "-p", top)

# Merging bitmaps only in top to base.
for b in sorted(bitmaps(top) - bitmaps(base)):
    print("Add top only bitmap to base")
    run("qemu-img", "bitmap", "--add", base, b)
    print("Merge top only bitmap from top to base")
    run("qemu-img", "bitmap", "--merge", b, "-F", "qcow2", "-b", top,  base, b)

print("Info base after merge")
print(indent(info(base)))

print("Check base after merge")
print(indent(check(base)))

print("Measure base after merge")
print(indent(measure(base)))

print("Checking that required size for merge is correct")
required_size = top_measure["required"] + top_measure["bitmaps"]
print(f"required_size: {required_size} lv_size: {LV_SIZE}")
assert required_size <= LV_SIZE
aesteve-rh commented 1 year ago

/ost

aesteve-rh commented 1 year ago

/ost

aesteve-rh commented 1 year ago

/ost

aesteve-rh commented 1 year ago

/ost

aesteve-rh commented 1 year ago

/ost

aesteve-rh commented 1 year ago

/ost

aesteve-rh commented 1 year ago

/ost

aesteve-rh commented 1 year ago

/ost

aesteve-rh commented 1 year ago

/ost

michalskrivanek commented 1 year ago

/ost