lima-vm / lima

Linux virtual machines, with a focus on running containers
https://lima-vm.io/
Apache License 2.0
15.13k stars 591 forks

vz: "Expanding to 100GiB" phase makes a non-sparse 100GiB image #2713

Open AkihiroSuda opened 2 days ago

AkihiroSuda commented 2 days ago

I just found out that the built-in conversion needs more disk space than qemu-img convert. While the end-result is still a sparse disk, it seems to require the full 100GB of disk space temporarily, so you cannot convert from QCOW2 to RAW on a device with limited free space.

$ df -h ~/.lima3
Filesystem    Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk5    50Gi   692Mi    49Gi     2%      11  4.3G    0%   /Users/jan/.lima3

$ l start --vm-type vz
? Creating an instance "default" Proceed with the current configuration
INFO[0001] Starting the instance "default" with VM driver "vz"
…
INFO[0002] Converting "/Users/jan/.lima3/default/basedisk" (qcow2) to a raw disk "/Users/jan/.lima3/default/diffdisk"
3.50 GiB / 3.50 GiB [-------------------------------------] 100.00% 206.87 MiB/s
INFO[0019] Expanding to 100GiB
FATA[0020] failed to convert "/Users/jan/.lima3/default/basedisk" to a raw disk "/Users/jan/.lima3/default/diffdisk": no space left on device

Using qemu-img convert seems to require little extra space beyond what the new sparse file actually occupies.

Originally posted by @jandubois in https://github.com/lima-vm/lima/issues/2579#issuecomment-2403566144

> While the end-result is still a sparse disk

Actually, it is not, with the built-in conversion: it turns into a fully allocated disk, so this is even worse. That might also explain why it takes so long: it possibly writes the full 100GB to disk.

Originally posted by @jandubois in https://github.com/lima-vm/lima/issues/2579#issuecomment-2403587553


https://github.com/lima-vm/lima/blob/bc774ded0a0710acb6716580719ace284d9ec3d4/pkg/nativeimgutil/nativeimgutil.go#L85-L91
https://github.com/lima-vm/lima/blob/bc774ded0a0710acb6716580719ace284d9ec3d4/pkg/nativeimgutil/nativeimgutil.go#L170-L175

AkihiroSuda commented 2 days ago

We may exec /usr/bin/truncate https://github.com/apple-oss-distributions/file_cmds/blob/file_cmds-448.0.3/truncate/truncate.c

AkihiroSuda commented 2 days ago

Looks like the fd has to be closed before ftruncating

Maybe this behavior is specific to macOS or APFS

nirs commented 1 day ago

I cannot reproduce this:

% limactl create --plain --tty=false
INFO[0000] Terminal is not available, proceeding without opening an editor 
INFO[0000] Attempting to download the image              arch=aarch64 digest="sha256:5ecac6447be66a164626744a87a27fd4e6c6606dc683e0a233870af63df4276a" location="https://cloud-images.ubuntu.com/releases/24.04/release-20240821/ubuntu-24.04-server-cloudimg-arm64.img"
INFO[0000] Using cache "/Users/nsoffer/Library/Caches/lima/download/by-url-sha256/346ee1ff9e381b78ba08e2a29445960b5cd31c51f896fc346b82e26e345a5b9a/data" 
INFO[0000] Converting "/Users/nsoffer/.lima/default/basedisk" (qcow2) to a raw disk "/Users/nsoffer/.lima/default/diffdisk" 
3.50 GiB / 3.50 GiB [-------------------------------------] 100.00% 197.48 MiB/s
INFO[0018] Expanding to 100GiB                          
INFO[0018] Run `limactl start default` to start the instance. 

% qemu-img info /Users/nsoffer/Library/Caches/lima/download/by-url-sha256/346ee1ff9e381b78ba08e2a29445960b5cd31c51f896fc346b82e26e345a5b9a/data
image: /Users/nsoffer/Library/Caches/lima/download/by-url-sha256/346ee1ff9e381b78ba08e2a29445960b5cd31c51f896fc346b82e26e345a5b9a/data
file format: qcow2
virtual size: 3.5 GiB (3758096384 bytes)
disk size: 551 MiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false
Child node '/file':
    filename: /Users/nsoffer/Library/Caches/lima/download/by-url-sha256/346ee1ff9e381b78ba08e2a29445960b5cd31c51f896fc346b82e26e345a5b9a/data
    protocol type: file
    file length: 551 MiB (578093056 bytes)
    disk size: 551 MiB

% qemu-img info ~/.lima/default/basedisk
image: /Users/nsoffer/.lima/default/basedisk
file format: qcow2
virtual size: 3.5 GiB (3758096384 bytes)
disk size: 551 MiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false
Child node '/file':
    filename: /Users/nsoffer/.lima/default/basedisk
    protocol type: file
    file length: 551 MiB (578093056 bytes)
    disk size: 551 MiB

% qemu-img info ~/.lima/default/diffdisk 
image: /Users/nsoffer/.lima/default/diffdisk
file format: raw
virtual size: 100 GiB (107374182400 bytes)
disk size: 1.72 GiB
Child node '/file':
    filename: /Users/nsoffer/.lima/default/diffdisk
    protocol type: file
    file length: 100 GiB (107374182400 bytes)
    disk size: 1.72 GiB

The source disk is a qcow2 image with compressed clusters, so expanding from 551 MiB to 1.72 GiB is expected.

Verifying with qemu-img convert:

% qemu-img convert -f qcow2 -O raw ~/.lima/default/basedisk ~/.lima/default/disk.img

% qemu-img resize -f raw ~/.lima/default/disk.img 100g
Image resized.

% qemu-img info ~/.lima/default/disk.img                                         
image: /Users/nsoffer/.lima/default/disk.img
file format: raw
virtual size: 100 GiB (107374182400 bytes)
disk size: 1.62 GiB
Child node '/file':
    filename: /Users/nsoffer/.lima/default/disk.img
    protocol type: file
    file length: 100 GiB (107374182400 bytes)
    disk size: 1.62 GiB

% qemu-img compare ~/.lima/default/disk.img ~/.lima/default/diffdisk                
Images are identical.

qemu-img does better sparsifying: it detects zeros at 4 KiB granularity, while the Lima native converter treats any larger block containing even one non-zero byte as data.
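The Go sketch below shows why detection granularity matters. It counts how many chunks of an image buffer contain any non-zero byte, which is how many chunks a converter that leaves all-zero chunks as holes would still have to write. The 1 MiB chunk size is an arbitrary stand-in for the "larger blocks" mentioned above, not Lima's actual value, and `dataBlocks` is a hypothetical helper.

```go
package main

import "fmt"

// isZero reports whether every byte in b is zero.
func isZero(b []byte) bool {
	for _, c := range b {
		if c != 0 {
			return false
		}
	}
	return true
}

// dataBlocks counts the blockSize-sized chunks of buf that contain at least
// one non-zero byte. A converter that leaves all-zero chunks as holes still
// has to write, and allocate, every chunk counted here.
func dataBlocks(buf []byte, blockSize int) int {
	n := 0
	for off := 0; off < len(buf); off += blockSize {
		end := off + blockSize
		if end > len(buf) {
			end = len(buf)
		}
		if !isZero(buf[off:end]) {
			n++
		}
	}
	return n
}

func main() {
	buf := make([]byte, 4<<20) // 4 MiB of zeros
	buf[100] = 1               // one stray byte near the start
	buf[3<<20] = 1             // and one at the 3 MiB mark

	for _, bs := range []int{4 << 10, 1 << 20} {
		n := dataBlocks(buf, bs)
		fmt.Printf("block size %d: %d data blocks, %d bytes written\n", bs, n, n*bs)
	}
	// block size 4096: 2 data blocks, 8192 bytes written
	// block size 1048576: 2 data blocks, 2097152 bytes written
}
```

The same two stray bytes cost 8 KiB at 4 KiB granularity but 2 MiB at 1 MiB granularity. That matches the 1.62 GiB vs 1.72 GiB gap between the two outputs above.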

@jandubois which file system are you using? OS version?

AkihiroSuda commented 1 day ago

The issue should be reproducible with https://raw.githubusercontent.com/lima-vm/lima/refs/tags/v0.23.2/examples/alpine-image.yaml (macOS 15, Intel, APFS)

nirs commented 1 day ago

Maybe macOS 15 regression then?

nirs commented 1 day ago

@jandubois, @AkihiroSuda Is this reproducible with #2718 ?

This is an old patch that I tried when I looked at the slow conversion. It does not improve conversion speed, but it simplifies the flow and removes unneeded work.

jandubois commented 1 day ago

> Maybe macOS 15 regression then?

I became aware of the problem in the RAM disk PR where it failed with the default template on macOS 13 on Intel. I don't know which filesystem the GitHub runners are using.

I've reproduced it on macOS 15 on M1 using APFS.

I do think that this used to work, which is why I claimed (incorrectly) at first that the result was still a sparse file. But it wasn't.

I've since created a 50GB RAM disk (see this issue itself), and couldn't create a VZ instance in it. I've created a 40GB instance and verified with du -h diffdisk that it was indeed using 40GB of disk space.

This is not related to Alpine; I did all my tests with the default (and fedora) templates.