nix-community / nixops-libvirtd

NixOps libvirtd backend plugin [maintainer=@AmineChikhaoui]
GNU Lesser General Public License v3.0

"No space left on device" kills installation #8

Closed ncryptid closed 4 years ago

ncryptid commented 4 years ago

After manually creating a libvirt pool to work around #7 for the purposes of testing, I'm currently able to create a trivial nixops libvirt deployment. However, I'm running into this error after the VM finishes installing and reboots:

example> [   10.364024] reboot: Power down
example> uploading disk image...
libvirt: I/O Stream Utils error : cannot write to stream: No space left on device
Traceback (most recent call last):
  File "/nix/store/s8jbqmv841dwpj6cq40rq1c4qyzk2x7q-nixops-1.8pre0_abcdef/bin/.nixops-wrapped", line 251, in <module>
    args.op(args)
  File "/nix/store/s8jbqmv841dwpj6cq40rq1c4qyzk2x7q-nixops-1.8pre0_abcdef/lib/python2.7/site-packages/nixops/script_defs.py", line 427, in op_deploy
    max_concurrent_activate=args.max_concurrent_activate)
  File "/nix/store/s8jbqmv841dwpj6cq40rq1c4qyzk2x7q-nixops-1.8pre0_abcdef/lib/python2.7/site-packages/nixops/deployment.py", line 1062, in deploy
    self.run_with_notify('deploy', lambda: self._deploy(**kwargs))
  File "/nix/store/s8jbqmv841dwpj6cq40rq1c4qyzk2x7q-nixops-1.8pre0_abcdef/lib/python2.7/site-packages/nixops/deployment.py", line 1051, in run_with_notify
    f()
  File "/nix/store/s8jbqmv841dwpj6cq40rq1c4qyzk2x7q-nixops-1.8pre0_abcdef/lib/python2.7/site-packages/nixops/deployment.py", line 1062, in <lambda>
    self.run_with_notify('deploy', lambda: self._deploy(**kwargs))
  File "/nix/store/s8jbqmv841dwpj6cq40rq1c4qyzk2x7q-nixops-1.8pre0_abcdef/lib/python2.7/site-packages/nixops/deployment.py", line 999, in _deploy
    nixops.parallel.run_tasks(nr_workers=-1, tasks=self.active_resources.itervalues(), worker_fun=worker)
  File "/nix/store/s8jbqmv841dwpj6cq40rq1c4qyzk2x7q-nixops-1.8pre0_abcdef/lib/python2.7/site-packages/nixops/parallel.py", line 44, in thread_fun
    result_queue.put((worker_fun(t), None, t.name))
  File "/nix/store/s8jbqmv841dwpj6cq40rq1c4qyzk2x7q-nixops-1.8pre0_abcdef/lib/python2.7/site-packages/nixops/deployment.py", line 972, in worker
    r.create(self.definitions[r.name], check=check, allow_reboot=allow_reboot, allow_recreate=allow_recreate)
  File "/nix/store/s6an2jrd2m2s1rchd65934bvbfyrjnwh-nixops-libvirtd/lib/python2.7/site-packages/nixopsvirtd/backends/libvirtd.py", line 163, in create
    self._prepare_storage_volume()
  File "/nix/store/s6an2jrd2m2s1rchd65934bvbfyrjnwh-nixops-libvirtd/lib/python2.7/site-packages/nixopsvirtd/backends/libvirtd.py", line 200, in _prepare_storage_volume
    self._upload_volume(temp_disk_path, image_info['actual-size'])
  File "/nix/store/s6an2jrd2m2s1rchd65934bvbfyrjnwh-nixops-libvirtd/lib/python2.7/site-packages/nixopsvirtd/backends/libvirtd.py", line 233, in _upload_volume
    stream.sendAll(read_file, f)
  File "/nix/store/p9d10c55vc0gb8gzvgrqhryaxx1xahhw-python2.7-libvirt-5.4.0/lib/python2.7/site-packages/libvirt.py", line 6021, in sendAll
    ret = self.send(got)
  File "/nix/store/p9d10c55vc0gb8gzvgrqhryaxx1xahhw-python2.7-libvirt-5.4.0/lib/python2.7/site-packages/libvirt.py", line 6058, in send
    if ret == -1: raise libvirtError ('virStreamSend() failed')
libvirt.libvirtError: cannot write to stream: No space left on device

I've got 2TB of storage on this pool, and according to virsh it's not using anywhere close to the total amount of space that's been allocated for the VM image:

Name:           dev_images
UUID:           79e9ba14-e73c-42d1-ab71-6056edb5f60c
State:          running
Persistent:     yes
Autostart:      yes
Capacity:       2.57 TiB
Allocation:     685.62 MiB
Available:      2.57 TiB

Thanks

onixie commented 4 years ago

I had a similar issue on Ubuntu. It was caused by a size-limited user runtime directory, which usually resides under /run/user/.

As you can see, /run/user/1001 is only 3.2G in my case (my machine has 32GB of memory in total).

$ df -h -t tmpfs
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.2G  1.6M  3.2G   1% /run
tmpfs            16G   53M   16G   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
tmpfs           3.2G   40K  3.2G   1% /run/user/1001

Since the libvirtd backend stores temporary disk images under this directory, it can run out of space during deployment when a larger image has to be built.
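
A pre-flight check along these lines could flag the problem before deploying. This is only a minimal sketch and not part of nixops-libvirtd: the function name `has_room_for`, the use of `XDG_RUNTIME_DIR` to locate the runtime directory, and the 4 GiB figure are all assumptions made for illustration.

```python
import os
import shutil
import tempfile


def has_room_for(image_bytes, directory):
    """Return True if the filesystem holding `directory` has at least
    `image_bytes` bytes free."""
    free = shutil.disk_usage(directory).free
    return free >= image_bytes


if __name__ == "__main__":
    # Assumption: temporary images land under the user runtime directory.
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR", tempfile.gettempdir())
    needed = 4 * 1024**3  # hypothetical 4 GiB image
    verdict = "ok" if has_room_for(needed, runtime_dir) else "too small"
    print(runtime_dir, verdict)
```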

A workaround is to reserve more space for the user runtime directory. On Ubuntu (or any systemd-based Linux OS) the configuration is as follows:

$ cat /etc/systemd/logind.conf
[Login]
# default is 10%
RuntimeDirectorySize=30%

Hope this can fix your issue.

grahamc commented 4 years ago

Enabling libvirtd file logging with

virt-admin daemon-log-filters ""
virt-admin daemon-log-outputs "1:file:/var/log/libvirt/libvirtd.log"

and replicating the error got me:

2020-06-01 18:30:56.481+0000: 18598: error : virNetSocketReadWire:1829 : End of file while reading data: Input/output error
2020-06-01 18:31:23.491+0000: 18598: error : virFDStreamWrite:782 : cannot write to stream: No space left on device

Except this is NOT an OS error; libvirt raises it itself:

https://github.com/libvirt/libvirt/blob/ab55a8a0871207de5fe194f55cbbcecae7a3cfe9/src/util/virfdstream.c#L780-L783
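
To illustrate how "No space left on device" can appear without any full disk, here is a toy model. It assumes the linked lines reject writes once the stream has consumed the length declared for the upload, and that nixops passes `image_info['actual-size']` as that length (as the traceback above suggests). `BoundedStream` and `upload` are hypothetical names, not libvirt's API.

```python
import errno


class BoundedStream:
    """Toy upload stream that accepts at most `declared_length` bytes."""

    def __init__(self, declared_length):
        self.declared_length = declared_length
        self.offset = 0

    def send(self, chunk):
        remaining = self.declared_length - self.offset
        if remaining <= 0:
            # No disk is involved: the declared length has been used up.
            raise OSError(errno.ENOSPC, "cannot write to stream")
        accepted = min(len(chunk), remaining)
        self.offset += accepted
        return accepted


def upload(data, declared_length, chunk_size=4):
    """Send `data` in chunks and return the number of bytes accepted."""
    stream = BoundedStream(declared_length)
    pos = 0
    while pos < len(data):
        pos += stream.send(data[pos:pos + chunk_size])
    return stream.offset
```

With a declared length equal to `len(data)` the upload completes; with a smaller declared length it fails with `ENOSPC` partway through, however much room the pool has.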

grahamc commented 4 years ago

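qemu-img reports the wrong file size if the underlying storage system has compression. Its "actual-size" matches the allocated size that plain du prints (in 1 KiB blocks), not the 1.3G apparent size: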

[nix-shell:~/projects/github.com/grahamc/tinkerbell-nixos]$ qemu-img info --output json /nix/store/fcq2qdlh31snwdlipwv6b97banrd4z9q-libvirtd-ssh-image/nixos.qcow2  | jq '."actual-size"'
558619648

[nix-shell:~/projects/github.com/grahamc/tinkerbell-nixos]$ du /nix/store/fcq2qdlh31snwdlipwv6b97banrd4z9q-libvirtd-ssh-image/nixos.qcow2
545527  /nix/store/fcq2qdlh31snwdlipwv6b97banrd4z9q-libvirtd-ssh-image/nixos.qcow2

[nix-shell:~/projects/github.com/grahamc/tinkerbell-nixos]$ echo $((558619648 / 545527))
1024

[nix-shell:~/projects/github.com/grahamc/tinkerbell-nixos]$ du -h --apparent-size /nix/store/fcq2qdlh31snwdlipwv6b97banrd4z9q-libvirtd-ssh-image/nixos.qcow2
1.3G    /nix/store/fcq2qdlh31snwdlipwv6b97banrd4z9q-libvirtd-ssh-image/nixos.qcow2
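
The two numbers can also be read directly from `stat`. The sketch below assumes Linux, where `st_blocks` counts 512-byte units; `file_sizes` is a hypothetical helper, not something nixops provides.

```python
import os


def file_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for `path`."""
    st = os.stat(path)
    apparent = st.st_size           # like `du --bytes --apparent-size`
    allocated = st.st_blocks * 512  # like plain `du`, converted to bytes
    return apparent, allocated
```

Reading the file yields `apparent` bytes, so that is the number an upload needs. On compressed or sparse storage `allocated` can be much smaller.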

grahamc commented 4 years ago

Reported upstream: https://bugs.launchpad.net/qemu/+bug/1881648

With the following diff I've moved past the error:

diff --git a/nixops_virtd/backends/libvirtd.py b/nixops_virtd/backends/libvirtd.py
index 43476b4..ed41a8c 100644
--- a/nixops_virtd/backends/libvirtd.py
+++ b/nixops_virtd/backends/libvirtd.py
@@ -267,7 +267,14 @@ class LibvirtdState(MachineState[LibvirtdDefinition]):
         output = self._logged_exec(
             ["qemu-img", "info", "--output", "json", filename], capture_stdout=True
         )
-        return json.loads(output)
+
+        du_output = self._logged_exec(
+            ["du", "--bytes", "--apparent-size", filename], capture_stdout=True
+        ).split()
+
+        mid = json.loads(output)
+        mid['actual-size'] = int(du_output[0])
+        return mid

     def _create_volume(self, virtual_size, actual_size):
         xml = """

teto commented 4 years ago

OK, that seemed tricky. Congrats!