nix-community / nixos-anywhere

install nixos everywhere via ssh [maintainer=@numtide]
https://nix-community.github.io/nixos-anywhere/
MIT License

--vm-test fails with ZFS on root after updating lock file #350

Open Sirius902 opened 1 month ago

Sirius902 commented 1 month ago

I have a configuration based on nixos-anywhere-examples here with the only things I've changed being:

With the configuration as-is, nix run github:nix-community/nixos-anywhere -- --flake "path:.#vm" --vm-test succeeds and installing it in a QEMU guest with nix run github:nix-community/nixos-anywhere -- --flake "path:.#vm" root@<ip> also works.

However, after updating the lock file with nix flake update, --vm-test no longer succeeds. The update changes flake.lock as follows:

• Updated input 'disko':
    'github:nix-community/disko/0b178c0554421a6171fc8afb3fb1675511f31377' (2023-09-26)
  → 'github:nix-community/disko/bad376945de7033c7adc424c02054ea3736cf7c4' (2024-07-15)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/e12483116b3b51a185a33a272bf351e357ba9a99' (2023-09-21)
  → 'github:NixOS/nixpkgs/9355fa86e6f27422963132c2c9aeedb0fb963d93' (2024-07-16)

Running the command with --vm-test now fails with the following error output:

error: builder for '/nix/store/qqg5a968s0xabipxnz3s01km3g77i321-vm-test-run-disko-nixos-disko.drv' failed with exit code 1;
       last 10 log lines:
       >     driver.run_tests()
       >   File "/nix/store/1jxaawzgwla9qf3ksnzd4a8h0b1ija5n-nixos-test-driver-1.1/lib/python3.12/site-packages/test_driver/driver.py", line 166, in run_tests
       >     self.test_script()
       >   File "/nix/store/1jxaawzgwla9qf3ksnzd4a8h0b1ija5n-nixos-test-driver-1.1/lib/python3.12/site-packages/test_driver/driver.py", line 158, in test_script
       >     exec(self.tests, symbols, None)
       >   File "<string>", line 46, in <module>
       >   File "/nix/store/1jxaawzgwla9qf3ksnzd4a8h0b1ija5n-nixos-test-driver-1.1/lib/python3.12/site-packages/test_driver/machine.py", line 611, in succeed
       >     raise Exception(f"command `{command}` failed (exit code {status})")
       > Exception: command `test -e /mnt/home/testfile` failed (exit code 1)
       > kill vlan (pid 7)
       For full logs, run 'nix log /nix/store/qqg5a968s0xabipxnz3s01km3g77i321-vm-test-run-disko-nixos-disko.drv'.

It is worth noting that installing on the same QEMU guest with the root@<ip> command still works with no errors and the guest successfully boots after the installation. Only running with --vm-test has this issue.

daroot commented 1 month ago

I've run into the same problem. With a disko config that I know worked both in the vm-test and on actual hardware as of 2023-12-20, I now see the same error as shown above. The critical part of the logs is:

vm-test-run-disko-skeleton-disko> machine # + rm -rf /tmp/tmp.Muxv79PDAU
vm-test-run-disko-skeleton-disko> (finished: must succeed: /nix/store/z8838sjswryrjg6rwy1ck2drf184s7qn-disko, in 29.34 seconds)
vm-test-run-disko-skeleton-disko> machine: must succeed: mkdir -p /mnt/home
vm-test-run-disko-skeleton-disko> (finished: must succeed: mkdir -p /mnt/home, in 0.32 seconds)
vm-test-run-disko-skeleton-disko> machine: must succeed: touch /mnt/home/testfile
vm-test-run-disko-skeleton-disko> (finished: must succeed: touch /mnt/home/testfile, in 0.30 seconds)
vm-test-run-disko-skeleton-disko> machine: must succeed: /nix/store/fv7mfyig4sv48lfggx2m6kvy6893dpd0-disko-format
vm-test-run-disko-skeleton-disko> machine # ++ mktemp -d
vm-test-run-disko-skeleton-disko> machine # + disko_devices_dir=/tmp/tmp.9HItfC04BG
vm-test-run-disko-skeleton-disko> machine # + trap 'rm -rf "$disko_devices_dir"' EXIT
vm-test-run-disko-skeleton-disko> machine # + mkdir -p /tmp/tmp.9HItfC04BG
vm-test-run-disko-skeleton-disko> machine # + device=/dev/vdb
vm-test-run-disko-skeleton-disko> machine # + imageSize=2G
vm-test-run-disko-skeleton-disko> machine # + name=main
vm-test-run-disko-skeleton-disko> machine # + type=disk
vm-test-run-disko-skeleton-disko> machine # + device=/dev/vdb
vm-test-run-disko-skeleton-disko> machine # + efiGptPartitionFirst=1
vm-test-run-disko-skeleton-disko> machine # + type=gpt
vm-test-run-disko-skeleton-disko> machine # + blkid /dev/vdb
vm-test-run-disko-skeleton-disko> machine # /dev/vdb: PTUUID="eae70d3c-a8ea-460b-b02e-14b3dfdb8ee5" PTTYPE="gpt"
vm-test-run-disko-skeleton-disko> machine # + sgdisk --align-end --new=1:0:+512M --change-name=1:disk-main-ESP --typecode=1:EF00 /dev/vdb
vm-test-run-disko-skeleton-disko> machine # Could not create partition 1 from 8386560 to 9435135
vm-test-run-disko-skeleton-disko> machine # Error encountered; not saving changes.
vm-test-run-disko-skeleton-disko> machine # + sgdisk --change-name=1:disk-main-ESP --typecode=1:EF00 /dev/vdb
vm-test-run-disko-skeleton-disko> machine # + partprobe /dev/vdb
vm-test-run-disko-skeleton-disko> machine # + udevadm trigger --subsystem-match=block
vm-test-run-disko-skeleton-disko> machine # + udevadm settle
vm-test-run-disko-skeleton-disko> machine # + sgdisk --align-end --new=2:0:-0 --change-name=2:disk-main-rootfs --typecode=2:8300 /dev/vdb
vm-test-run-disko-skeleton-disko> machine # Could not create partition 2 from 8386560 to 8388574
vm-test-run-disko-skeleton-disko> machine # Error encountered; not saving changes.
vm-test-run-disko-skeleton-disko> machine # + sgdisk --change-name=2:disk-main-rootfs --typecode=2:8300 /dev/vdb

In particular, sgdisk does not seem to be doing the right thing: it tries to create each partition at an unusual offset and fails. That makes me think that either the test is not properly clearing the block device (in nixos-anywhere or in disko itself), or the behavior of sgdisk has changed (it was updated from 1.0.9 to 1.0.10 in NixOS/nixpkgs#297099).
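The sector numbers in the log can be checked by hand, which may help narrow down what sgdisk saw. The sketch below assumes 512-byte logical sectors and a 4 GiB /dev/vdb. Neither is stated in the log; the 4 GiB figure is chosen only because it reproduces the logged end sector for partition 2.

```shell
#!/usr/bin/env bash
# Assumption: 512-byte logical sectors (not stated in the log).
SECTOR=512

# Convert a size in MiB to a sector count.
mib_to_sectors() { echo $(( $1 * 1024 * 1024 / SECTOR )); }

# First sector sgdisk picked for both partitions (taken from the log).
start=8386560

# Partition 1 was requested as --new=1:0:+512M.
end1=$(( start + $(mib_to_sectors 512) - 1 ))
echo "partition 1 would end at sector $end1"

# Partition 2 was requested as --new=2:0:-0, i.e. up to the last usable sector.
# A standard GPT keeps 33 sectors at the end of the disk (32 for the backup
# entry array, 1 for the backup header), so last usable = total - 34.
total=$(( 4 * 1024 * 1024 * 1024 / SECTOR ))   # assumption: 4 GiB device
last_usable=$(( total - 34 ))
echo "last usable sector: $last_usable"

# Where the free space begins, in MiB from the start of the disk.
free_start_mib=$(( start / $(mib_to_sectors 1) ))
echo "free space starts at MiB $free_start_mib"
```

Under these assumptions the output matches the log (9435135 and 8388574), and the free space begins 4095 MiB into the disk. One possible reading is that almost the whole device was already allocated when disko-format ran, which would fit blkid reporting an existing GPT on /dev/vdb. That alone does not show whether the existing table is expected at this point in the test.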

I'm trying to isolate which link in the chain of nixos-anywhere, disko, and gptfdisk actually broke.
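One way to narrow this down is to pin one input at a time to the revisions from the lock file diff above and rerun the VM test. The sketch below is only an outline: try_combo and run are hypothetical helpers, the script prints the commands by default (set DRY_RUN=0 to execute them), and it assumes the flake's inputs are named disko and nixpkgs as in the update output. A newer disko may not evaluate against an older nixpkgs, so a mixed combination can fail for unrelated reasons.

```shell
#!/usr/bin/env bash
# Revisions copied from the `nix flake update` output quoted in this issue.
DISKO_OLD=0b178c0554421a6171fc8afb3fb1675511f31377
DISKO_NEW=bad376945de7033c7adc424c02054ea3736cf7c4
NIXPKGS_OLD=e12483116b3b51a185a33a272bf351e357ba9a99
NIXPKGS_NEW=9355fa86e6f27422963132c2c9aeedb0fb963d93

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: pin both inputs in flake.lock, then rerun the VM test
# with the same command used in the original report.
try_combo() {
  local disko_rev=$1 nixpkgs_rev=$2
  run nix flake lock \
    --override-input disko "github:nix-community/disko/${disko_rev}" \
    --override-input nixpkgs "github:NixOS/nixpkgs/${nixpkgs_rev}"
  run nix run github:nix-community/nixos-anywhere -- --flake "path:.#vm" --vm-test
}

# New disko with old nixpkgs, then old disko with new nixpkgs.
try_combo "$DISKO_NEW" "$NIXPKGS_OLD"
try_combo "$DISKO_OLD" "$NIXPKGS_NEW"
```

Note that nixos-anywhere itself is fetched unpinned by the nix run command, so its revision is a third variable that this sketch does not hold fixed.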