The tricky part was that it wouldn't work directly on the GPU servers I wanted to use (G1-30 or G1-15): the machine would fail to boot because it couldn't see any hard drives, probably because the kernel module virtio_scsi didn't get loaded for some reason.
The error looks like this (in OVH's VNC console):
loading module dm_mod . . .
running udev...
starting device mapper and LVM...
mount: mounting /dev/sda1 on /mnt-root/old-root failed: No such file or directory
waiting for device /mnt-root/old-root/nixos to appear.....................
mounting /mnt/root/old-root/nixos on /...
mount: mounting /mnt-root/old-root/nixos on /mnt-root/ failed: No such file or directory
An error occurred in stage 1 of the boot process
The installation config looked like this:
$ git rev-parse HEAD
8760ff58fa266d30b2175404134566218723e32a
$ sudo ./install
>>> Checking environment... seems sane
>>> NixOS installer (nixos-in-place)
>>> GRUB => /dev/sda
>>> Root => /dev/sda1 (ext4)
>>> ISO => nixos-minimal-16.09.680.4e14fd5-x86_64-linux.iso
>>> Digital Ocean => false
>>> Working directory => /tmp/tmp.am4DHd99F7
>>> Extra config => /home/ubuntu/nixos-in-place/no-extra-config
>>> Continue? [yn] y
I worked around it by running nixos-in-place on the non-GPU instance R2-30, then making a disk snapshot of that and booting the GPU servers from it. An alternative that also worked was to skip the snapshot and use OVH's "Change the server type" functionality instead.
Naively deploying to that server with nixops would then make it fail to boot again (and also leave its sshd not started). I solved that by including the GRUB- and kernel-related settings that nixos-in-place generated in /etc/nixos/nixos-in-place.nix and /etc/nixos/hardware-configuration.nix in our nixops config.
With that in place, everything seems to work, including nvidia-smi (which is a good test to gauge whether CUDA will work on these GPU servers).
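For illustration, the nixops workaround might look roughly like the sketch below. It assumes the two files generated by nixos-in-place were copied from the server and saved next to the network expression; the machine name gpu-server and the target address are placeholders, not values from the actual deployment.

```nix
# Sketch of a nixops network expression; "gpu-server" is a hypothetical machine name.
{
  network.description = "OVH GPU server";

  gpu-server = { config, pkgs, ... }: {
    # Assumption: deploying over SSH to a machine that already runs NixOS.
    deployment.targetEnv = "none";
    deployment.targetHost = "203.0.113.10"; # placeholder address

    imports = [
      # Copies of the files nixos-in-place generated on the server
      # (originally /etc/nixos/nixos-in-place.nix and
      # /etc/nixos/hardware-configuration.nix), assumed to sit next to this file.
      ./nixos-in-place.nix
      ./hardware-configuration.nix
    ];

    # Keep sshd running so that nixops can still reach the machine after the switch.
    services.openssh.enable = true;
  };
}
```

Importing the generated files as they are, instead of retyping individual options, keeps the GRUB and initrd settings identical to the ones the machine already boots with.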
I believe the reason for the issue is the generated /nixos/etc/nixos/hardware-configuration.nix.
The one that gets generated on the GPU server and that doesn't work contains:
imports = [ ];
The one that gets generated on the non-GPU server and that works contains:
The entire /nixos/etc/nixos/hardware-configuration.nix in the working case is:
Why might nixos-in-place, or the NixOS installer (nixos-generate-config), have generated different configs here?