jeaye / nixos-in-place

Install NixOS on top of any existing Linux distribution without rebooting
MIT License
460 stars, 57 forks

Experience report of using nixos-in-place for OVH cloud servers and qemu-guest issue #36

Open nh2 opened 7 years ago

nh2 commented 7 years ago

The tricky part was that it did not work directly on the GPU servers I wanted to use (G1-30 or G1-15). The machine would fail to boot because it couldn't see any hard drives, probably because the kernel module virtio_scsi didn't get loaded for some reason.

The error looks like this (in OVH's VNC console):

(Screenshot from 2017-07-20 21-20-48, transcribed here:)

loading module dm_mod . . .
running udev...
starting device mapper and LVM...
mount: mounting /dev/sda1 on /mnt-root/old-root failed: No such file or directory 
waiting for device /mnt-root/old-root/nixos to appear.....................
mounting /mnt/root/old-root/nixos on /...
mount: mounting /mnt-root/old-root/nixos on /mnt-root/ failed: No such file or directory 

An error occurred in stage 1 of the boot process

The installation config looked like this:

$ git rev-parse HEAD
8760ff58fa266d30b2175404134566218723e32a
$ sudo ./install 
>>> Checking environment... seems sane
>>> NixOS installer (nixos-in-place)
>>>    GRUB => /dev/sda
>>>    Root => /dev/sda1 (ext4)
>>>    ISO => nixos-minimal-16.09.680.4e14fd5-x86_64-linux.iso
>>>    Digital Ocean => false
>>>    Working directory => /tmp/tmp.am4DHd99F7
>>>    Extra config => /home/ubuntu/nixos-in-place/no-extra-config
>>> Continue? [yn] y

I worked around it by running nixos-in-place on the non-GPU instance R2-30, making a disk snapshot of that, and booting the GPU servers from the snapshot. An alternative that also worked was to skip the snapshot and use OVH's "Change the server type" functionality instead.

When I then naively deployed to that server with nixops, it would fail to boot again (and sshd would not be started either). I solved this by including the GRUB- and kernel-related settings that nixos-in-place generated in /etc/nixos/nixos-in-place.nix and /etc/nixos/hardware-configuration.nix in our nixops config.
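A minimal sketch of what such a nixops network file could look like, assuming the two generated files were copied from the server into the directory next to it. The machine name `gpu-server`, the description, and the IP address are hypothetical placeholders, not values from my setup:

```nix
# network.nix (hypothetical file name)
{
  network.description = "OVH GPU server";  # placeholder description

  # "gpu-server" is a made-up machine name.
  gpu-server = { config, pkgs, ... }: {
    imports = [
      # Copies of the files that nixos-in-place generated on the server:
      ./hardware-configuration.nix  # initrd/kernel modules, qemu-guest import
      ./nixos-in-place.nix          # GRUB and root file system settings
    ];

    # Deploy to the already-installed machine over SSH.
    # 203.0.113.10 is a documentation address; substitute the real one.
    deployment.targetHost = "203.0.113.10";
  };
}
```

The point of importing the files rather than retyping their contents is that the boot-critical settings stay exactly as nixos-in-place wrote them.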

With that in place, everything seems to be working, including nvidia-smi (which is a good test to gauge whether CUDA will work on these GPU servers).


I believe the reason for the issue is the generated /nixos/etc/nixos/hardware-configuration.nix.

The one that gets generated on the GPU server and that doesn't work contains:

  imports = [ ];

The one that gets generated on the non-GPU server and that works contains:

  imports =
    [ <nixpkgs/nixos/modules/profiles/qemu-guest.nix>
    ];

The entire /nixos/etc/nixos/hardware-configuration.nix in the working case is:

# Do not modify this file!  It was generated by ‘nixos-generate-config’
# and may be overwritten by future invocations.  Please make changes
# to /etc/nixos/configuration.nix instead.
{ config, lib, pkgs, ... }:

{
  imports =
    [ <nixpkgs/nixos/modules/profiles/qemu-guest.nix>
    ];

  boot.initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "virtio_pci" ];
  boot.kernelModules = [ "kvm-intel" ];
  boot.extraModulePackages = [ ];

  swapDevices = [ ];

  nix.maxJobs = lib.mkDefault 2;
}

Why might nixos-in-place, or the NixOS installer (nixos-generate-config), have generated different configs here?
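One way to test the hypothesis directly on a GPU server would be to add the missing import by hand before rebooting. The following is an untested sketch, not something I have verified on a G1-30 or G1-15. It assumes that the qemu-guest.nix profile is what makes the virtio modules (including virtio_scsi) available in the initrd; the explicit module list is my own guess at a minimal equivalent:

```nix
# Hypothetical manual edit of the hardware-configuration.nix
# that was generated on the GPU server.
{ config, lib, pkgs, ... }:

{
  # Add the profile that the non-GPU server got automatically.
  imports =
    [ <nixpkgs/nixos/modules/profiles/qemu-guest.nix>
    ];

  # Assumption: if the profile alone is not enough, naming the virtio
  # disk modules explicitly should make the initrd able to find /dev/sda1.
  boot.initrd.availableKernelModules = [ "virtio_pci" "virtio_scsi" "virtio_blk" ];
}
```

If the GPU server boots with this change, that would confirm that the difference in the generated imports is the cause.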