nix-community / disko

Declarative disk partitioning and formatting using nix [maintainers=@Lassulus @Enzime @iFreilicht]
MIT License
1.73k stars 190 forks

xcp exhausts memory, probably avoidably #769

Open colemickens opened 3 weeks ago

colemickens commented 3 weeks ago
  1. I have to bump memSize to build a disk image with a moderate (but not huge) closure.
  2. I am surprised I don't hit cmdline arg length limits.
  3. I think that using something other than xcp with ALL store paths at once would avoid needing a larger guest just to install the closure.
Mic92 commented 3 weeks ago

disko-images uses cp, not xcp.

Mic92 commented 3 weeks ago

Did you maybe use an older version of disko? Because we are using xargs + cp: https://github.com/nix-community/disko/blob/624fd86460e482017ed9c3c3c55a3758c06a4e7f/lib/make-disk-image.nix#L91

colemickens commented 2 weeks ago

I think I just got my wires crossed. I am definitely using a recent disko. However, I still suspect we could batch/chunk the copy somehow instead of handing xargs cp all of the store paths at once.

It's very apparent that this copy step uses a LOT of memory; it's where my image build fails every time with a large enough closure and not enough VM RAM.

Obviously it's easy to just bump the builder RAM, but again, I'm guessing that this can be massaged to not use so much RAM for the copy.
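
To illustrate what I mean by batching/chunking, here is a minimal sketch (the chunk size and target path are made up; this is not what disko currently does):

#!/usr/bin/env bash
# Sketch only: copy the closure in fixed-size batches instead of letting
# xargs pack as many store paths as possible into a single cp invocation.
# "$1" is assumed to be a closure-info directory containing a store-paths file.
set -euo pipefail
target=/mnt/nix/store
# -n 32 caps each cp call at 32 store paths; cp still copies file by file,
# so this mainly bounds the argument list handed to each invocation.
xargs -n 32 cp --recursive --target-directory "$target" < "$1/store-paths"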

iFreilicht commented 2 weeks ago

Hmm, I'm not sure why this happens. I can definitely confirm that:

  1. running nom build .#checks.x86_64-linux.make-disk-image causes the qemu process to use up to 2.1GB of RAM
  2. before the line nixos-disko-images> ++ xargs cp --recursive --target /mnt/nix/store, the RAM usage was less than 1GB

However, I also observed:

  1. After xargs cp is done, the RAM usage of qemu does not decrease
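
For reference, a rough sketch of one way to sample the qemu RSS from the host while the check runs (the process-name pattern is an assumption and may need adjusting):

#!/usr/bin/env bash
# Print the resident set size of every matching qemu process once per second.
while sleep 1; do
  for pid in $(pgrep -f qemu-system-x86_64); do
    rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
    printf '%s pid=%s rss=%s kB\n' "$(date +%T)" "$pid" "$rss_kb"
  done
done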

Additionally, I tried to reproduce this by running the same copy operation on the exact same closure-info locally:

$ cat ./xargs-cp.sh
#!/usr/bin/env bash
xargs cp --recursive --target /mnt/scratch/test-store < "$1/store-paths"
$ nix run nixpkgs#time -- -v ./xargs-cp.sh /nix/store/a3s32wbdg5yain492c3gq8fbv9aak6vd-closure-info
        Command being timed: "./xargs-cp.sh /nix/store/a3s32wbdg5yain492c3gq8fbv9aak6vd-closure-info"
        User time (seconds): 0.25
        System time (seconds): 2.14
        Percent of CPU this job got: 56%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.25
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 3712
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 142739
        Voluntary context switches: 17875
        Involuntary context switches: 178
        Swaps: 0
        File system inputs: 851304
        File system outputs: 1552184
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

You can see Maximum resident set size (kbytes): 3712, meaning the peak memory usage of this operation was 3.7MB.

I think the issue is that we're using a tmpfs for the VM's storage, which resides in memory. We run the vmTools.runInLinuxVM function, whose documentation says:

By default, there is no disk image; the root filesystem is a tmpfs, and the Nix store is shared with the host (via the 9P protocol). Thus, any pure Nix derivation should run unmodified.

As the command copies to ${systemToInstall.config.disko.rootMountPoint}/nix/store, it copies into a memory-backed filesystem.
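
A quick way to check that from inside the VM, right before the copy step, would be something like this (sketch only, using the mount point from the log above):

# Show which filesystem backs the copy target; tmpfs here would confirm the theory.
findmnt --target /mnt/nix/store
# Or just print the filesystem type:
stat -f -c %T /mnt/nix/store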

The solution would be to pass a diskImage argument. We don't have that implemented right now, but it would basically be wired up the same way memSize is.

However, I'm not entirely sure that's what you're complaining about. If setting memSize fixes your issue, that means the memory is exhausted inside the VM itself, not on the builder.

  1. I am surprised I don't hit cmdline arg length limits.

That's to be expected; xargs is designed to work around this limitation. From the xargs manpage:

xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is echo) one or more times with any initial-arguments followed by items read from standard input. Blank lines on the standard input are ignored.

The command line for command is built up until it reaches a system-defined limit (unless the -n and -L options are used). The specified command will be invoked as many times as necessary to use up the list of input items. In general, there will be many fewer invocations of command than there were items in the input.
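
You can see this splitting behaviour with a toy example (unrelated to disko, just to make it visible):

# Show the argument-size limits xargs will respect on this system:
xargs --show-limits < /dev/null
# Force small batches so the splitting is obvious; each output line below
# comes from a separate echo invocation:
seq 1 10 | xargs -n 3 echo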

Mic92 commented 2 weeks ago

Those disks should actually not be in the virtual machine's tmpfs, because we add them from the build directory as block devices via virtio-blk: https://github.com/nix-community/disko/blob/51994df8ba24d5db5459ccf17b6494643301ad28/lib/make-disk-image.nix#L102
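
A rough way to double-check that from inside the VM (the device names are an assumption; virtio disks usually show up as /dev/vda, /dev/vdb, ...):

# The disko target disks should appear as virtio block devices, not tmpfs mounts:
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
# Confirm what actually backs the root mount point used for the install:
findmnt /mnt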

I did try a debug kernel to see where the memory usage is coming from, but it is still not super clear to me. For ZFS, it seemed to be ZFS-internal memory allocations that added up.
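
For anyone who wants to dig further on the ZFS side, a rough sketch of where to look inside the VM (this assumes the zfs module exposes its usual kstats under /proc/spl):

# ZFS ARC size in bytes, if the zfs module is loaded:
awk '/^size / {print $3}' /proc/spl/kstat/zfs/arcstats
# Overall kernel slab usage, where ZFS/SPL allocations typically show up:
grep -E '^(Slab|SReclaimable|SUnreclaim):' /proc/meminfo
# Top slab caches (needs root):
slabtop -o | head -n 20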