nix-community / infra

nix-community infrastructure [maintainer=@zowoq]
https://nix-community.org
MIT License

Intermittent cross-comp errors when many builds are queued #1416

Open MattSturgeon opened 3 months ago

MattSturgeon commented 3 months ago

Originally reported as https://github.com/nix-community/buildbot-nix/issues/258

This seems like an infra/nix issue, not a buildbot-nix issue. @MagicRB https://github.com/nix-community/buildbot-nix/issues/258#issuecomment-2295378308

Issue description

e.g. from this build:

cannot build on 'ssh-ng://nix@build04.nix-community.org': error: failed to start SSH connection to 'nix@build04.nix-community.org'
Failed to find a machine for remote build!
derivation: sjilr3pnp4l1mcl6pfry3yaz4nxyam08-plugins-utils-dashboard.drv
required (system, features): (aarch64-linux, [])
2 available machines:
(systems, maxjobs, supportedFeatures, mandatoryFeatures)
([aarch64-linux], 80, [benchmark, big-parallel, gccarch-armv8-a, kvm, nixos-test], [])
([aarch64-darwin, x86_64-darwin], 8, [big-parallel], [])
error: a 'aarch64-linux' with features {} is required to build '/nix/store/sjilr3pnp4l1mcl6pfry3yaz4nxyam08-plugins-utils-dashboard.drv', but I am a 'x86_64-linux' with features {benchmark, big-parallel, kvm, nixos-test}
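
For readers unfamiliar with the "Failed to find a machine" output, the following is a simplified sketch of how a requirement such as `(aarch64-linux, [])` is matched against the listed machines. It is a model under stated assumptions, not Nix's actual implementation: the `Machine` class and `can_build` function are hypothetical names, and the rule encoded is the documented one (the system must match, every required feature must be offered by the machine, and every mandatory feature of the machine must be required by the derivation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical model of one line of the 'available machines' table."""
    systems: frozenset
    max_jobs: int
    supported: frozenset
    mandatory: frozenset = frozenset()


def can_build(machine, system, required):
    """Return True if the machine is eligible for the (system, features) pair."""
    required = frozenset(required)
    return (
        system in machine.systems
        # every required feature must be offered by the machine
        and required <= (machine.supported | machine.mandatory)
        # every mandatory feature must be requested by the derivation
        and machine.mandatory <= required
    )


# The two machines from the log above.
MACHINES = [
    Machine(
        frozenset({"aarch64-linux"}),
        80,
        frozenset({"benchmark", "big-parallel", "gccarch-armv8-a", "kvm", "nixos-test"}),
    ),
    Machine(
        frozenset({"aarch64-darwin", "x86_64-darwin"}),
        8,
        frozenset({"big-parallel"}),
    ),
]

# Requirement from the log: (aarch64-linux, [])
eligible = [m for m in MACHINES if can_build(m, "aarch64-linux", [])]
print(len(eligible), eligible[0].max_jobs)
```

Under this model the 80-job aarch64-linux builder is eligible, which is consistent with the first line of the log: the machine was selected, and the build failed only because the SSH connection to it could not be started.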

This is frustrating because it usually happens when the load is high, and the only workaround is to re-run the entire nix-eval, which wastes even more resources.

zowoq commented 3 months ago

It is an SSH issue; it might not occur on the builder(s) with a lower max-jobs.

https://github.com/nix-community/infra/pull/1417

The immediate issue should be fixed, but I'll leave this issue open for now while I see if there is a reasonable default we can set on all the remote builders.