Mic92 closed this issue 1 year ago.
https://github.com/nix-community/infra/commit/a96682a55bd00992998ec64f518da49a05b9e6a9
This seems to have been caused by something in nixpkgs; reverting our last flake update resolved the problem.
The nixpkgs diff includes a staging-next merge:
@roberth
I tried updating the flake again, but now hercules-ci-api on x86_64-linux is failing on nixpkgs master, which breaks the agent.
Yikes, that looks like a corrupted store path in the hercules-ci-api-core dependency output.
Or a ghc/haskell/... that writes empty files and then succeeds.
I skimmed through the haskell room and noticed that the same error was posted there:
https://app.element.io/#/room/#haskell:nixos.org/$1CSEVH8JOgJ3EYYaL866fLxnW7KZjPK8lZH2vVcWYdM
ghc: mmap 4096 bytes at (nil): Cannot allocate memory
ghc: Try specifying an address with +RTS -xm<addr> -RTS
[1] 1231767 segmentation fault (core dumped) ghci
Seems to be a kernel issue:
https://lore.kernel.org/regressions/20230303201120.kjvrnqi65xll5cqg@revolver/T/
We could potentially downgrade for now:
Linux build04 6.1.25 #1-NixOS SMP Thu Apr 20 10:35:14 UTC 2023 aarch64 GNU/Linux
Linux build02 6.1.25 #1-NixOS SMP PREEMPT_DYNAMIC Thu Apr 20 10:35:14 UTC 2023 x86_64 GNU/Linux
The latest commit that appears to complete the fix would be https://github.com/torvalds/linux/commit/0fa99fdfe1b38da396d0b2d1496a823bcd0ebea0, merged into Linux 6.3-rc4. It and related commits have been backported and queued for a 6.1.x release (https://github.com/gregkh/linux/commit/0608b3da04f5063fe503b7f9287ebb9c9b494fd7) and a 6.2.x release (https://github.com/gregkh/linux/commit/48c427450711cbc537c9d0b297ea7da9b89d4137).
So it seems that we're waiting for a tag at this point. Until then, an upgrade to 6.3 or a downgrade seem like good workarounds.
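For reference, the 6.3 upgrade workaround could look roughly like this in a NixOS module. This is a sketch that assumes the pinned nixpkgs already exposes a `linux_6_3` package set; whether 6.3 actually avoids the crash is not established at this point in the thread.

```nix
{ pkgs, ... }:
{
  # Assumption: the pinned nixpkgs provides linuxKernel.packages.linux_6_3.
  # Selecting the package set switches the kernel and its matching out-of-tree modules together.
  boot.kernelPackages = pkgs.linuxKernel.packages.linux_6_3;
}
```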
I might have put too much faith in those threads.
Reverting linux/58c5d0d6d522112577c7eeb71d382ea642ed7be4 fixes the regression, based on checks.x86_64-linux.agent-function-test results.
Example NixOS config:
# Apply a local patch that reverts 58c5d0d6d522 on top of the 6.2 kernel.
boot.kernelPackages = pkgs.linuxKernel.packages.linux_6_2.extend (self: super: {
  kernel = super.kernel.override (o: {
    kernelPatches = o.kernelPatches ++ [
      { name = "wip"; patch = ./revert-58c5d0d6d522112577c7eeb71d382ea642ed7be4.patch; }
    ];
  });
});
./revert-58c5d0d6d522112577c7eeb71d382ea642ed7be4.patch
Mailing list links (lore.kernel.org)
btw, fetchpatch has a revert option if you don't want a local copy of the patch.
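A sketch of the same override using fetchpatch's `revert` option instead of a local patch file. The URL assumes GitHub's `<commit>.patch` download form for the upstream commit, and `lib.fakeHash` is only a placeholder: the first build fails with a hash mismatch that reports the real value to paste in. The patch `name` is arbitrary, and this has not been checked against this exact kernel tree.

```nix
{ pkgs, lib, ... }:
{
  boot.kernelPackages = pkgs.linuxKernel.packages.linux_6_2.extend (self: super: {
    kernel = super.kernel.override (o: {
      kernelPatches = o.kernelPatches ++ [
        {
          name = "revert-58c5d0d6d522";
          patch = pkgs.fetchpatch {
            url = "https://github.com/torvalds/linux/commit/58c5d0d6d522112577c7eeb71d382ea642ed7be4.patch";
            # Placeholder hash: replace it with the value reported by the first (failing) build.
            hash = lib.fakeHash;
            # Reverse the fetched patch so the commit is undone rather than applied.
            revert = true;
          };
        }
      ];
    });
  });
}
```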
Thank you, I've deployed it on our machines and switched back to the 6.1 kernel; it seems to be fine.
The upstream merge reached nixpkgs master in https://github.com/NixOS/nixpkgs/pull/233927 and was backported in https://github.com/NixOS/nixpkgs/pull/234175.
Description
The machine has plenty of free memory, yet hercules-ci-agent crashes right after starting.
To Reproduce
This happens on build02.nix-community.org; we can provide access to the machine as needed.
We already tried increasing the stack size:
https://github.com/nix-community/infra/pull/546
A crash can be seen here: https://hercules-ci.com/github/nix-community/nix-init/jobs/691
Expected behavior
No crashes.
Logs
Platform / Version
Apr 28 18:53:54 build02 hercules-ci-agent[929782]: [2023-04-28 18:53:54][Agent][Info][build02][PID 929782][ThreadId 23][agent-version:0.9.11][main:Hercules.Agent hercules-ci-agent/Hercules/Agent.hs:115:19] Agent online.