michaelpj opened this issue 1 year ago
Worth mentioning: these builds use IFD (import-from-derivation), and I have no idea how that plays out with remote-store builds.
Thanks for reporting this!
- "don't know how to build these paths"
This is normal behavior. Nix outputs this locally before kicking off the build remotely. It is a bit weird and should be fixed in Nix.
- "dependency failed"
Some input of the target derivation failed to build. Unfortunately, the build that actually failed is sometimes not shown. This is most probably a bug in nixbuild.net. I'm unsure how vanilla Nix behaves, but I think you could get missing logs there too, since it only shows the last lines of failing builds. Issue #6 is related to this; some research and testing is needed to make nixbuild.net match (or improve on) the behavior of plain Nix. A workaround you can use is to add --print-build-logs. This will make all logs visible in the output, which can help you pinpoint the failure (after some scrolling...).
- "error: non-zero padding"
This is something I haven't seen before. I assume it is your local Nix that prints that message. Maybe nixbuild.net sends something incorrect. What version of Nix are you using?
What version of Nix are you using?
Not sure; I was using the cachix install-nix action. I bumped its version and that seems to have helped, thanks.
Unfortunately, the build that actually failed is sometimes not shown. I'm unsure how vanilla Nix behaves, but I think you could get missing logs there too, since it only shows the last lines of failing builds.
After updating my install-nix action version, which presumably got me a newer Nix, I got a much better error:
error: [nixbuild.net] '/nix/store/i22fvb148z1z4kp421lslna1nsaxb36h-plutus-core-1.1.1.0.drv': dependency failed
'/nix/store/w8rfhrj8iwwhs3j739iqvbc5drvvmjr0-plutus-core-exe-cost-model-budgeting-bench-1.1.1.0.drv': dependency failed
'/nix/store/62yv7ln16y8z7aslfmmv28iv79m7639x-cardano-crypto-class-lib-cardano-crypto-class-2.0.0.1.drv': build failed: Cached build failure: builder for '/nix/store/62yv7ln16y8z7aslfmmv28iv79m7639x-cardano-crypto-class-lib-cardano-crypto-class-2.0.0.1.drv' failed with exit code 1
That does actually tell me what failed! I'd still really like to see the logs from the failing derivation, so I guess I'll use -L for now.
@michaelpj I think that even with the newer Nix version you could get the "dependency failed" issue, because it is nixbuild.net itself that outputs those logs, and there is a bug that sometimes causes it not to show the actual build that failed.
What Nix does in this situation is to print the last 10 lines of the build that failed. However, sometimes that is not enough to find the error. And in the case of remote-store building it is actually not possible to show more lines unless you change the Nix config on the remote machine itself.
What we plan on doing in nixbuild.net is, first of all, to fix the bugs so the output is in line with standard Nix. Then we could perhaps add a nixbuild.net setting that controls the number of log lines shown. Even better would be to print a URL linking to the failing build log.
As you say, I think using -L is safest for now, so that you see all logs of all builds.
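A minimal sketch of that advice as a reusable shell helper. The function name build_with_logs is hypothetical; the only flag it adds is --print-build-logs, whose short form is -L.

```shell
# Hypothetical helper: pass every argument through to `nix build`, adding
# --print-build-logs (short form -L) so the full log of each build is
# printed, including the log of the build that actually failed.
build_with_logs() {
  nix build --print-build-logs "$@"
}
```

For example, build_with_logs --eval-store auto --store ssh-ng://eu.nixbuild.net '.#some-package', where '.#some-package' is a placeholder installable. Expect a lot of output, since every build's log is shown, not just the failing one.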
Okay, new issues!
https://github.com/input-output-hk/cardano-haskell-packages/actions/runs/4429490683/jobs/7770142361
building '/nix/store/l4348b1a1lb3qck2py53hw8fpij3pl3v-dummy-ghc.drv' on 'ssh://eu.nixbuild.net'...
building '/nix/store/6v3vk8rcv0cxps3mn79qsmjnw0malrgk-dummy-pkg-ghc-8.10.7.drv' on 'ssh://eu.nixbuild.net'...
copying 0 paths...
error: path '/nix/store/7z3l3r1dqzrlr4qmrfgpc8cxjld9q1db-cabal.project.drv' is not a valid store path
copying 0 paths...
error: path '/nix/store/xbsysxaxsbxyd9rh469bihsch3cvqrbg-dummy-ghc-8.10.7.drv' is not a valid store path
copying 0 paths...
error: path '/nix/store/0k0c51xmbj3lvqcqvgqg2dmlm86b2wgn-dummy-ghc.drv' is not a valid store path
copying 0 paths...
copying path '/nix/store/m557ji8jnlqzv00k6rry6npivr1mj2n2-nix-prefetch-git' from 'https://cache.zw3rk.com/'...
error: builder for '/nix/store/7z3l3r1dqzrlr4qmrfgpc8cxjld9q1db-cabal.project.drv' failed with exit code 1
error: builder for '/nix/store/0k0c51xmbj3lvqcqvgqg2dmlm86b2wgn-dummy-ghc.drv' failed with exit code 1
error: 1 dependencies of derivation '/nix/store/n3bznrdz3lp17axgywsvgpmax2bc1k4g-plutus-core-1.1.1.0-plan-to-nix-pkgs.drv' failed to build
I can run the exact same command line locally (i.e. building on nixbuild.net with a remote store) and it works. I've got Nix 2.13.2 locally, and the GHA is using 2.13.3. I guess that patch-version difference might matter, but I'm not sure 🤔
I think this is related to IFD, but also to the Nix configuration on the GHA runner. You can see in the logs that it builds on ssh://eu.nixbuild.net, but the remote store should be ssh-ng://eu.nixbuild.net, right? The GHA runner (using nixbuild-action) sets up ssh://eu.nixbuild.net as a remote builder, and when you build your IFD derivation it will use that remote builder during evaluation, and then switch to the remote store (ssh-ng://) for realisation. I don't know exactly why it doesn't work, but I think Nix doesn't copy the .drv files to the remote store in this case.
To work around this, add --builders "" --max-jobs 2 to your Nix invocation. We are doing this in the CI workflow, where remote-store building is used. I don't know why we haven't documented this properly.
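A minimal sketch of a full remote-store invocation combining the flags mentioned in this thread: --eval-store auto with --store ssh-ng://eu.nixbuild.net for remote-store building, --builders "" --max-jobs 2 as the workaround above, and --print-build-logs for visibility. The function name nixbuild_remote_store is hypothetical, and the sketch assumes SSH access to eu.nixbuild.net is already configured (for example by nixbuild-action).

```shell
# Hypothetical wrapper for remote-store building on nixbuild.net.
nixbuild_remote_store() {
  # --eval-store auto: evaluate using the local (auto-detected) store
  # --store ssh-ng://...: realise derivations in the nixbuild.net store
  # --builders "": don't also use the ssh:// remote builder that
  #   nixbuild-action configures, which is what broke the IFD builds
  # --max-jobs 2: per this thread, this only affects the builds Nix runs
  #   during evaluation (IFD); a value of 0 does not work here
  # --print-build-logs: show the full log of every build
  nix build "$@" \
    --eval-store auto \
    --store ssh-ng://eu.nixbuild.net \
    --builders "" \
    --max-jobs 2 \
    --print-build-logs
}
```

For example: nixbuild_remote_store 'github:input-output-hk/cardano-haskell-packages#"ghc8107/word-array/0.1.0.0"'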
Does that mean we won't get more than 2x build parallelism on nixbuild.net? That would be a shame!
It's also weird that it worked for me locally 🤔 I guess it's non-deterministic?
Ah, on my machine I don't have nixbuild.net set up as a remote builder globally; I use it ad hoc via --builders. So I guess when I use the remote store I don't hit the weird case where it's also set up as a builder.
We are doing this in the CI workflow
Looks like that includes all the options I ended up having to set :D Probably worth documenting all of them, including --print-build-logs? It would have saved me some time to just add those to the list of flags in the docs!
Does that mean we won't get more than 2x build parallelism on nixbuild.net? That would be a shame!
No, this only applies to the builds that Nix must run during evaluation, because of IFD. I actually think there is no parallelism at all during evaluation, so --max-jobs 1 would do. But --max-jobs 0 would not work, and that is what the nixbuild-action normally configures for the GHA runner.
So I guess when I use the remote store I don't hit the weird case where it's also set up as a builder.
Yes, this is correct. You would have hit the issue locally too if you had the remote builder set up there.
Looks like that includes all the options I ended up having to set :D Probably worth documenting all of them, including --print-build-logs? It would have saved me some time to just add those to the list of flags in the docs!
Yes, I will do that! Thank you for your patience :)
Okay, I got some successful builds, hooray!
I also think it would be worth documenting the "don't know how to build these paths" thing. It's unusual so it looked very suspicious to me, it would be helpful to have something saying it was normal.
A new one, this one locally:
error: unimplemented worker op: WopQueryRealisation
I guess this is due to your implementation still being partial!
Yeah, do you know what you did to trigger this?
Running a nix build command locally with remote-store building. It builds all the derivations successfully, and then that's the final output.
@michaelpj Did you by any chance forget to pass the --eval-store auto option?
Nope, that's definitely set.
Hmm, strange, this is not something I've seen before. What version of Nix are you using?
2.14.1. This doesn't seem to cause any problems, but does seem to happen every time.
Hmm, OK I'll try again to see if I can reproduce this. If you have some example build that this happens for it would be very welcome.
I'll get you one.
Try this:
nix build 'github:input-output-hk/cardano-haskell-packages#"ghc8107/word-array/0.1.0.0"' --eval-store auto --store ssh-ng://eu.nixbuild.net
@michaelpj It seems to start building ghc-8.6.5 locally. Are you using IFD that somehow involves ghc?
Yes, for sure. Sorry, you probably also need --accept-flake-config true, so that you get the caches!
(For future context: we're building a bunch of stuff with haskell.nix, which does indeed run Haskell code at evaluation time in order to compute build plans. It works surprisingly well :D )
I still haven't been able to reproduce the error: unimplemented worker op: WopQueryRealisation issue, even with your build. Have you seen it again?
Happens every time for me.
Thinking about it: I think "realisations" have to do with CA derivations, and I do have experimental-features = ca-derivations locally. Does setting that make any difference to whether you see it?
Yep, if I turn that off I don't get the error. I guess that points the finger fairly clearly, but probably that's not a high priority right now.
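If you want to keep ca-derivations in your global config but avoid the error for these builds, one possibility is to override the experimental feature list for a single invocation. This is an untested sketch: the function name build_without_ca is hypothetical, and it assumes nix-command and flakes are the only experimental features these builds need.

```shell
# Hypothetical helper: remote-store build with the experimental feature list
# replaced for this invocation only, so ca-derivations is not enabled and
# the local Nix has no reason to ask the remote store for realisations.
# Assumes nix-command and flakes are the only features needed.
build_without_ca() {
  nix build "$@" \
    --option experimental-features 'nix-command flakes' \
    --eval-store auto \
    --store ssh-ng://eu.nixbuild.net
}
```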
This is all part of trying to set up CI for a repository. I happen to be expecting the build to fail on a dependency (I tried it locally), but I'm getting weird errors from Nix and not an actual failed derivation build with logs.
Unsure if this is a problem; I haven't managed to get to the end so far due to the other issues.
Appears in e.g. https://github.com/input-output-hk/cardano-haskell-packages/actions/runs/4418243762/jobs/7745186405
Maybe a dependency failed to build? But no logs?
https://github.com/input-output-hk/cardano-haskell-packages/actions/runs/4417986503/jobs/7744547717
https://github.com/input-output-hk/cardano-haskell-packages/actions/runs/4418243762/jobs/7745186405
I seem to have progressed to just getting the last issue and haven't got past it.