hercules-ci / hercules-ci-agent

https://hercules-ci.com build and deployment agent
Apache License 2.0

binary cache 404 in dependency fetch causes loop #245

Closed roberth closed 4 years ago

roberth commented 4 years ago

Description

The log shows the same two dependencies being fetched over and over. Specifically, they belong to the subtree rooted at a dependent of the missing path.

P -> A -> B

x -> y: x needs y
P: what the agent is trying to build
A: a path that is in a binary cache
B: a path that is not in any binary cache

It will keep trying to fetch A and B.
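
To make the P -> A -> B situation concrete, here is a minimal sketch that flags cached paths whose closure is not fully present in the cache. The graph, the `CACHED` set, and the function name are hypothetical and purely for illustration; this is not how the agent or Nix checks anything.

```python
# Hypothetical dependency edges from the diagram: each key needs its values.
NEEDS = {"P": ["A"], "A": ["B"], "B": []}

# Hypothetical cache contents: A is cached, B is in no binary cache.
CACHED = {"A"}


def incomplete_closures(needs, cached):
    """Return cached paths that (transitively) need a path missing from the cache."""

    def closure(path, seen):
        # Collect every path reachable from `path` through the `needs` edges.
        for dep in needs.get(path, []):
            if dep not in seen:
                seen.add(dep)
                closure(dep, seen)
        return seen

    return sorted(p for p in cached if not closure(p, set()) <= cached)


print(incomplete_closures(NEEDS, CACHED))
```

In this toy model, A is reported because it is in the cache while its dependency B is not, which is the condition that precedes the repeated fetching.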

~I'm investigating whether this could be related to broken C++ exception handling https://gitlab.haskell.org/ghc/ghc/-/issues/11829~

~Agent 0.6 may be unaffected but other fixes have not been backported there and it does not have live logs.~

To Reproduce

  1. have a missing path in the binary cache
  2. build something that depends on it, on a darwin agent

Expected behavior

The exception is caught and the dependency's derivation is built as a fallback.

Logs

querying info about '/nix/store/fv3bh16qqbh78j1m11dgirlnilaizxvh-bimap-0.3.3' on 'https://cache.nixos.org'
downloading 'https://cache.nixos.org/fv3bh16qqbh78j1m11dgirlnilaizxvh.narinfo'
querying info about '/nix/store/fmd681801i64b5869b7rrsqyd1l76kc9-bimap-0.3.3-doc' on 'https://cache.nixos.org'
downloading 'https://cache.nixos.org/fmd681801i64b5869b7rrsqyd1l76kc9.narinfo'
querying info about '/nix/store/fmd681801i64b5869b7rrsqyd1l76kc9-bimap-0.3.3-doc' on 'some-private-cache'

Only fv3... exists, in the private cache.

Platform / Version

darwin, 0.7.4

roberth commented 4 years ago

This can be reproduced with Nix alone, so it's probably a bug in Nix's goal state machine in build.cc, triggered by an incomplete output. Consider a derivation with two outputs, out and doc, where out references doc. Both have been built, but doc has been removed from the cache or never needed to be uploaded (for example, because the cache is only used for binaries).

  1. Something needs out to be valid
  2. out narinfo is fetched. out depends on doc
  3. doc narinfo is fetched; is missing
  4. doc needs to be built. Build the drv
  5. drv may be substitutable, let's fetch out. go to 2.
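
The cycle in the steps above can be illustrated with a toy model. This is only a sketch of the described behavior, not Nix's actual goal code in build.cc; `NARINFO`, `PRODUCER`, `trace`, and the `retry_substitution` switch are all made up for the illustration.

```python
# Toy model of the substitute/build cycle. NOT Nix's real goal machinery.

# Hypothetical cache: the narinfo for `out` exists and references `doc`;
# there is no narinfo for `doc`.
NARINFO = {"out": ["doc"]}

# Both outputs are produced by the same (hypothetical) derivation.
PRODUCER = {"out": "drv", "doc": "drv"}


def trace(start, retry_substitution=True, max_steps=12):
    """Record the actions taken, stopping after max_steps to avoid spinning forever."""
    steps = []
    todo = [("substitute", start)]
    while todo and len(steps) < max_steps:
        action, target = todo.pop()
        steps.append((action, target))
        if action == "substitute":
            refs = NARINFO.get(target)
            if refs is None:
                # Steps 3-4: narinfo is missing, so the producing drv must be built.
                todo.append(("build", PRODUCER[target]))
            else:
                # Step 2: narinfo found, so its references must become valid too.
                todo.extend(("substitute", ref) for ref in refs)
        elif action == "build" and retry_substitution:
            # Step 5: the drv may be substitutable, so `out` is fetched again.
            todo.append(("substitute", "out"))
    return steps


for step in trace("out"):
    print(step)
```

With `retry_substitution=True` the same three actions repeat until `max_steps` is reached, mirroring the log. With `retry_substitution=False` the model stops after building the derivation once, which corresponds to the expected fallback behavior.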

jappeace commented 4 years ago

Did you report it to Nix? We saw this behavior in our CI as well, on agent version 0.7.3.

Also, how do we work around it?

roberth commented 4 years ago

Here's the Nix issue https://github.com/NixOS/nix/issues/3964. So far I've assumed it was a one-off, but that doesn't seem to be the case.

To work around the issue, you could build it manually without the "broken" cache.

nix-store -r --option substituters https://cache.nixos.org /nix/store/....drv

If you have a single agent per architecture, you could run it there and have the agent pick up the output when you click Rebuild. Alternatively, you could upload the outputs to the cache with cachix push (or nix-copy-closure for those who don't use Cachix).

roberth commented 4 years ago

@jappeace Could you provide the derivation path? You can send it to support@hercules-ci.com if you prefer.

jappeace commented 4 years ago

I sent an email :ok_hand:

roberth commented 4 years ago

Update: Domen has improved Cachix to better avoid this bug.

I'm closing this in favor of https://github.com/NixOS/nix/issues/3964 but feel free to comment or contact support if this recurs.