hercules-ci / support

User feedback, questions and our public roadmap. help@hercules-ci.com

Agent Failure on Darwin #47

Closed brendanhay closed 3 years ago

brendanhay commented 3 years ago

Fresh install of Nix + hercules-ci-agent on macOS Catalina. Everything Nix-related seems to work as expected: the dashboard shows the agent connected, and evaluation correctly triggers the Darwin-related builds. However, when the agent starts a task on the Darwin host, the logs show the following (for a variety of different derivations):

[2020-09-24 07:13:30][][Warning][agent02][PID 666][ThreadId 7][hard:infinity][resource:open files][message:setResourceLimit: invalid argument (Invalid argument)][agent-version:0.7.4][soft:65536][main:Hercules.Agent hercules-ci-agent/Hercules/Agent.hs:263:9] Could not increase resource limit
[2020-09-24 07:13:30][][Info][agent02][PID 666][ThreadId 26][agent-version:0.7.4][main:Hercules.Agent hercules-ci-agent/Hercules/Agent.hs:114:11] Agent online.
[2020-09-24 07:13:59][][Info][agent02][PID 666][ThreadId 36][task:d2496dda-af04-4f9f-8618-cf7dd14a00ba][agent-version:0.7.4][main:Hercules.Agent hercules-ci-agent/Hercules/Agent.hs:186:7] Starting task
[2020-09-24 07:14:00][][Error][agent02][PID 666][ThreadId 36][exception:WorkerException {originalException = FatalError {fatalErrorMessage = "Could not retrieve derivation \"/nix/store/05lrdwy084n5qbdva84yx4q0hfqr724b-pcre-8.44.tar.bz2.drv\" from local store or binary caches."}, exitStatus = Just (ExitFailure 1)}][task:d2496dda-af04-4f9f-8618-cf7dd14a00ba][message:FatalError {fatalErrorMessage = "Could not retrieve derivation \"/nix/store/05lrdwy084n5qbdva84yx4q0hfqr724b-pcre-8.44.tar.bz2.drv\" from local store or binary caches."} (worker: ExitFailure 1)][agent-version:0.7.4][main:Hercules.Agent hercules-ci-agent/Hercules/Agent.hs:175:11] Exception in task

DNS resolution, networking, and so on all seem groovy for the host - it's only the agent that's experiencing an error. I've previously seen something similar on a different host when Nix had SSL issues - although the NIX_SSL_CERT_FILE in the .plist seems legit, in this instance.

How do?

roberth commented 3 years ago

Hi Brendan,

Your agents are set up to push to different caches, urbit-linux and urbit-osx, but the agents rely on a shared binary cache to exchange derivations. Evaluation always runs on your x86_64-linux agents, so the derivations end up in urbit-linux, which your macOS agent does not read from; that is why it cannot retrieve them. It should suffice to configure your machines to read from each other's caches. You could either unify your caches or make macOS read from the linux cache. Assuming the latter, if you're using nix-darwin, you could add:

  # add to macOS
  nix = {
    binaryCaches = [
      # To retrieve derivations from evaluation on linux
      "https://urbit-linux.cachix.org"
    ];
    binaryCachePublicKeys = [
      "urbit-linux.cachix.org-1:fMTIhlbzjbs6rpwwEwKjeLtZMo2KOXHE6j7k0ilpi0E="
    ];
  };

For completeness, you could also add the following to your NixOS agent machine. It is useful when you want to use outputs from your macOS builds in a linux derivation or, for continuous deployment, in a linux Effect (available soon).

  # add to NixOS
  nix = {
    binaryCaches = [
      # To retrieve build outputs from macOS
      "https://urbit-osx.cachix.org"
    ];
    binaryCachePublicKeys = [
      "urbit-osx.cachix.org-1:elrx4waGWOMKWg//0GFRhILvMeplTg9ryQpf38R4o/Y="
    ];
  };
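
As a rough sketch of how the two snippets above could be kept in sync, both cache lists might live in a single module that each machine imports. This assumes both the nix-darwin and the NixOS configuration accept the `nix.binaryCaches` and `nix.binaryCachePublicKeys` options shown above; the file name `shared-caches.nix` is hypothetical, and newer NixOS and nix-darwin releases may expect `nix.settings.substituters` and `nix.settings.trusted-public-keys` instead.

```nix
# shared-caches.nix (hypothetical file name, not from the thread)
# Intended to be imported by both the macOS and the NixOS agent configuration,
# so that each agent can read what the other one pushed.
{ ... }:
{
  nix = {
    binaryCaches = [
      # Derivations produced by evaluation on linux
      "https://urbit-linux.cachix.org"
      # Build outputs from macOS
      "https://urbit-osx.cachix.org"
    ];
    binaryCachePublicKeys = [
      "urbit-linux.cachix.org-1:fMTIhlbzjbs6rpwwEwKjeLtZMo2KOXHE6j7k0ilpi0E="
      "urbit-osx.cachix.org-1:elrx4waGWOMKWg//0GFRhILvMeplTg9ryQpf38R4o/Y="
    ];
  };
}
```

Each machine would then pull it in with `imports = [ ./shared-caches.nix ];`. This only covers reading; which cache each agent pushes to is left unchanged.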

brendanhay commented 3 years ago

Great, feels like a minor brain fart on my part - of course it works like that, thanks!