hercules-ci / support

User feedback, questions and our public roadmap. help@hercules-ci.com

Failure to fetch source from Github #29

Closed: blitz closed this issue 4 years ago

blitz commented 4 years ago

My build agent fails to fetch source code tarballs from Github:

Oct 19 01:19:40 nixos hercules-ci-agent[677]: [2019-10-18 23:19:40][][Error][nixos][677][ThreadId 18][task:f7d1cc1e-53d2-4f58-86ce-8e6a5c055633][task-type:eval][agent-version:0.5.0][main:Hercules.Agent hercules-ci-agent/Hercules/Agent.hs:162:44] Exception in task: HttpExceptionRequest Request {
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   host                 = "codeload.github.com"
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   port                 = 443
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   secure               = True
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   requestHeaders       = []
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   path                 = "/blitz/obiwan/legacy.tar.gz/0d019c0e6b93527e5a5d668a9d81895545b36903"
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   queryString          = ""
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   method               = "GET"
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   proxy                = Nothing
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   rawBody              = False
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   redirectCount        = 10
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   responseTimeout      = ResponseTimeoutDefault
Oct 19 01:19:40 nixos hercules-ci-agent[677]:   requestVersion       = HTTP/1.1
Oct 19 01:19:40 nixos hercules-ci-agent[677]: }
Oct 19 01:19:40 nixos hercules-ci-agent[677]:  (ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 6, addrAddress = <assumed to be undefined>, addrCanonName = <assumed to be undefined>}, host name: Just "codeload.github.com", service name: Just "443"): does not exist (Name or service not known))

This is weird, because fetching code via wget works just fine from the machine:

wget https://codeload.github.com/blitz/obiwan/legacy.tar.gz/0d019c0e6b93527e5a5d668a9d81895545b36903
--2019-10-19 01:26:05--  https://codeload.github.com/blitz/obiwan/legacy.tar.gz/0d019c0e6b93527e5a5d668a9d81895545b36903
Resolving codeload.github.com (codeload.github.com)... 140.82.113.9
Connecting to codeload.github.com (codeload.github.com)|140.82.113.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘0d019c0e6b93527e5a5d668a9d81895545b36903’

0d019c0e6b93527e5a5d668a9d81895545b36903                        [ <=>                                                                                                                                       ]  21.69K  --.-KB/s    in 0.1s    

2019-10-19 01:26:05 (219 KB/s) - ‘0d019c0e6b93527e5a5d668a9d81895545b36903’ saved [22215]

It looks a bit like the agent wants to resolve codeload.github.com as IPv6 and that fails, because that domain doesn't have an AAAA record.
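One way to test this hypothesis is to resolve the host once per address family and compare the results. The following is only a sketch using `getAddrInfo` from the network package; the helper name `resolveFamily` is made up for illustration and is not part of the agent.

```haskell
import Control.Exception (IOException, try)
import Network.Socket

-- Hypothetical helper: resolve host/service with hints restricted to a
-- single address family. Returns Left with the exception if the lookup fails.
resolveFamily :: Family -> HostName -> ServiceName -> IO (Either IOException [SockAddr])
resolveFamily family host service = do
  let hints = defaultHints { addrFamily = family, addrSocketType = Stream }
  result <- try (getAddrInfo (Just hints) (Just host) (Just service))
  pure (fmap (map addrAddress) result)

main :: IO ()
main = do
  v4 <- resolveFamily AF_INET  "codeload.github.com" "443"
  v6 <- resolveFamily AF_INET6 "codeload.github.com" "443"
  putStrLn ("IPv4: " ++ either show show v4)
  putStrLn ("IPv6: " ++ either show show v6)
```

If only the `AF_INET6` lookup fails while `AF_INET` succeeds, the missing AAAA record is not enough to explain the agent's error, because the failing request used `AF_UNSPEC`, which does not restrict the lookup to IPv6.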

domenkozar commented 4 years ago

That's getAddrInfo from the network package failing while trying to resolve DNS.

It's strange to me that the system is preferring IPv6. What OS/distribution are you running the agent on?

blitz commented 4 years ago

I'm running NixOS 19.09. If it helps, I can also give you access to this box.

domenkozar commented 4 years ago

That would be great.

My ssh pub key: https://static.domenkozar.com/ielectric.pub

Send me an email with access details at domen@hercules-ci.com

domenkozar commented 4 years ago

Thanks.

I'm trying to replicate the same call but I get:

Prelude Network.Socket> defaultHints { addrFlags = [AI_ADDRCONFIG], addrSocketType = Stream }
AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = *** Exception: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:78:14 in base:GHC.Err
  undefined, called at Network/Socket.hsc:1628:40 in network-2.8.0.1-Hmt657UE3v349uYmvUXEvW:Network.Socket

domenkozar commented 4 years ago

Seems like a bug in the Show instance.
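The call stack above points at an `undefined` inside `Network.Socket`, which suggests that in network-2.8.0.1 `defaultHints` leaves `addrAddress` (and probably `addrCanonName`) unset, so `show` fails as soon as it reaches those fields. Assuming that reading is right, a hints value can still be inspected by printing only the fields that were set. `describeHints` below is a hypothetical helper, not something the library provides.

```haskell
import Network.Socket

-- Hypothetical helper: render only the hint fields that getAddrInfo reads,
-- without touching addrAddress or addrCanonName.
describeHints :: AddrInfo -> String
describeHints h = unwords
  [ "flags=" ++ show (addrFlags h)
  , "family=" ++ show (addrFamily h)
  , "socketType=" ++ show (addrSocketType h)
  , "protocol=" ++ show (addrProtocol h)
  ]

main :: IO ()
main = putStrLn (describeHints hints)
  where
    hints = defaultHints { addrFlags = [AI_ADDRCONFIG], addrSocketType = Stream }
```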

Prelude Network.Socket> foo = defaultHints { addrFlags = [AI_ADDRCONFIG], addrSocketType = Stream }
Prelude Network.Socket> getAddrInfo (Just foo) (Just "codeload.github.com") (Just "443")
[AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_INET, addrSocketType = Stream, addrProtocol = 6, addrAddress = 140.82.114.10:443, addrCanonName = Nothing}]

I believe everything is working now, so it looks like a race condition between network setup and agent start.

I can see that the network was configured 6 seconds after the agent failed: Oct 19 01:09:38 nixos systemd[1]: Reached target Network is Online.
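A process that may start before the network is online can tolerate this kind of race by retrying the lookup with a backoff. The thread does not say how the agent was changed, so the following is only a sketch; `backoffDelays`, `retrying`, and `resolveWhenReady` are made-up names, and the delay values are arbitrary.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)
import Network.Socket

-- Delays in microseconds between attempts: 1s, 2s, 4s, ... capped at 30s.
backoffDelays :: Int -> [Int]
backoffDelays attempts = take attempts (map (min 30000000) (iterate (* 2) 1000000))

-- Run an IO action, retrying after each delay if it throws an IOException.
-- Once the delays are exhausted, the last failure is returned.
retrying :: [Int] -> IO a -> IO (Either IOException a)
retrying delays action = do
  result <- try action
  case (result, delays) of
    (Left _, d : rest) -> threadDelay d >> retrying rest action
    _                  -> pure result

-- Resolve a host with the same hints as in the log above, tolerating a
-- network that is not up yet.
resolveWhenReady :: HostName -> ServiceName -> IO (Either IOException [AddrInfo])
resolveWhenReady host service =
  retrying (backoffDelays 5) (getAddrInfo (Just hints) (Just host) (Just service))
  where
    hints = defaultHints { addrFlags = [AI_ADDRCONFIG], addrSocketType = Stream }

main :: IO ()
main =
  resolveWhenReady "codeload.github.com" "443"
    >>= either print (mapM_ (print . addrAddress))
```

An alternative that needs no code is to order the service after systemd's network-online target, so the agent only starts once the network is reported as up.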

domenkozar commented 4 years ago

@blitz This should be fixed in the next hercules-ci-agent release. I've restarted your agent, so it should work now.

blitz commented 4 years ago

Thanks for looking into this! I definitely got further now, but it just fails later, when it actually starts building:

warning: unknown setting 'sandbox-fallback'
these paths will be fetched (32.06 MiB download, 32.06 MiB unpacked):
  /nix/store/9ybz7r3i3i2cy6f3h6sm0psa2kzqdhz3-bootstrap-tools.tar.xz
warning: you did not specify '--add-root'; the result might be removed by the garbage collector

A technical error occurred: ConnectionError "HttpExceptionRequest Request {\n  host                 = \"blitz.cachix.org\"\n  port                 = 443\n  secure               = True\n  requestHeaders       = [(\"Content-Type\",\"application/x-nix-nar\")]\n  path                 = \"/api/v1/cache/blitz/nar\"\n  queryString          = \"\"\n  method               = \"POST\"\n  proxy                = Nothing\n  rawBody              = False\n  redirectCount        = 10\n  responseTimeout      = ResponseTimeoutDefault\n  requestVersion       = HTTP/1.1\n}\n (ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 6, addrAddress = <assumed to be undefined>, addrCanonName = <assumed to be undefined>}, host name: Just \"blitz.cachix.org\", service name: Just \"443\"): does not exist (Name or service not known))"

domenkozar commented 4 years ago

It looks like your DNS servers are behaving strangely. I suggest you try Google's 8.8.8.8 or Cloudflare's 1.1.1.1 to see if that fixes it.

blitz commented 4 years ago

Changing DNS servers doesn't make a difference. Also, DNS works fine from nslookup, ping, and wget. Super weird.

% wget https://blitz.cachix.org/
--2019-10-20 14:49:47--  https://blitz.cachix.org/
Resolving blitz.cachix.org (blitz.cachix.org)... 34.205.214.246
Connecting to blitz.cachix.org (blitz.cachix.org)|34.205.214.246|:443... connected.
HTTP request sent, awaiting response... 200 OK