nix-community / docker-nix

Docker image for nix [maintainer=@zimbatm] [status=deprecated]
https://hub.docker.com/r/nixorg/nix/
Apache License 2.0

`nixorg/nix:circleci` doesn't work on CircleCI #9

Closed TerrorJack closed 6 years ago

TerrorJack commented 6 years ago

When using nixorg/nix:circleci as the build image for a Haskell project, I observed two errors:

  1. Failing to query the cache

The "Restoring Cache" step yields the following error:

Skipping cache - error checking storage: RequestError: send request failed
caused by: Get https://circle-production-customer-artifacts.s3.amazonaws.com/?list-type=2&prefix=picard%2F59cfd9bf280ab90776b7e583%2Fglobal%2Fcaches%2Fstack-work-gen4-: x509: failed to load system roots and no roots provided

  2. stack fails to fetch resolvers:

#!/bin/bash -eo pipefail
stack --no-terminal build --haddock --test --no-run-tests

Downloading lts-11.3 build plan ...

Warning: Retry number 0 after a total
         delay of 0 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510

Warning: Retry number 1 after a total
         delay of 100000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510

Warning: Retry number 2 after a total
         delay of 200000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510

Warning: Retry number 3 after a total
         delay of 300000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
HttpExceptionRequest Request {
  host                 = "raw.githubusercontent.com"
  port                 = 443
  secure               = True
  requestHeaders       = []
  path                 = "/fpco/lts-haskell/master//lts-11.3.yaml"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}
 (ConnectionFailure Network.BSD.getProtocolByName: does not exist (no such protocol name: tcp))
Exited with code 1

For the first error, a workaround is ln -s $NIX_SSL_CERT_FILE /etc/ssl/certs/ca-certificates.crt. I still can't fix the second error though, and I'm not sure if it's also a certificate issue.
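
A slightly more defensive version of that workaround could look like the sketch below. It assumes only what the thread states: that NIX_SSL_CERT_FILE points at the Nix CA bundle and that the cache step looks for /etc/ssl/certs/ca-certificates.crt. The function name link_ca_bundle is hypothetical, not something the image provides.

```shell
# link_ca_bundle SRC [DEST]
# Hypothetical helper: symlink the CA bundle at SRC to DEST
# (default: /etc/ssl/certs/ca-certificates.crt), creating the
# target directory first.
link_ca_bundle() {
  local src="$1"
  local dest="${2:-/etc/ssl/certs/ca-certificates.crt}"

  # Refuse to create a link to a missing or empty bundle: that would
  # hide the problem instead of fixing it.
  if [ ! -s "$src" ]; then
    echo "CA bundle not found or empty: $src" >&2
    return 1
  fi

  mkdir -p "$(dirname "$dest")"
  # -f replaces a stale link from an earlier run; -n keeps ln from
  # following an existing symlink at DEST.
  ln -sfn "$src" "$dest"
}

# Example use as a CI step (assumes NIX_SSL_CERT_FILE is set):
#   link_ca_bundle "$NIX_SSL_CERT_FILE"
```

Unlike a plain ln -s, this can be re-run safely and fails loudly when the variable points nowhere.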

zimbatm commented 6 years ago

Okay I fixed the first issue in e2db1d06ee1074fea0cec261e1d907d3beffeebd

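The last issue is Haskell-specific. If I remember correctly, Haskell's network code depends on /etc/services existing; the getProtocolByName failure above also points at /etc/protocols, which is where protocol names such as tcp are normally looked up.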
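
To check this inside the image, something like the following sketch may help. It assumes the protocol lookup behind getProtocolByName ends up reading /etc/protocols, which is the usual behaviour of the C library's getprotobyname. The function names has_protocol and report_netdb are made up for this example.

```shell
# has_protocol NAME [FILE]
# Succeeds if FILE (default /etc/protocols) has an entry whose first
# field is NAME. Comment lines never match because their first field
# starts with '#'.
has_protocol() {
  local name="$1"
  local file="${2:-/etc/protocols}"
  [ -r "$file" ] || return 1
  awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$file"
}

# report_netdb [ROOT]
# Print "present" or "missing" for the lookup files mentioned in this
# thread. ROOT is an optional prefix, e.g. an unpacked image directory.
report_netdb() {
  local root="${1:-}"
  local f
  for f in /etc/protocols /etc/services /etc/nsswitch.conf /etc/resolv.conf; do
    if [ -e "$root$f" ]; then
      echo "present $f"
    else
      echo "missing $f"
    fi
  done
}

# Example:
#   report_netdb
#   has_protocol tcp || echo "no tcp entry in /etc/protocols"
```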

TerrorJack commented 6 years ago

@zimbatm Is it possible to also pack /etc/services into the final image?

zimbatm commented 6 years ago

~see https://stackoverflow.com/questions/47244229/haskell-hostname-resolution-inside-alpine-docker-image-does-not-work~

sure!

zimbatm commented 6 years ago

@TerrorJack can you retry and confirm that this has been fixed?

TerrorJack commented 6 years ago

@zimbatm Unfortunately not.

The current error message:

HttpExceptionRequest Request {
  host                 = "raw.githubusercontent.com"
  port                 = 443
  secure               = True
  requestHeaders       = []
  path                 = "/fpco/lts-haskell/master//lts-11.3.yaml"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}
 (InternalException (HandshakeFailed (Error_Protocol ("certificate has unknown CA",True,UnknownCa))))
Exited with code 1

And it's not resolved even if I run nix-env -iA nixpkgs.cacert.

Also, the /etc/ssl/certs/ca-certificates.crt trick is probably not helping. I'll re-investigate why it worked on several previous commits but then failed again.

zimbatm commented 6 years ago

did you try setting SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt as well?
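
Before relying on that variable, it may be worth confirming that the file it names is a readable, non-empty PEM bundle. Here is a minimal sketch; count_certs is a hypothetical helper, and whether stack's TLS code honours SSL_CERT_FILE at all is not established in this thread.

```shell
# count_certs FILE
# Print the number of PEM certificates in FILE.
# Fails (non-zero status) if FILE cannot be read.
count_certs() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "unreadable: $file" >&2
    return 1
  fi
  # grep -c prints 0 but exits 1 when nothing matches; keep the count
  # and ignore that status.
  grep -c -- '-----BEGIN CERTIFICATE-----' "$file" || true
}

# Example:
#   export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
#   count_certs "$SSL_CERT_FILE"
```

A count of 0, or an "unreadable" message, would suggest the variable is set correctly but the bundle itself is the problem.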

TerrorJack commented 6 years ago

@zimbatm Setting SSL_CERT_FILE doesn't help.

zimbatm commented 6 years ago

can you strace the process to find out where it's looking for the certs?

Another possibility is that the image has been cached

TerrorJack commented 6 years ago

@zimbatm Here's the dump:

#!/bin/bash -eo pipefail
strace -e trace=open stack --no-terminal build --haddock --test --no-run-tests

open("/proc/self/task/128/comm", O_RDWR) = 4
open("/proc/self/task/129/comm", O_RDWR) = 8
open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 11
Downloading lts-11.3 build plan ...
open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11
open("/etc/host.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 11
open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

Warning: Retry number 0 after a total
         delay of 0 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11

Warning: Retry number 1 after a total
         delay of 100000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11

Warning: Retry number 2 after a total
         delay of 200000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11

Warning: Retry number 3 after a total
         delay of 300000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
HttpExceptionRequest Request {
  host                 = "raw.githubusercontent.com"
  port                 = 443
  secure               = True
  requestHeaders       = []
  path                 = "/fpco/lts-haskell/master//lts-11.3.yaml"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}
 (InternalException (HandshakeFailed (Error_Protocol ("certificate has unknown CA",True,UnknownCa))))
+++ exited with 1 +++
Exited with code 1

zimbatm commented 6 years ago

thanks. Can you add -fF to follow threads and sub-processes?
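
Since the interesting lines are buried in retry warnings, writing the trace to a file and then listing only the failed lookups can make it easier to read. The sketch below assumes a log in the same format as the dumps in this thread. failed_opens and the /tmp/stack.trace path are made-up names.

```shell
# failed_opens LOGFILE
# Print each distinct path whose open() or openat() call failed with
# ENOENT in an strace log, one path per line, sorted.
failed_opens() {
  grep -E 'open(at)?\(.*ENOENT' "$1" \
    | sed -E 's/^[^"]*"([^"]*)".*/\1/' \
    | sort -u
}

# Example (not run here): trace children too, and include openat,
# which trace=open alone does not cover.
#   strace -f -e trace=open,openat -o /tmp/stack.trace \
#     stack --no-terminal build --haddock --test --no-run-tests
#   failed_opens /tmp/stack.trace
```

On the trace above this would list /etc/host.conf and /etc/nsswitch.conf, the two paths that fail with ENOENT there.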

TerrorJack commented 6 years ago

@zimbatm Updated trace:

#!/bin/bash -eo pipefail
strace -e trace=open -f stack --no-terminal build --haddock --test --no-run-tests

strace: Process 126 attached
[pid   125] open("/proc/self/task/126/comm", O_RDWR) = 3
strace: Process 127 attached
[pid   125] open("/proc/self/task/127/comm", O_RDWR) = 8
strace: Process 128 attached
[pid   125] open("/proc/self/task/128/comm", O_RDWR) = 11
[pid   125] open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 11
strace: Process 129 attached
[pid   125] open("/proc/self/task/129/comm", O_RDWR) = 11
Downloading lts-11.3 build plan ...
[pid   125] open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   125] open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11
[pid   125] open("/etc/host.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   125] open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 11
[pid   125] open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   127] open("/etc/localtime", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   127] open("/etc/ssl/certs/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 12

Warning: Retry number 0 after a total
         delay of 0 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
[pid   125] open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11
strace: Process 130 attached
[pid   129] open("/proc/self/task/130/comm", O_RDWR) = 12

Warning: Retry number 1 after a total
         delay of 100000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
[pid   125] open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11

Warning: Retry number 2 after a total
         delay of 200000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
[pid   125] open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11

Warning: Retry number 3 after a total
         delay of 300000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
HttpExceptionRequest Request {
  host                 = "raw.githubusercontent.com"
  port                 = 443
  secure               = True
  requestHeaders       = []
  path                 = "/fpco/lts-haskell/master//lts-11.3.yaml"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}
 (InternalException (HandshakeFailed (Error_Protocol ("certificate has unknown CA",True,UnknownCa))))
[pid   129] +++ exited with 0 +++
[pid   130] +++ exited with 0 +++
[pid   128] +++ exited with 0 +++
[pid   127] +++ exited with 0 +++
[pid   126] +++ exited with 0 +++
+++ exited with 1 +++
Exited with code 1

zimbatm commented 6 years ago

Aha, pid 127 opens /etc/ssl/certs/ but for some reason doesn't try anything else. Can you also trace execve to find out which process it is?

Also, before that, add an ls /etc/ssl/certs/. On my system I have:

$ ls -l /etc/ssl/certs/
total 0
lrwxrwxrwx 1 root root 35 Apr  5 20:11 ca-bundle.crt -> /etc/static/ssl/certs/ca-bundle.crt
lrwxrwxrwx 1 root root 41 Apr  5 20:11 ca-certificates.crt -> /etc/static/ssl/certs/ca-certificates.crt
$ readlink -f /etc/ssl/certs/ca-bundle.crt 
/nix/store/0qjswnffk75dk2nmvjkfnr2fxq2774av-ca-certificates.crt
$ readlink -f /etc/ssl/certs/ca-certificates.crt 
/nix/store/0qjswnffk75dk2nmvjkfnr2fxq2774av-ca-certificates.crt

TerrorJack commented 6 years ago

@zimbatm Updated trace:

#!/bin/bash -eo pipefail
ls /etc/ssl/certs/
strace -e trace=open,execve -f stack --no-terminal build --haddock --test --no-run-tests

execve("/nix/var/nix/profiles/default/bin/stack", ["stack", "--no-terminal", "build", "--haddock", "--test", "--no-run-tests"], 0x7ffdcaa5beb0 /* 36 vars */) = 0
strace: Process 127 attached
[pid   126] open("/proc/self/task/127/comm", O_RDWR) = 3
strace: Process 128 attached
[pid   126] open("/proc/self/task/128/comm", O_RDWR) = 8
strace: Process 129 attached
[pid   126] open("/proc/self/task/129/comm", O_RDWR) = 11
[pid   126] open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 11
strace: Process 130 attached
[pid   126] open("/proc/self/task/130/comm", O_RDWR) = 11
Downloading lts-11.3 build plan ...
[pid   126] open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   126] open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11
[pid   126] open("/etc/host.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   126] open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 11
[pid   126] open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   128] open("/etc/localtime", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   128] open("/etc/ssl/certs/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 12

Warning: Retry number 0 after a total
         delay of 0 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
[pid   126] open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11

Warning: Retry number 1 after a total
         delay of 100000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
[pid   126] open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11

Warning: Retry number 2 after a total
         delay of 200000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
[pid   126] open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 11

Warning: Retry number 3 after a total
         delay of 300000 us
         If you see this warning and
         stack fails to download, but
         running the command again
         solves the problem, please
         report here:
         https://github.com/commercialhaskell/stack/issues/3510
HttpExceptionRequest Request {
  host                 = "raw.githubusercontent.com"
  port                 = 443
  secure               = True
  requestHeaders       = []
  path                 = "/fpco/lts-haskell/master//lts-11.3.yaml"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}
 (InternalException (HandshakeFailed (Error_Protocol ("certificate has unknown CA",True,UnknownCa))))
[pid   130] +++ exited with 0 +++
[pid   129] +++ exited with 0 +++
[pid   128] +++ exited with 0 +++
[pid   127] +++ exited with 0 +++
+++ exited with 1 +++
Exited with code 1

zimbatm commented 6 years ago

Can you try the nixorg/nix:circleci-debug image once it has finished building at https://hub.docker.com/r/nixorg/nix/builds/

thedavidmeister commented 6 years ago

i'm getting this:

Skipping cache - error checking storage: RequestError: send request failed
caused by: Get https://circle-production-customer-artifacts.s3.amazonaws.com/?list-type=2&prefix=picard%2F5af926b4fccf9600125bd9e8%2Fglobal%2Fcaches%2Fv1-dependencies-OAIgikM4JSk2K2cUm21659rnoaBaSGUs0gJWKkZWw_U%3D: x509: failed to load system roots and no roots provided

with the nixorg/nix:circleci-debug image

domenkozar commented 6 years ago

The following fixed http-client CA lookup for me:

+      - run:
+          name: fix nix image tls
+          command: |
+            mkdir -p /etc/ssl/certs
+            ln -s $NIX_SSL_CERT_FILE /etc/ssl/certs/ca-certificates.crt

zimbatm commented 6 years ago

If you could try the latest nixorg/nix:circleci, this bug should now be fixed