teamhephy / builder

MIT License

Builder fails to resolve host #63

Closed ChillarAnand closed 3 years ago

ChillarAnand commented 4 years ago

After git push, the Docker build fails to resolve deb.debian.org:

 ---> Using cache
 ---> 5e8494e15701
Step 4/19 : RUN apt-get update &&     apt-get install -y apt-transport-https ca-certificates vim &&     curl -sS https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - &&     echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main" | tee /etc/apt/sources.list.d/pgdg.list &&     rm -rf /var/lib/apt/lists/*
 ---> Running in cad63ad42fd8
Err:1 http://deb.debian.org/debian buster InRelease
  Temporary failure resolving 'deb.debian.org'
Err:2 http://security.debian.org/debian-security buster/updates InRelease
  Temporary failure resolving 'security.debian.org'
Err:3 http://deb.debian.org/debian buster-updates InRelease
  Temporary failure resolving 'deb.debian.org'
Reading package lists...
W: Failed to fetch http://deb.debian.org/debian/dists/buster/InRelease  Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://security.debian.org/debian-security/dists/buster/updates/InRelease  Temporary failure resolving 'security.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/buster-updates/InRelease  Temporary failure resolving 'deb.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package vim
The command '/bin/sh -c apt-get update &&     apt-get install -y apt-transport-https ca-certificates vim &&     curl -sS https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - &&     echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main" | tee /etc/apt/sources.list.d/pgdg.list &&     rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100
remote: 2020/04/17 13:42:22 Error running git receive hook [Build pod exited with code 1, stopping build.]
To ssh://deis-builder.x.x.x.x.nip.io:2222/demo-server.git
 ! [remote rejected]   ENGG-3881 -> ENGG-3881 (pre-receive hook declined)
error: failed to push some refs to 'ssh://git@deis-builder.x.x.x.x.nip.io:2222/demo-server.git'

There are no errors in the builder pod logs:


deis deis-builder-57cf7db484-64x99 deis-builder Accepted connection.
deis deis-builder-57cf7db484-64x99 deis-builder Starting ssh authentication
deis deis-builder-57cf7db484-64x99 deis-builder Channel type: session
deis deis-builder-57cf7db484-64x99 deis-builder
deis deis-builder-57cf7db484-64x99 deis-builder Key='LANG', Value='C.UTF-8'
deis deis-builder-57cf7db484-64x99 deis-builder
deis deis-builder-57cf7db484-64x99 deis-builder Key='LC_ALL', Value='en_US.UTF-8'
deis deis-builder-57cf7db484-64x99 deis-builder
deis deis-builder-57cf7db484-64x99 deis-builder Key='LC_CTYPE', Value='UTF-8'
deis deis-builder-57cf7db484-64x99 deis-builder
deis deis-builder-57cf7db484-64x99 deis-builder receiving git repo name: demo-server.git, operation: git-receive-pack, fingerprint: ee:02:70:18:75:c4:23:6c:38:d6:11:13:81:4e:6a:c8, user: test
deis deis-builder-57cf7db484-64x99 deis-builder creating repo directory /home/git/demo-server.git
deis deis-builder-57cf7db484-64x99 deis-builder writing pre-receive hook under /home/git/demo-server.git
deis deis-builder-57cf7db484-64x99 deis-builder git-shell -c git-receive-pack 'demo-server.git'
deis deis-builder-57cf7db484-64x99 deis-builder Waiting for git-receive to run.
deis deis-builder-57cf7db484-64x99 deis-builder Waiting for deploy.
deis deis-builder-57cf7db484-64x99 deis-builder Deploy complete.

If I SSH into the builder pod and resolve the host there, it works:

root@deis-builder-57cf7db484-64x99:/# host deb.debian.org
deb.debian.org is an alias for debian.map.fastly.net.
debian.map.fastly.net has address 151.101.158.133
debian.map.fastly.net has IPv6 address 2a04:4e42:24::645

Cryptophobia commented 4 years ago

Hmmm, this is obviously a networking or DNS problem. Does it happen intermittently or every time?

ChillarAnand commented 4 years ago

It happens every time. I deleted the builder pod, and the same issue happens in the new pod as well.

➜  git push deis ENGG-3881
deis deis-builder-57cf7db484-64x99 deis-builder
deis deis-builder-57cf7db484-64x99 deis-builder Key='LC_ALL', Value='en_US.UTF-8'
deis deis-builder-57cf7db484-64x99 deis-builder
deis deis-builder-57cf7db484-64x99 deis-builder Key='LC_CTYPE', Value='UTF-8'
deis deis-builder-57cf7db484-64x99 deis-builder
deis deis-builder-57cf7db484-64x99 deis-builder receiving git repo name: demo-server.git, operation: git-receive-pack, fingerprint: ee:02:70:18:75:c4:23:6c:38:d6:11:13:81:4e:6a:c8, user: test
deis deis-builder-57cf7db484-64x99 deis-builder creating repo directory /home/git/demo-server.git
deis deis-builder-57cf7db484-64x99 deis-builder writing pre-receive hook under /home/git/demo-server.git
deis deis-builder-57cf7db484-64x99 deis-builder git-shell -c git-receive-pack 'demo-server.git'
deis deis-builder-57cf7db484-64x99 deis-builder Waiting for git-receive to run.
deis deis-builder-57cf7db484-64x99 deis-builder Waiting for deploy.
deis deis-builder-57cf7db484-64x99 deis-builder Deploy complete.
- deis deis-builder-57cf7db484-64x99
+ deis deis-builder-57cf7db484-hfs4v › deis-builder
deis deis-builder-57cf7db484-hfs4v deis-builder 2020/04/18 03:25:02 Starting health check server on port 8092
deis deis-builder-57cf7db484-hfs4v deis-builder 2020/04/18 03:25:02 Starting deleted app cleaner
deis deis-builder-57cf7db484-hfs4v deis-builder 2020/04/18 03:25:02 Starting SSH server on 0.0.0.0:2223
deis deis-builder-57cf7db484-hfs4v deis-builder Listening on 0.0.0.0:2223
deis deis-builder-57cf7db484-hfs4v deis-builder Accepting new connections.

deis deis-builder-57cf7db484-hfs4v deis-builder Accepted connection.
deis deis-builder-57cf7db484-hfs4v deis-builder Starting ssh authentication
deis deis-builder-57cf7db484-hfs4v deis-builder Channel type: session
deis deis-builder-57cf7db484-hfs4v deis-builder
deis deis-builder-57cf7db484-hfs4v deis-builder Key='LANG', Value='C.UTF-8'
deis deis-builder-57cf7db484-hfs4v deis-builder
deis deis-builder-57cf7db484-hfs4v deis-builder Key='LC_ALL', Value='en_US.UTF-8'
deis deis-builder-57cf7db484-hfs4v deis-builder
deis deis-builder-57cf7db484-hfs4v deis-builder Key='LC_CTYPE', Value='UTF-8'
deis deis-builder-57cf7db484-hfs4v deis-builder
deis deis-builder-57cf7db484-hfs4v deis-builder receiving git repo name: demo-server.git, operation: git-receive-pack, fingerprint: ee:02:70:18:75:c4:23:6c:38:d6:11:13:81:4e:6a:c8, user: test
deis deis-builder-57cf7db484-hfs4v deis-builder creating repo directory /home/git/demo-server.git
deis deis-builder-57cf7db484-hfs4v deis-builder writing pre-receive hook under /home/git/demo-server.git
deis deis-builder-57cf7db484-hfs4v deis-builder git-shell -c git-receive-pack 'demo-server.git'
deis deis-builder-57cf7db484-hfs4v deis-builder Waiting for git-receive to run.
deis deis-builder-57cf7db484-hfs4v deis-builder Waiting for deploy.
deis deis-builder-57cf7db484-hfs4v deis-builder Deploy complete.
---> Using cache
 ---> 5e8494e15701
Step 4/19 : RUN apt-get update &&     apt-get install -y apt-transport-https ca-certificates vim &&     curl -sS https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - &&     echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main" | tee /etc/apt/sources.list.d/pgdg.list &&     rm -rf /var/lib/apt/lists/*
 ---> Running in a37b160bf44f
Err:1 http://deb.debian.org/debian buster InRelease
  Temporary failure resolving 'deb.debian.org'
Err:2 http://security.debian.org/debian-security buster/updates InRelease
  Temporary failure resolving 'security.debian.org'
Err:3 http://deb.debian.org/debian buster-updates InRelease
  Temporary failure resolving 'deb.debian.org'
Reading package lists...
W: Failed to fetch http://deb.debian.org/debian/dists/buster/InRelease  Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://security.debian.org/debian-security/dists/buster/updates/InRelease  Temporary failure resolving 'security.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/buster-updates/InRelease  Temporary failure resolving 'deb.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package vim
The command '/bin/sh -c apt-get update &&     apt-get install -y apt-transport-https ca-certificates vim &&     curl -sS https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - &&     echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main" | tee /etc/apt/sources.list.d/pgdg.list &&     rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100

Is Kubernetes responsible for DNS resolution or the builder itself?

Cryptophobia commented 4 years ago

It looks like the builder pod itself has outside internet connectivity, but the dockerbuilder pod may have none. Can you add a curl command to the Dockerfile that fetches some website?

Try:

RUN curl -4 icanhazip.com

Something may be wrong with the way kube-dns or the CNI is configured.
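
To tell a DNS failure apart from a general connectivity failure, the curl check above could be paired with a resolver check. This is only a minimal sketch and not part of Hephy: the function names `probe_dns` and `show_resolver` are made up for illustration, and it assumes a glibc-based image (such as the debian and python images used in this thread) where `getent` is available.

```shell
#!/bin/sh
# Hypothetical helper: report whether a hostname resolves, without needing curl.
# Assumes getent exists in the image (true for glibc-based images such as debian).
probe_dns() {
  host_name="$1"
  if addr=$(getent hosts "$host_name"); then
    echo "resolved: $addr"
    return 0
  fi
  echo "unresolved: $host_name"
  return 1
}

# Hypothetical helper: print the resolver configuration this container received.
# An alternate file can be passed as the first argument.
show_resolver() {
  cat "${1:-/etc/resolv.conf}"
}
```

Running `show_resolver` and `probe_dns deb.debian.org` once from a shell in the builder pod and once from a RUN step in the Dockerfile would show whether the two environments receive different nameservers.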

ChillarAnand commented 4 years ago
Starting build... but first, coffee!
Step 1/11 : FROM python:2.7
 ---> 68e7be49c28c
Step 2/11 : RUN curl -4 icanhazip.com
 ---> Running in 7fdacdf60383
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
remote:   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: icanhazip.com
The command '/bin/sh -c curl -4 icanhazip.com' returned a non-zero code: 6
remote: 2020/04/21 16:34:30 Error running git receive hook [Build pod exited with code 1, stopping build.]

This also seems to be failing. Any thoughts on how to troubleshoot it?

ChillarAnand commented 4 years ago

This seems to happen only with docker builds.

I am able to run this app on the cluster: https://github.com/teamhephy/example-python-django. It uses only a Procfile, with no Dockerfile.

However, this app, https://github.com/teamhephy/helloworld, which has a Dockerfile, fails to build.

Starting build... but first, coffee!
Step 1/10 : FROM debian:jessie
 ---> 7144b35bf6b5
Step 2/10 : RUN apt-get update && apt-get install -qy curl
 ---> Running in 34477c1baa1d
Err http://deb.debian.org jessie InRelease

Err http://security.debian.org jessie/updates InRelease

Err http://deb.debian.org jessie-updates InRelease
  mote: 
Err http://security.debian.org jessie/updates Release.gpg
  Could not resolve 'security.debian.org'
Err http://deb.debian.org jessie Release.gpg
  Could not resolve 'deb.debian.org'
Err http://deb.debian.org jessie-updates Release.gpg
  Could not resolve 'deb.debian.org'
Reading package lists...
W: Failed to fetch http://deb.debian.org/debian/dists/jessie/InRelease  

W: Failed to fetch http://security.debian.org/debian-security/dists/jessie/updates/InRelease  

W: Failed to fetch http://deb.debian.org/debian/dists/jessie-updates/InRelease  

W: Failed to fetch http://deb.debian.org/debian/dists/jessie/Release.gpg  Could not resolve 'deb.debian.org'

W: Failed to fetch http://security.debian.org/debian-security/dists/jessie/updates/Release.gpg  Could not resolve 'security.debian.org'

W: Failed to fetch http://deb.debian.org/debian/dists/jessie-updates/Release.gpg  Could not resolve 'deb.debian.org'

W: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package curl
The command '/bin/sh -c apt-get update && apt-get install -qy curl' returned a non-zero code: 100
remote: 2020/04/21 16:41:30 Error running git receive hook [Build pod exited with code 1, stopping build.]

Cryptophobia commented 4 years ago

This is likely a CNI issue: the Docker interface is not able to reach the outside interface for whatever reason. It could be iptables or a firewall...

How are you running Kubernetes? What infrastructure is underneath it, which cloud provider, and which CNI are you using?

ChillarAnand commented 4 years ago

AWS EKS: 1 master + 1 worker node (m5.large), managed using eksctl.

eksctl create cluster -n demo --version 1.15 --nodes 1 --node-type m5.large 

It is using the default amazon-k8s-cni.

➜  ~ kubectl describe daemonset aws-node --namespace kube-system | grep Image | cut -d "/" -f 2

amazon-k8s-cni:v1.5.5

Cryptophobia commented 4 years ago

So this all looks good. I think your problem is a security group or iptables rule on the nodes that blocks outbound requests. Can you verify that the security group of the EKS nodes has an outbound rule that allows all traffic to 0.0.0.0/0?
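
One way to check this from the command line is to dump the group's egress rules with the AWS CLI and look for a 0.0.0.0/0 destination. The helper below is a rough sketch: `has_open_egress` is a made-up name, `sg-EXAMPLE` is a placeholder rather than a real group ID, and the check only confirms that some egress rule targets 0.0.0.0/0, not which protocols or ports it covers.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the JSON on stdin contains an egress rule
# whose destination is 0.0.0.0/0. Protocol and port ranges are not checked.
has_open_egress() {
  grep -q '"CidrIp":[[:space:]]*"0\.0\.0\.0/0"'
}

# Example usage (sg-EXAMPLE is a placeholder, not a real group ID):
#   aws ec2 describe-security-groups --group-ids sg-EXAMPLE \
#     --query 'SecurityGroups[].IpPermissionsEgress' --output json \
#     | has_open_egress && echo "open egress rule found"
```

A match here only rules out the security group; iptables rules on the node itself would still need to be checked separately.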

ChillarAnand commented 4 years ago

Thanks, @Cryptophobia

There is some issue with the quay.io images, and the Hephy installation is failing. Once that is resolved, I will check this.

ChillarAnand commented 4 years ago
[Image: Screenshot 2020-04-24 at 9 24 53 PM]

Outbound rules on all nodes seem to be set correctly.

Also, if the outbound rule were the problem, shouldn't pip install also fail when setting up https://github.com/teamhephy/example-python-django?

Is there any debug flag that can be set to see more verbose output?

Cryptophobia commented 4 years ago

Yes @ChillarAnand, that is correct. If pip install does not fail inside the builder, this is a problem specific to the dockerbuilder pod.

The difference is that for Heroku buildpacks the builder does its networking inside its own container, while for Dockerfile builds sent with git push the builder first spawns a separate dockerbuilder pod to build the Docker image. Something related to networking must be broken in this dockerbuilder pod when it is spawned by the builder...
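
Since the build pod is separate from the builder, its DNS settings can be inspected on their own. The sketch below is an assumption-laden illustration, not a documented Hephy procedure: the function names are made up, the namespace defaults to `deis` as used in this thread, and the build pod's name has to be read from the pod listing while a push is in progress. Commands are only printed unless `RUN=1` is set.

```shell
#!/bin/sh
# Print each command instead of running it unless RUN=1 is set, so the
# calls can be reviewed first. NAMESPACE defaults to "deis" as in this thread.
NAMESPACE="${NAMESPACE:-deis}"

run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# List pods in the namespace; the build pod only exists while a push is running.
list_build_pods() {
  run kubectl --namespace "$NAMESPACE" get pods -o wide
}

# Show the DNS policy and host-network setting of one pod.
# The pod name is supplied by the caller (taken from list_build_pods).
show_pod_dns() {
  run kubectl --namespace "$NAMESPACE" get pod "$1" \
    -o 'jsonpath={.spec.dnsPolicy} {.spec.hostNetwork}'
}
```

Comparing the output for the deis-builder pod with the output for the build pod would show whether the two are created with different DNS or host-network settings.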

"Is there any debug flag that can be set to see more verbose output?"

Can you enable debug logging on the builder by setting the DEBUG environment variable? https://docs.teamhephy.com/managing-workflow/tuning-component-settings/#customizing-the-builder

ChillarAnand commented 4 years ago
$ kubectl --namespace deis edit deployment deis-builder 

After setting the DEIS_DEBUG flag to true, I re-deployed helloworld.

It printed the pod spec and failed at apt-get update as before. I couldn't find anything useful.
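
As a non-interactive alternative to `kubectl edit`, the same variable could be set with `kubectl set env`. This is a small sketch: `builder_set_env_cmd` is a made-up helper that only prints the command, and the namespace and deployment name are the ones used earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: build the kubectl call that sets an environment
# variable on the builder deployment. The command is only printed here;
# pipe the output to sh to actually apply it.
builder_set_env_cmd() {
  printf 'kubectl --namespace deis set env deployment/deis-builder %s=%s\n' "$1" "$2"
}

builder_set_env_cmd DEIS_DEBUG true
```

Printing the command first makes it easy to confirm the variable name and value before the deployment is changed.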

kingdonb commented 4 years ago

@ChillarAnand just out of curiosity, if you're using CNI did you also enable this value at Workflow install time:

--set global.use_cni=true

I don't really understand how CNI affects the topology of the cluster but this has resolved networking issues on some cluster providers for me before. It's one of the highlights on https://web.teamhephy.com/ (see instructions for DigitalOcean at the bottom)

Cryptophobia commented 4 years ago

Totally forgot about this. Yes, thank you @kingdonb! Might want to try that global flag when installing/upgrading hephy workflow as well. Now that I think about it, the global.use_cni=true flag may solve this issue.

More info about this flag: https://docs.teamhephy.com/managing-workflow/production-deployments/#using-on-cluster-registry-with-cni

ChillarAnand commented 4 years ago

Thanks, @kingdonb

In a new cluster, I installed Hephy with the following command, and the build still fails with the same error.

helm install hephy/workflow --namespace deis --generate-name \
     --set router.host_port.enabled=true --set global.use_rbac=true --set global.use_cni=true

Not sure what is causing the issue.

Cryptophobia commented 3 years ago

Hi @ChillarAnand , I believe you figured this out in our Slack community so I am going to close it for now.