Closed ChillarAnand closed 3 years ago
Hmmm, this is obviously a networking or DNS problem. Does it happen intermittently or every time?
It happens every time. I deleted the builder pod, and the same issue occurs in the new pod as well.
➜ git push deis ENGG-3881
deis deis-builder-57cf7db484-64x99 deis-builder
deis deis-builder-57cf7db484-64x99 deis-builder Key='LC_ALL', Value='en_US.UTF-8'
deis deis-builder-57cf7db484-64x99 deis-builder
deis deis-builder-57cf7db484-64x99 deis-builder Key='LC_CTYPE', Value='UTF-8'
deis deis-builder-57cf7db484-64x99 deis-builder
deis deis-builder-57cf7db484-64x99 deis-builder receiving git repo name: demo-server.git, operation: git-receive-pack, fingerprint: ee:02:70:18:75:c4:23:6c:38:d6:11:13:81:4e:6a:c8, user: test
deis deis-builder-57cf7db484-64x99 deis-builder creating repo directory /home/git/demo-server.git
deis deis-builder-57cf7db484-64x99 deis-builder writing pre-receive hook under /home/git/demo-server.git
deis deis-builder-57cf7db484-64x99 deis-builder git-shell -c git-receive-pack 'demo-server.git'
deis deis-builder-57cf7db484-64x99 deis-builder Waiting for git-receive to run.
deis deis-builder-57cf7db484-64x99 deis-builder Waiting for deploy.
deis deis-builder-57cf7db484-64x99 deis-builder Deploy complete.
- deis deis-builder-57cf7db484-64x99
+ deis deis-builder-57cf7db484-hfs4v › deis-builder
deis deis-builder-57cf7db484-hfs4v deis-builder 2020/04/18 03:25:02 Starting health check server on port 8092
deis deis-builder-57cf7db484-hfs4v deis-builder 2020/04/18 03:25:02 Starting deleted app cleaner
deis deis-builder-57cf7db484-hfs4v deis-builder 2020/04/18 03:25:02 Starting SSH server on 0.0.0.0:2223
deis deis-builder-57cf7db484-hfs4v deis-builder Listening on 0.0.0.0:2223
deis deis-builder-57cf7db484-hfs4v deis-builder Accepting new connections.
deis deis-builder-57cf7db484-hfs4v deis-builder Accepted connection.
deis deis-builder-57cf7db484-hfs4v deis-builder Starting ssh authentication
deis deis-builder-57cf7db484-hfs4v deis-builder Channel type: session
deis deis-builder-57cf7db484-hfs4v deis-builder
deis deis-builder-57cf7db484-hfs4v deis-builder Key='LANG', Value='C.UTF-8'
deis deis-builder-57cf7db484-hfs4v deis-builder
deis deis-builder-57cf7db484-hfs4v deis-builder Key='LC_ALL', Value='en_US.UTF-8'
deis deis-builder-57cf7db484-hfs4v deis-builder
deis deis-builder-57cf7db484-hfs4v deis-builder Key='LC_CTYPE', Value='UTF-8'
deis deis-builder-57cf7db484-hfs4v deis-builder
deis deis-builder-57cf7db484-hfs4v deis-builder receiving git repo name: demo-server.git, operation: git-receive-pack, fingerprint: ee:02:70:18:75:c4:23:6c:38:d6:11:13:81:4e:6a:c8, user: test
deis deis-builder-57cf7db484-hfs4v deis-builder creating repo directory /home/git/demo-server.git
deis deis-builder-57cf7db484-hfs4v deis-builder writing pre-receive hook under /home/git/demo-server.git
deis deis-builder-57cf7db484-hfs4v deis-builder git-shell -c git-receive-pack 'demo-server.git'
deis deis-builder-57cf7db484-hfs4v deis-builder Waiting for git-receive to run.
deis deis-builder-57cf7db484-hfs4v deis-builder Waiting for deploy.
deis deis-builder-57cf7db484-hfs4v deis-builder Deploy complete.
---> Using cache
---> 5e8494e15701
Step 4/19 : RUN apt-get update && apt-get install -y apt-transport-https ca-certificates vim && curl -sS https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - && echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main" | tee /etc/apt/sources.list.d/pgdg.list && rm -rf /var/lib/apt/lists/*
---> Running in a37b160bf44f
Err:1 http://deb.debian.org/debian buster InRelease
Temporary failure resolving 'deb.debian.org'
Err:2 http://security.debian.org/debian-security buster/updates InRelease
Temporary failure resolving 'security.debian.org'
Err:3 http://deb.debian.org/debian buster-updates InRelease
Temporary failure resolving 'deb.debian.org'
Reading package lists...
W: Failed to fetch http://deb.debian.org/debian/dists/buster/InRelease Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://security.debian.org/debian-security/dists/buster/updates/InRelease Temporary failure resolving 'security.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/buster-updates/InRelease Temporary failure resolving 'deb.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package vim
The command '/bin/sh -c apt-get update && apt-get install -y apt-transport-https ca-certificates vim && curl -sS https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - && echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main" | tee /etc/apt/sources.list.d/pgdg.list && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100
Is Kubernetes responsible for DNS resolution or the builder itself?
It looks like there is outside internet connectivity from inside the builder pod, but you may be dealing with a lack of connectivity from the dockerbuilder pod. Can you add a curl command to the Dockerfile that fetches some website?
Try:
RUN curl -4 icanhazip.com
Something may be wrong with the way kube-dns or the CNI is configured.
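One way to narrow this down is to test cluster DNS from an ordinary pod, outside any docker build. The following is a minimal sketch, assuming kubectl points at the affected cluster and that the DNS pods carry the conventional k8s-app=kube-dns label. The function names and the dns-probe pod name are made up for illustration. It only exercises pod-level DNS, not the containers that docker build starts.

```shell
#!/bin/sh
# List the cluster DNS pods and the nodes they run on.
# Assumption: the DNS deployment uses the conventional k8s-app=kube-dns label.
dns_pods() {
  kubectl --namespace kube-system get pods -l k8s-app=kube-dns -o wide
}

# Resolve a hostname from a throwaway busybox pod that is removed afterwards.
# "dns-probe" is an arbitrary pod name chosen for this sketch.
dns_probe() {
  host="${1:-deb.debian.org}"
  kubectl run dns-probe --rm -i --restart=Never --image=busybox:1.28 -- nslookup "$host"
}
```

Run dns_pods first to confirm the DNS pods are Running, then dns_probe deb.debian.org. If the probe resolves the name while the build still fails, the problem is likely specific to the build containers rather than to cluster DNS as a whole.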
Starting build... but first, coffee!
Step 1/11 : FROM python:2.7
---> 68e7be49c28c
Step 2/11 : RUN curl -4 icanhazip.com
---> Running in 7fdacdf60383
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
remote: 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (6) Could not resolve host: icanhazip.com
The command '/bin/sh -c curl -4 icanhazip.com' returned a non-zero code: 6
remote: 2020/04/21 16:34:30 Error running git receive hook [Build pod exited with code 1, stopping build.]
This also seems to be failing. Any thoughts on how to troubleshoot it?
This seems to happen only with docker builds.
I am able to run this app on the cluster: https://github.com/teamhephy/example-python-django. It uses only a Procfile, without a Dockerfile.
However, this app, https://github.com/teamhephy/helloworld, which has a Dockerfile, fails to build.
Starting build... but first, coffee!
Step 1/10 : FROM debian:jessie
---> 7144b35bf6b5
Step 2/10 : RUN apt-get update && apt-get install -qy curl
---> Running in 34477c1baa1d
Err http://deb.debian.org jessie InRelease
Err http://security.debian.org jessie/updates InRelease
Err http://deb.debian.org jessie-updates InRelease
mote:
Err http://security.debian.org jessie/updates Release.gpg
Could not resolve 'security.debian.org'
Err http://deb.debian.org jessie Release.gpg
Could not resolve 'deb.debian.org'
Err http://deb.debian.org jessie-updates Release.gpg
Could not resolve 'deb.debian.org'
Reading package lists...
W: Failed to fetch http://deb.debian.org/debian/dists/jessie/InRelease
W: Failed to fetch http://security.debian.org/debian-security/dists/jessie/updates/InRelease
W: Failed to fetch http://deb.debian.org/debian/dists/jessie-updates/InRelease
W: Failed to fetch http://deb.debian.org/debian/dists/jessie/Release.gpg Could not resolve 'deb.debian.org'
W: Failed to fetch http://security.debian.org/debian-security/dists/jessie/updates/Release.gpg Could not resolve 'security.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/jessie-updates/Release.gpg Could not resolve 'deb.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package curl
The command '/bin/sh -c apt-get update && apt-get install -qy curl' returned a non-zero code: 100
remote: 2020/04/21 16:41:30 Error running git receive hook [Build pod exited with code 1, stopping build.]
This is likely an issue with the CNI: the Docker interface is not able to reach the outside interface for whatever reason. It could be iptables or a firewall...
How are you running kubernetes? What is the infrastructure underneath kubernetes, what cloud provider and what networking CNI are you using?
AWS EKS - 1 master + 1 worker node(m5.large) - Managed using eksctl.
eksctl create cluster -n demo --version 1.15 --nodes 1 --node-type m5.large
It is using default amazon-k8s-cni.
➜ ~ kubectl describe daemonset aws-node --namespace kube-system | grep Image | cut -d "/" -f 2
amazon-k8s-cni:v1.5.5
So this all looks good. I think your problem is a security group or iptables rules on the nodes not allowing outbound requests to 0.0.0.0/0. Can you verify that the security group of the EKS nodes has an outbound rule that allows all traffic to 0.0.0.0/0?
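The outbound rules can also be inspected from the command line. This is a rough sketch, assuming the AWS CLI is installed and configured for the account that owns the cluster. The function name is hypothetical, and the security group ID is a placeholder you have to supply, since none is given in this thread.

```shell
#!/bin/sh
# Print the outbound (egress) rules of one security group as JSON.
# $1 is the security group ID of the EKS worker nodes (placeholder, not from this thread).
sg_egress_rules() {
  aws ec2 describe-security-groups \
    --group-ids "$1" \
    --query 'SecurityGroups[].IpPermissionsEgress[]' \
    --output json
}
```

An allow-all outbound rule appears in the output with IpProtocol set to "-1" and an IpRanges entry whose CidrIp is 0.0.0.0/0.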
Thanks, @Cryptophobia
There is some issue with the quay.io images, and the hephy installation is failing. Once that is resolved, I will check this.
Outbound rules on all nodes seem to be set correctly.
Also, if the outbound rule was the problem, shouldn't pip install fail when setting up https://github.com/teamhephy/example-python-django?
Is there any debug flag that can be set to see more verbose output?
Yes @ChillarAnand, this is correct. If pip install does not fail inside the builder, that means this is a problem particular to the dockerbuilder pod.
The difference is that for Heroku buildpack builds the builder does its networking inside its own container, while for Dockerfile builds with the deis push command the builder first spawns a separate dockerbuilder pod to build the Docker image. Something networking-related must be broken in this dockerbuilder pod when it is spawned by the builder
...
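To see which pods take part in a build and where they are scheduled, something like the following can be run while a git push is in progress. It is only a sketch: it assumes the build pods are created in the deis namespace and that their names start with dockerbuild or slugbuild, so adjust the pattern if your cluster names them differently. The function name is made up for illustration.

```shell
#!/bin/sh
# List builder-related pods together with the node each one runs on.
# Assumptions: pods live in the "deis" namespace (override with $1) and
# build pod names start with "dockerbuild" or "slugbuild".
build_pods() {
  kubectl --namespace "${1:-deis}" get pods -o wide \
    | grep -E '^(NAME|dockerbuild|slugbuild|deis-builder)'
}
```

Comparing the node of a failing dockerbuild pod with the node of a working buildpack build can show whether the failure follows a particular node or the build type.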
Is there any debug flag that can be set to see more verbose output?
Can you enable logging on builder by setting the DEBUG env variable on the builder? https://docs.teamhephy.com/managing-workflow/tuning-component-settings/#customizing-the-builder
$ kubectl --namespace deis edit deployment deis-builder
After setting the DEIS_DEBUG flag to true, I re-deployed helloworld.
It printed out the pod spec and failed at apt update, as mentioned earlier. I couldn't find anything useful.
@ChillarAnand just out of curiosity, if you're using CNI did you also enable this value at Workflow install time:
--set global.use_cni=true
I don't really understand how CNI affects the topology of the cluster but this has resolved networking issues on some cluster providers for me before. It's one of the highlights on https://web.teamhephy.com/ (see instructions for DigitalOcean at the bottom)
Totally forgot about this. Yes, thank you @kingdonb! Might want to try that global flag when installing/upgrading hephy workflow as well. Now that I think about it, the global.use_cni=true flag may solve this issue.
More info about this flag: https://docs.teamhephy.com/managing-workflow/production-deployments/#using-on-cluster-registry-with-cni
Thanks, @kingdonb
In a new cluster, I installed hephy with the following command, and the build still fails with the same error.
helm install hephy/workflow --namespace deis --generate-name \
--set router.host_port.enabled=true --set global.use_rbac=true --set global.use_cni=true
Not sure what is causing the issue.
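To rule out the flag simply not being applied, the values stored with the release can be read back. A small sketch, assuming Helm 3; the function name is hypothetical, and because the release was installed with --generate-name, its name has to be looked up first with helm list --namespace deis.

```shell
#!/bin/sh
# Show the user-supplied values of a Workflow release.
# $1 is the release name (generated at install time, so look it up first);
# $2 is the namespace, defaulting to "deis".
workflow_values() {
  helm get values "$1" --namespace "${2:-deis}"
}
```

The output should list use_cni: true under global if the setting was picked up at install time.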
Hi @ChillarAnand , I believe you figured this out in our Slack community so I am going to close it for now.
After git push, the docker build fails to resolve deb.debian.org.
There are no errors in the builder pod logs.
If I SSH into the pod and try to resolve the name, it works.
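The in-pod check described above can be repeated without an interactive shell. This is a sketch under assumptions: the function names are invented here, and it presumes the container image ships getent and cat, which is typical for glibc-based images but not guaranteed.

```shell
#!/bin/sh
# Resolve a hostname from inside an existing pod.
# $1 = namespace, $2 = pod name, $3 = hostname to resolve.
resolve_in_pod() {
  kubectl --namespace "$1" exec "$2" -- getent hosts "$3"
}

# Print the DNS configuration the pod was started with.
pod_dns_config() {
  kubectl --namespace "$1" exec "$2" -- cat /etc/resolv.conf
}
```

For example, resolve_in_pod deis deis-builder-57cf7db484-hfs4v deb.debian.org checks the builder pod seen in the logs above, and pod_dns_config shows which nameserver that pod uses.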