The contour bootstrap init container that generates /config/resources/sds/xds-tls-certificate.json may have failed or not run.
Do you have any error/status information for that and does your config match https://github.com/projectcontour/contour/blob/78c434fcd9aa8e08f12ee5def7c0e215c0c805c5/examples/contour/03-envoy.yaml#L98 ?
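If it helps, the init container's status and logs can be pulled with something like this (namespace and pod name are placeholders):
kubectl -n <namespace> describe pod <envoy-pod-name>
kubectl -n <namespace> logs <envoy-pod-name> -c envoy-initconfig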
Here is the full describe of the pod :
Name: envoy-jq658
Namespace: vbertell
Priority: 0
Node: douzeasrclsuster-edge-02/172.16.1.7
Start Time: Tue, 19 Jan 2021 13:59:30 +0000
Labels: app=envoy
controller-revision-hash=b7746ff5b
pod-template-generation=1
Annotations: kubernetes.io/psp: privileged
prometheus.io/path: /stats/prometheus
prometheus.io/port: 8002
prometheus.io/scrape: true
seccomp.security.alpha.kubernetes.io/pod: docker/default
Status: Running
IP: 172.16.1.7
IPs:
IP: 172.16.1.7
Controlled By: DaemonSet/envoy
Init Containers:
envoy-initconfig:
Container ID: docker://46ba7dc39f4f8243107706df8bccc559db9dd900de09b33230d71fcff6194a31
Image: xxxxx/projectcontour/contour:v1.11.0
Image ID: docker-pullable://rxxxxx/projectcontour/contour@sha256:a0f9675ae2f1d8204e036ae2a73e4b1c79be19f1b02bb7478bd77b17251179b0
Port: <none>
Host Port: <none>
Command:
contour
Args:
bootstrap
/config/envoy.json
--xds-address=contour
--xds-port=8001
--xds-resource-version=v3
--resources-dir=/config/resources
--envoy-cafile=/certs/ca.crt
--envoy-cert-file=/certs/tls.crt
--envoy-key-file=/certs/tls.key
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 19 Jan 2021 13:59:35 +0000
Finished: Tue, 19 Jan 2021 13:59:35 +0000
Ready: True
Restart Count: 0
Environment:
CONTOUR_NAMESPACE: vbertell (v1:metadata.namespace)
Mounts:
/certs from envoycert (ro)
/config from envoy-config (rw)
Containers:
shutdown-manager:
Container ID: docker://df9c6a5d5d43c4d86d30cf1a29f4bf0c8e721b2a0e2dd026daa9808e70c3dad3
Image: xxxxx/projectcontour/contour:v1.11.0
Image ID: docker-pullable://xxxxx/projectcontour/contour@sha256:a0f9675ae2f1d8204e036ae2a73e4b1c79be19f1b02bb7478bd77b17251179b0
Port: <none>
Host Port: <none>
Command:
/bin/contour
Args:
envoy
shutdown-manager
State: Running
Started: Tue, 19 Jan 2021 13:59:36 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:8090/healthz delay=3s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts: <none>
envoy:
Container ID: docker://0d5497db623463b9920c1cd55cb486810964b59afe69184d280248f5e9dba8a8
Image: xxxxx/cesp-envoy:1.16.2-1-2-ic
Image ID: docker-pullable://xxxxx/cesp-envoy@sha256:0486f32009dac92457ee88b005c6c57574d66f513046af43f895cdc5e6d18eb5
Ports: 80/TCP, 443/TCP
Host Ports: 80/TCP, 443/TCP
Command:
envoy
Args:
-c
/config/envoy.json
--service-cluster $(CONTOUR_NAMESPACE)
--service-node $(ENVOY_POD_NAME)
--log-level info
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 20 Jan 2021 08:30:01 +0000
Finished: Wed, 20 Jan 2021 08:30:01 +0000
Ready: False
Restart Count: 222
Readiness: http-get http://:8002/ready delay=3s timeout=1s period=4s #success=1 #failure=3
Environment:
CONTOUR_NAMESPACE: vbertell (v1:metadata.namespace)
ENVOY_POD_NAME: envoy-jq658 (v1:metadata.name)
Mounts:
/certs from envoycert (rw)
/config from envoy-config (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
envoy-config:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
envoycert:
Type: Secret (a volume populated by a Secret)
SecretName: envoycert
Optional: false
QoS Class: BestEffort
Node-Selectors: is_edge=true
Tolerations: is_edge=true:NoExecute
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 12m (x220 over 18h) kubelet, douzeasrclsuster-edge-02 Container image "xxxxx/cesp-envoy:1.16.2-1-2-ic" already present on machine
Warning BackOff 2m50s (x5141 over 18h) kubelet, douzeasrclsuster-edge-02 Back-off restarting failed container
The init container matches the suggested YAML configuration, as you can see above, unless you spot an error in it?
The kubectl logs -c envoy-initconfig output for the pod is empty; maybe there is another way to access those logs, or a way to increase the debug level?
Thanks
Hi @vinzo99, sorry you have this problem, it's definitely not good!
The key error here appears to be '/config/resources/sds/xds-tls-certificate.json' not existing. That file is part of the system we use to secure the communication between Contour and Envoy.
That system requires the Contour namespace (i.e. projectcontour) to have the secrets contourcert, envoycert, and cacert. These secrets are created by the contour-certgen Job, which runs a container that creates them.
I'd start by checking if the secrets are present, and if the contour-certgen Job ran. (You can use kubectl get job -n projectcontour for this.)
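For example (assuming the stock projectcontour namespace; substitute your own if you've renamed it):
kubectl -n projectcontour get job
kubectl -n projectcontour get secret contourcert envoycert cacert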
Hi @youngnick, I just checked what you suggested :
1°) the job has successfully run :
NAME COMPLETIONS DURATION AGE
contour-certgen-v1.11.0 1/1 5s 62s
2°) here are the job details :
Name: contour-certgen-v1.11.0
Namespace: vbertell
Selector: controller-uid=5febfb2c-0f0a-4851-9978-17baa4501312
Labels: app=contour-certgen
controller-uid=5febfb2c-0f0a-4851-9978-17baa4501312
job-name=contour-certgen-v1.11.0
Annotations: <none>
Parallelism: 1
Completions: 1
Start Time: Mon, 25 Jan 2021 07:57:16 +0000
Completed At: Mon, 25 Jan 2021 07:57:21 +0000
Duration: 5s
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
Pod Template:
Labels: app=contour-certgen
controller-uid=5febfb2c-0f0a-4851-9978-17baa4501312
job-name=contour-certgen-v1.11.0
Service Account: contour-certgen
Containers:
contour:
Image: registry1-docker-io.repo.lab.pl.alcatel-lucent.com/projectcontour/contour:v1.11.0
Port: <none>
Host Port: <none>
Command:
contour
certgen
--kube
--incluster
--overwrite
--secrets-format=compact
--namespace=$(CONTOUR_NAMESPACE)
Environment:
CONTOUR_NAMESPACE: (v1:metadata.namespace)
Mounts: <none>
Volumes: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 73s job-controller Created pod: contour-certgen-v1.11.0-6gtd6
3°) contourcert and envoycert secrets are there:
NAME TYPE DATA AGE
contourcert kubernetes.io/tls 3 18s
envoycert kubernetes.io/tls 3 18s
but the cacert secret does NOT exist; this is most probably related?
Thanks !
@youngnick meanwhile I also tried to manually perform what the contour-certgen job does, following these directions:
https://projectcontour.io/docs/main/grpc-tls-howto/
(btw, I also had to manually create the certs/ and _integration/ directories, and manually touch the _integration/cert-contour.ext and _integration/cert-envoy.ext files in order to successfully run the openssl commands)
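The prep amounted to roughly this (paths as described above; the exact layout the guide expects may differ):
mkdir -p certs _integration
touch _integration/cert-contour.ext _integration/cert-envoy.ext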
Still no cacert secret, so I finally managed to create it manually using old directions (which may be deprecated...):
kubectl create secret -n vbertell generic cacert --from-file=./certs/cacert.pem
All secrets are now here :
NAME TYPE DATA AGE
cacert Opaque 1 7m18s
contourcert Opaque 3 17m
envoycert Opaque 3 17m
Envoy pod status is still CrashLoopBackOff, with the same errors.
The ca.crt should be embedded in the envoycert secret and the contourcert secret. Can you confirm you have these?
$ kubectl describe secret envoycert -n projectcontour
Name: envoycert
Namespace: projectcontour
Labels: app=contour
Annotations: <none>
Type: kubernetes.io/tls
Data
====
tls.key: 1675 bytes
ca.crt: 1139 bytes
tls.crt: 1265 bytes
$ kubectl exec -it -n projectcontour envoy-78hrf -c envoy cat /certs/ca.crt
-----BEGIN CERTIFICATE-----
<certData>
-----END CERTIFICATE-----
@stevesloka sure :
# kubectl -n vbertell describe secret envoycert
Name: envoycert
Namespace: vbertell
Labels: <none>
Annotations:
Type: Opaque
Data
====
ca.crt: 1188 bytes
tls.crt: 1066 bytes
tls.key: 1675 bytes
I can't exec cat /certs/ca.crt since the envoy container has crashed in the envoy-mlhl9 pod, obviously:
# kubectl -n vbertell exec -it envoy-mlhl9 -c envoy cat /certs/ca.crt
error: unable to upgrade connection: container not found ("envoy")
Could you try killing the Envoy pod and letting it restart? At one point, some folks did see an issue where the Envoy pod would try to start before the secrets were ready in the shared secret (but shouldn't happen because it's done in an initContainer).
I already tried that, same result. And you're right: since the job is performed in the initContainer, the envoy container should end up getting started, which it does not; it keeps trying to restart indefinitely. See here, after 3+ days:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 89s (x21618 over 3d6h) kubelet, douzeasrclsuster-edge-02 Back-off restarting failed container
Hi, any hints on this issue ? Thanks
Hi @vinzo99, you can see that @skriss has put this one in "Needs Investigation" in our project board. That means that one of us will need to try and reproduce the issue to see if we can figure out what's causing it.
The way this whole setup works is:
1. The contour bootstrap init container writes files into /config/resources in the /config EmptyDir mount (which are the ones that are missing in your pods). These are JSON representations of Envoy's SDS protobufs, and tell Envoy to watch the files on disk in the /certs directory (which is mounted from the Secret envoycerts).
2. The Envoy container mounts the same /certs. This allows the certs to be rotated and have Envoy pick up the changes straight away without restarting.
3. In your case, something is stopping the bootstrap from writing its files into the /config/resources directory. So that's where we need to look.
So, I have a couple of questions for you:
1. Can the pods in your cluster create EmptyDir volumes?
2. Could you try deleting the Envoy pod with kubectl delete -n vbertell envoy-jq658 or similar? We really need to get something working there so that we can see what the bootstrap is doing.
Hi @youngnick, regarding your 2 questions:
1°) the pods can indeed create emptyDir volumes. I just made sure of that by completing this short example: https://kubernetes.io/docs/tasks/configure-pod-container/configure-volume-storage/
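The check boiled down to a throwaway pod along these lines (image and names are illustrative, not the exact manifest from that page):
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-test
  namespace: vbertell
spec:
  containers:
  - name: test
    image: busybox
    command: ["sh", "-c", "touch /cache/ok && sleep 3600"]
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir: {}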
2°) I killed the Envoy pod using this command :
# kubectl -n vbertell get pod
NAME READY STATUS RESTARTS AGE
contour-6b85456c49-4xm4r 1/1 Running 0 15d
contour-6b85456c49-mxbrp 1/1 Running 1 15d
envoy-nj78p 1/2 CrashLoopBackOff 3290 11d
# kubectl -n vbertell delete pod envoy-nj78p
pod "envoy-nj78p" deleted
# kubectl -n vbertell get pod
NAME READY STATUS RESTARTS AGE
contour-6b85456c49-4xm4r 1/1 Running 0 15d
contour-6b85456c49-mxbrp 1/1 Running 1 15d
envoy-2s28h 1/2 CrashLoopBackOff 2 32s
I'm not sure what we can do to monitor the bootstrap process, apart from creating a dummy pod that recreates all the actions performed by envoy-initconfig, or maybe starting envoy-initconfig from a custom Contour image with a shell for debug purposes...
Thanks !
@vinzo99 I wonder if we're chasing the wrong thing. One item that might be an issue in your environment is the default hostPorts in the Envoy daemonset. Does your environment allow that? Maybe remove those references and see if the pod starts.
I would expect some sort of error relating to that, but I'm just trying to think of what else it might be.
If not, the other path we could try is removing the initContainer & certgen and pushing the bits manually.
@stevesloka
FYI: when I first tried to deploy Contour with the default yaml files, I faced the following issue when launching the envoy DaemonSet:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 3s (x13 over 24s) daemonset-controller Error creating: pods "envoy-" is forbidden: unable to validate against any pod security policy: [spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[1].hostPort: Invalid value: 80: Host port 80 is not allowed to be used. Allowed ports: [] spec.containers[1].hostPort: Invalid value: 443: Host port 443 is not allowed to be used. Allowed ports: []]
The envoy DaemonSet was not even starting at this point, due to my cluster RBAC environment.
I quickly solved this issue by adding the following rules to the contour ClusterRole:
- apiGroups:
- extensions
resourceNames:
- privileged
resources:
- podsecuritypolicies
verbs:
- use
which leads us to the current state, with the envoy DaemonSet trying to start and then the Envoy container crashing with CrashLoopBackOff.
Not sure this is what you meant though.
Thanks !
@vinzo99 yup this might be it, but let's confirm. =)
In the examples, the Envoy daemonset which deploys the Envoy pods has two hostPort entries (https://github.com/projectcontour/contour/blob/main/examples/contour/03-envoy.yaml#L72 & https://github.com/projectcontour/contour/blob/main/examples/contour/03-envoy.yaml#L76).
Can you remove those and see if your pod spins up properly? It may not work because we need to swap the service values around, but it will tell us what the problem is and then where to go.
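Concretely, the ports block on the envoy container would end up looking roughly like this (just the two hostPort lines dropped, everything else unchanged):
ports:
- containerPort: 80
  name: http
  protocol: TCP
- containerPort: 443
  name: https
  protocol: TCP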
Thanks!
@stevesloka I removed both hostPort lines and get the exact same result:
- contour-certgen-v1.11.0 job OK
- contour pods created
- envoy pod CrashLoopBackOff because of /config/resources/sds/xds-tls-certificate.json not existing
Thanks !
Are you using Pod Security Policies? Can you share any information about your cluster? It seems like, as @youngnick suggested, something with the initContainer isn't working to create this default config. Let me see if I can pick out the bits into a ConfigMap, have you apply that, and see if you can get it working.
@stevesloka the cluster has 2 main Pod Security Policies, restricted and privileged:
# kubectl get podsecuritypolicy
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
privileged true * RunAsAny RunAsAny RunAsAny RunAsAny false *
restricted false RunAsAny MustRunAsNonRoot MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim,hostPath
# kubectl describe podsecuritypolicy restricted
Name: restricted
Settings:
Allow Privileged: false
Allow Privilege Escalation: false
Default Add Capabilities: <none>
Required Drop Capabilities: ALL
Allowed Capabilities: <none>
Allowed Volume Types: configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim,hostPath
Allow Host Network: false
Allow Host Ports: <none>
Allow Host PID: false
Allow Host IPC: false
Read Only Root Filesystem: false
SELinux Context Strategy: RunAsAny
User: <none>
Role: <none>
Type: <none>
Level: <none>
Run As User Strategy: MustRunAsNonRoot
Ranges: <none>
FSGroup Strategy: MustRunAs
Ranges: 1-65535
Supplemental Groups Strategy: MustRunAs
Ranges: 1-65535
# kubectl describe podsecuritypolicy privileged
Name: privileged
Settings:
Allow Privileged: true
Allow Privilege Escalation: true
Default Add Capabilities: <none>
Required Drop Capabilities: <none>
Allowed Capabilities: *
Allowed Volume Types: *
Allow Host Network: true
Allow Host Ports: 0-65535
Allow Host PID: true
Allow Host IPC: true
Read Only Root Filesystem: false
SELinux Context Strategy: RunAsAny
User: <none>
Role: <none>
Type: <none>
Level: <none>
Run As User Strategy: RunAsAny
Ranges: <none>
FSGroup Strategy: RunAsAny
Ranges: <none>
Supplemental Groups Strategy: RunAsAny
Ranges: <none>
The following rules have been added to the contour ClusterRole in order to grant access to ports etc.:
- apiGroups:
- extensions
resourceNames:
- privileged
resources:
- podsecuritypolicies
verbs:
- use
In any case, the emptyDir Volume Type is allowed even without privileged rules.
Hope this helps ...
Thanks !
I just spun up a minikube cluster with PSP enabled and I had to change a few things to get this to work:
- removed hostPort from the Envoy daemonset
- changed 80/443 to 8080/8443 in the Envoy ds
- added runAsUser: 1000 to the securityContext in the Envoy ds
- removed 80/443 (so it defaults to 8080,8443)
I didn't need to modify the ClusterRole as you did, it just worked without.
Which helm chart did you use? I can try to recreate that. I never use the helm chart, just the examples, but I want to double-check that setup (maybe it's different than the contour repo).
I can put together the files as well to avoid the initContainer, but wanted to double-check the helm chart bits.
@stevesloka a few inputs :
I tried adding a securityContext block in the Envoy DaemonSet (btw I used runAsUser: 65534 instead of 1000; I believe 65534 is the right ID to choose if you want to preserve the same ID in the whole environment). No improvement.
Regarding the other suggestions: unfortunately I need the Envoy pods to act as listeners for the ingress controller, and therefore listen on 80/443 on the edge nodes, which are now the defaults in the Contour charts. This is the very reason why we need Contour, btw. I believe what works for you on a single-node minikube system with non-root ports will not suit our configuration (switching to 8080/8443, removing hostPort), unless I'm missing a point.
In order to achieve a configuration without a K8S LoadBalancer, with Host Networking (as explained here: https://projectcontour.io/docs/v1.11.0/deploy-options/#host-networking), I used the example charts provided here: https://github.com/projectcontour/contour/blob/release-1.11/examples/render/contour.yaml
I had to make slight modifications to the charts, such as:
- replacing the projectcontour namespace with ours
- pointing docker.io/projectcontour/contour and docker.io/envoyproxy/envoy to our local registry
- adding privileged rules to allow use of the 80/443 ports
- removing type: LoadBalancer and adding clusterIP: None to the Envoy Service
- adding hostNetwork: true, dnsPolicy: ClusterFirstWithHostNet and a few lines to support the is_edge nodeSelector to the Envoy DaemonSet
Those modifications have been successfully tested on the same cluster, on Envoy/Contour versions released before the introduction of the new certgen process a few months back.
Thanks !
Thanks @vinzo99. I still think it's likely that the files are not getting created properly by the bootstrap.
I thought of a way to check this, which is not ideal, but should work to let you check what the Envoy container is seeing.
In the Envoy daemonset, replace the command and args sections with this:
args:
- -c
- "sleep 86400"
command:
- /bin/bash
This will just run a sleeping bash job instead of trying to run Envoy. Then you should be able to kubectl exec in and have a look around. (kubectl exec -t <envoy pod> -c envoy -- /bin/bash will get you an interactive shell.)
The things we need to know to find out more about this are:
- Is there anything at /config/resources in the Envoy container? (This is the EmptyDir mount that should contain two JSON files; I suspect it will be empty.)
If the /config/resources directory is empty, then something is preventing the contour bootstrap command from outputting its files correctly. Can you check that the bootstrap container command and args look like this?
args:
- bootstrap
- /config/envoy.json
- --xds-address=contour
- --xds-port=8001
- --xds-resource-version=v3
- --resources-dir=/config/resources
- --envoy-cafile=/certs/ca.crt
- --envoy-cert-file=/certs/tls.crt
- --envoy-key-file=/certs/tls.key
command:
- contour
The key one is the --resources-dir arg; without it the bootstrap won't attempt to create those files.
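A quick way to dump what the live DaemonSet actually carries for the init container (assuming it is still named envoy in your namespace):
kubectl -n vbertell get daemonset envoy -o jsonpath='{.spec.template.spec.initContainers[0].args}'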
Hi @youngnick thanks for your suggestions !
I just replaced this part with the suggested one in the Envoy DaemonSet, in order to start a standard shell with a 24-hour sleep instead of the envoy command, and get access to the container:
# - args:
# - -c
# - /config/envoy.json
# - --service-cluster $(CONTOUR_NAMESPACE)
# - --service-node $(ENVOY_POD_NAME)
# - --log-level info
# command:
# - envoy
- args:
- -c
- "sleep 86400"
command:
- /bin/bash
The envoy pod starts. For some reason I am not able to log into the container (the kubectl command returns right away), but I can still run single commands, which basically show that resources/ has permission issues:
# kubectl -n vbertell exec -t envoy-8d86c -c envoy -- ls /config/resources
ls: cannot open directory '/config/resources': Permission denied
command terminated with exit code 2
# kubectl -n vbertell exec -t envoy-8d86c -c envoy -- ls -l /config
total 8
-rw-r--r--. 1 root root 1873 Feb 15 07:32 envoy.json
drwxr-x---. 3 root root 4096 Feb 15 07:32 resources
I am able to touch a file in config/, which goes to show that the emptyDir volume behaves as expected:
# kubectl -n vbertell exec -t envoy-8d86c -c envoy -- touch /config/toto
# kubectl -n vbertell exec -t envoy-8d86c -c envoy -- ls -l /config
total 8
-rw-r--r--. 1 root root 1873 Feb 15 07:32 envoy.json
drwxr-x---. 3 root root 4096 Feb 15 07:32 resources
-rw-r--r--. 1 envoy envoy 0 Feb 15 07:53 toto
but not in /config/resources/:
# kubectl -n vbertell exec -t envoy-8d86c -c envoy -- touch /config/resources/toto
touch: cannot touch '/config/resources/toto': Permission denied
command terminated with exit code 1
I believe this directory is created by contour-certgen, right?
I also checked the bootstrap part, which seems correct :
initContainers:
- args:
- bootstrap
- /config/envoy.json
- --xds-address=contour
- --xds-port=8001
- --xds-resource-version=v3
- --resources-dir=/config/resources
- --envoy-cafile=/certs/ca.crt
- --envoy-cert-file=/certs/tls.crt
- --envoy-key-file=/certs/tls.key
command:
- contour
Thanks !
Thanks for that @vinzo99, I think you may have missed the -i on the kubectl exec command; that would give you an interactive shell (as opposed to just a terminal with -t). So it should be kubectl exec -it.
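For example (assuming the debug pod is still around):
kubectl -n vbertell exec -it envoy-8d86c -c envoy -- /bin/bash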
I'll check the permissions for the created directory; it sounds promising that it's something about the directory creation that's the problem.
Edit: Yes, I can see that this is an "envoy is not running as the root user" problem. The initContainer runs as root, but Envoy is running as the user envoy, which doesn't have access to the /config/resources directory. You can see this from the listing where you touch the toto file.
I'd rather not make the /config/resources directory world-readable, but is there any way you could make sure that the initContainer runs as the same user as the envoy container? I think that will get you working for now, while I look into the permissions.
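A rough sketch of what that could look like on the envoy-initconfig init container (the UID/GID values are placeholders and must match whatever user the envoy image actually runs as):
initContainers:
- name: envoy-initconfig
  # ... existing command, args, env and volumeMounts unchanged ...
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534   # placeholder: use the envoy container's UID
    runAsGroup: 65534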
Hi @youngnick !
I followed your suggestion and added a security context to the initContainer in order to launch it with the same user:group (envoy:envoy, 4444:4444) as the Envoy container itself. That did the trick for the permissions; /config/resources is now created with all the expected files in it:
# kubectl -n vbertell exec -it envoy-l8g7b -c envoy /bin/bash
[envoy@douzeasrclsuster-edge-02 /]$ cd /config/
[envoy@douzeasrclsuster-edge-02 config]$ ls -l
total 8
-rw-r--r--. 1 envoy envoy 1873 Feb 16 13:54 envoy.json
drwxr-x---. 3 envoy envoy 4096 Feb 16 13:54 resources
[envoy@douzeasrclsuster-edge-02 config]$ cd resources/
[envoy@douzeasrclsuster-edge-02 resources]$ ll
total 4
drwxr-x---. 2 envoy envoy 4096 Feb 16 13:54 sds
[envoy@douzeasrclsuster-edge-02 resources]$ cd sds/
[envoy@douzeasrclsuster-edge-02 sds]$ ll
total 8
-rw-r--r--. 1 envoy envoy 210 Feb 16 13:54 xds-tls-certificate.json
-rw-r--r--. 1 envoy envoy 209 Feb 16 13:54 xds-validation-context.json
Now the Envoy pod starts, but the containers on the edge node fail to bind to ports 80 and 443:
[2021-02-16 14:09:30.960][1][warning][config] [source/common/config/grpc_subscription_impl.cc:107] gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) ingress_http: cannot bind '0.0.0.0:80': Permission denied
ingress_https: cannot bind '0.0.0.0:443': Permission denied
I remember @stevesloka suggested removing the hostPorts in the DaemonSet, which I tried; I still get the error.
Here is the netstat output for envoy on the edge node:
# netstat -anp|grep envoy
tcp 0 0 0.0.0.0:8002 0.0.0.0:* LISTEN 31621/envoy
tcp 0 0 127.0.0.1:9001 0.0.0.0:* LISTEN 31621/envoy
tcp 0 0 127.0.0.1:43866 127.0.0.1:9001 ESTABLISHED 31621/envoy
tcp 0 0 139.54.131.84:46454 10.254.91.148:8001 ESTABLISHED 31621/envoy
tcp 0 0 127.0.0.1:9001 127.0.0.1:43866 ESTABLISHED 31621/envoy
tcp 0 0 127.0.0.1:9001 127.0.0.1:43896 ESTABLISHED 31621/envoy
tcp 0 0 127.0.0.1:43896 127.0.0.1:9001 ESTABLISHED 31621/envoy
unix 2 [ ] DGRAM 507776613 31621/envoy @envoy_domain_socket_parent_0@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
unix 2 [ ] DGRAM 507776612 31621/envoy @envoy_domain_socket_child_0@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Any hints on this ? FYI since the Envoy pod is now started I am able to log into the Envoy container, which should make troubleshooting a lot easier.
Thanks !
Hey @vinzo99, after taking out the hostPort references, you'll need to also edit the Contour deployment to remove the two args which tell Envoy to bind to ports 80/443, otherwise it will still try.
So your steps are:
- take out the hostPort entries from the Envoy daemonset
- use the service targetPorts of 8080 & 8443 (defaults)
Hi @stevesloka !
Like I said, I have no other choice but to use ports 80 and 443, using Contour+Envoy as an ingress controller without a K8S LoadBalancer.
I still get Permission denied when binding 80 and 443 on the edge node:
[2021-02-17 11:22:09.582][1][warning][config] [source/common/config/grpc_subscription_impl.cc:107] gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) ingress_http: cannot bind '0.0.0.0:80': Permission denied
ingress_https: cannot bind '0.0.0.0:443': Permission denied
Just to clear any doubts, I installed an older working version (Envoy 1.14.1 + Contour 1.4.0) on the same cluster; there is no binding issue, as you can see here:
# netstat -anp|grep envoy
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 14612/envoy
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 14612/envoy
tcp 0 0 0.0.0.0:8002 0.0.0.0:* LISTEN 14612/envoy
tcp 0 0 127.0.0.1:9001 0.0.0.0:* LISTEN 14612/envoy
tcp 0 0 139.54.131.84:45470 10.254.223.244:8001 ESTABLISHED 14612/envoy
tcp 0 0 127.0.0.1:42434 127.0.0.1:9001 ESTABLISHED 14612/envoy
tcp 0 0 127.0.0.1:9001 127.0.0.1:42434 ESTABLISHED 14612/envoy
unix 2 [ ] DGRAM 509537593 14612/envoy @envoy_domain_socket_parent_0@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
unix 2 [ ] DGRAM 509537592 14612/envoy @envoy_domain_socket_child_0@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
I compared the configurations between the old and current charts; apart from a few unrelated changes (cacert, adding metrics, etc.) they are all the same regarding ports.
I believe something else is preventing Envoy from binding to 80 and 443 in this specific configuration. Not sure if this is related to the security context added as a workaround, as suggested by @youngnick, to execute the initContainer as the envoy user.
Thanks !
Not being able to bind the 80 and 443 ports is either going to be related to the security context, or something weird going on with the hostPort thing. If you can post your Envoy Daemonset YAML, we can take a look, but without that, I'm not sure how much more we will be able to help.
@youngnick sure !
The configuration is based on the template https://github.com/projectcontour/contour/blob/release-1.11/examples/render/contour.yaml, plus the following:
- an ADDON section at the end, which handles the edge node selector
- a DEBUG section in envoy-initconfig, with a security context as a workaround for the permissions on /config/resources.
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
app: envoy
name: envoy
# namespace: projectcontour
namespace: {{ .Release.Namespace }}
spec:
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 10%
selector:
matchLabels:
app: envoy
template:
metadata:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8002"
prometheus.io/path: "/stats/prometheus"
labels:
app: envoy
spec:
containers:
- command:
- /bin/contour
args:
- envoy
- shutdown-manager
# image: docker.io/projectcontour/contour:v1.11.0
image: {{ .Values.global.registry1 }}/projectcontour/contour:{{ .Values.contour }}
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
exec:
command:
- /bin/contour
- envoy
- shutdown
livenessProbe:
httpGet:
path: /healthz
port: 8090
initialDelaySeconds: 3
periodSeconds: 10
name: shutdown-manager
- args:
- -c
- /config/envoy.json
- --service-cluster $(CONTOUR_NAMESPACE)
- --service-node $(ENVOY_POD_NAME)
- --log-level info
command:
- envoy
# image: image: docker.io/envoyproxy/envoy:v1.16.2
image: {{ .Values.global.registry }}/{{ .Values.imageRepo }}:{{ .Values.imageTag }}-ic
imagePullPolicy: IfNotPresent
name: envoy
env:
- name: CONTOUR_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: ENVOY_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
ports:
- containerPort: 80
hostPort: 80
name: http
protocol: TCP
- containerPort: 443
hostPort: 443
name: https
protocol: TCP
readinessProbe:
httpGet:
path: /ready
port: 8002
initialDelaySeconds: 3
periodSeconds: 4
volumeMounts:
- name: envoy-config
mountPath: /config
- name: envoycert
mountPath: /certs
lifecycle:
preStop:
httpGet:
path: /shutdown
port: 8090
scheme: HTTP
initContainers:
- args:
- bootstrap
- /config/envoy.json
- --xds-address=contour
- --xds-port=8001
- --xds-resource-version=v3
- --resources-dir=/config/resources
- --envoy-cafile=/certs/ca.crt
- --envoy-cert-file=/certs/tls.crt
- --envoy-key-file=/certs/tls.key
command:
- contour
# image: docker.io/projectcontour/contour:v1.11.0
image: {{ .Values.global.registry1 }}/projectcontour/contour:{{ .Values.contour }}
imagePullPolicy: IfNotPresent
name: envoy-initconfig
volumeMounts:
- name: envoy-config
mountPath: /config
- name: envoycert
mountPath: /certs
readOnly: true
env:
- name: CONTOUR_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
##### DEBUG
securityContext:
runAsNonRoot: true
runAsUser: 4444
runAsGroup: 4444
##### /DEBUG
automountServiceAccountToken: false
serviceAccountName: envoy
terminationGracePeriodSeconds: 300
volumes:
- name: envoy-config
emptyDir: {}
- name: envoycert
secret:
secretName: envoycert
restartPolicy: Always
############# ADDON
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
# see https://projectcontour.io/docs/v1.11.0/deploy-options/#host-networking
nodeSelector: {is_edge: 'true'}
tolerations:
- key: 'is_edge'
operator: 'Equal'
value: 'true'
effect: 'NoExecute'
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: contour
topologyKey: kubernetes.io/hostname
############# /ADDON
I also suspect the issue is related to the security context workaround.
Thanks !
Ah, I think that you may need to actually tell the security context that the pod needs to bind to low ports. This can be done either by having the pod run as privileged, or by adding the CAP_NET_BIND_SERVICE capability:
securityContext:
runAsNonRoot: true
runAsUser: 4444
runAsGroup: 4444
capabilities:
add:
- NET_BIND_SERVICE
When you are not running as root, you can't bind <1024 without that capability.
Note that the PSP you've setup will permit this binding, because it allows all capabilities, but it won't add them for you. That's what the securityContext does.
Edit: A great gotcha with capabilities is that you have to drop the CAP_ prefix used everywhere else when you refer to them in Kubernetes config.
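One way to sanity-check which capabilities the envoy process actually ended up with (assuming the image has a shell) is to read the masks from /proc:
kubectl -n vbertell exec -it <envoy-pod> -c envoy -- grep Cap /proc/1/status
The Cap* lines are hex masks; capsh --decode=<mask> on any box with libcap installed will translate them into capability names.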
Hi @youngnick
Thanks for your suggestion, I tried it but unfortunately I still get the same error.
I guess the capability needs to be added to the envoy container, which is the one trying to bind 80 and 443, and not to envoy-initconfig, which is run as 4444. I tried setting a second securityContext at the envoy container level with NET_BIND_SERVICE: same error.
I also tried setting a global securityContext for the whole DaemonSet (which should apply to all containers): same.
Since this issue appears to come from the securityContext workaround, maybe we can try a different approach: do you have any hint on a fix that would make envoy-initconfig create /config/resources with sufficient permissions, and allow us to run the containers as 65534 like we used to? In our process we might not know the envoy userid when we generate the helm charts anyway, so the securityContext solution is OK as a workaround but not for production.
Thanks !
We can change the contour bootstrap command to create the config/resources directory as 777, which should solve your problem, I think. Normally, I'd be concerned about setting secret-holding directories to that mode, but in this case, the actual files in that directory are pointers to the actual secrets (which are mounted in from Kubernetes Secrets). So I think it should be okay.
The change should definitely explain why we do that and refer back to this issue though.
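Once a build with that change is in, a quick way to confirm the mode from a running pod would be something like:
kubectl -n vbertell exec <envoy-pod> -c envoy -- ls -ld /config/resources /config/resources/sds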
@youngnick I have a draft PR out: https://github.com/projectcontour/contour/pull/3390 Will test out the changes with a local build and update here in a day or two.
Yep, also running into this issue because I'm using the https://hub.docker.com/r/bitnami/envoy image instead of the envoyproxy/envoy one, and so my envoy doesn't run as root
Thanks! The fix being available in the main branch implies it should also be in the next freeze, release-1.14, right?
Yes that is correct
Hi
we are deploying Contour (v1.11.0), with Envoy (v1.16.2) as a DaemonSet, using the following yaml templates : https://github.com/projectcontour/contour/blob/release-1.11/examples/render/contour.yaml
We only applied minor changes to fit our configuration (such as pointing to our local images repository, adding privileges for RBAC etc).
When firing up the helm installation, the Envoy pod fails with CrashLoopBackOff, with the following error in the envoy container :
this error wasn't occurring with older versions.
Just for information, the contour-certgen job has been successfully run, and the Contour pods are up&running.
Can you please advise ?
Thanks