kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Attempt to acquire IP from cloud provider is forbidden #5916

Closed frgomes closed 4 years ago

frgomes commented 4 years ago

Describe the bug

We need multiple namespaces to have the ability to obtain separate IP addresses allocated from the cloud infrastructure.
We've followed the instructions (slightly adapted) from https://kubernetes.github.io/ingress-nginx/examples/static-ip/ and ended up with the error:

No namespace with name j1111cc5555 found: namespaces "j1111cc5555" is forbidden: User "system:serviceaccount:ingress-nginx:ingress-nginx" cannot get resource "namespaces" in API group "" in the namespace "j1111cc5555"

To Reproduce

Steps to reproduce the behavior:

  1. An ingress controller was deployed as usual in namespace ingress-nginx and it works fine as expected. However, this ingress controller only monitors namespace ingress-nginx.

  2. Now I would like to deploy a second ingress controller, for the purpose of obtaining a second IP address from the cloud infrastructure and for the purpose of monitoring a second namespace. Then, I've applied the yaml configurations below:

$ kubectl apply -f modified-static-ip-svc.yaml

# This is the backend service
apiVersion: v1
kind: Service
metadata:
  namespace: ingress-nginx
  name: nginx-ingress-j1111cc5555
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: 'true'
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  ports:
  - port: 80
    name: http
    targetPort: 80
  - port: 443
    name: https
    targetPort: 443
  selector:
    # Selects nginx-ingress-controller pods
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

$ kubectl apply -f second-nginx-ingress-controller.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/version: 0.34.0
    helm.sh/chart: ingress-nginx-2.11.0
  name: nginx-ingress-j1111cc5555
  namespace: ingress-nginx
spec:
  minReadySeconds: 0
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/name: ingress-nginx
    spec:
      containers:
      - args:
        - /nginx-ingress-controller
        - --publish-service=ingress-nginx/nginx-ingress-j1111cc5555
        - --election-id=ingress-controller-leader-ingress-nginx
        - --ingress-class=nginx-j1111cc5555
        - --configmap=ingress-nginx/nginx-ingress-j1111cc5555
        - --watch-namespace=j1111cc5555
        - --validating-webhook=:8443
        - --validating-webhook-certificate=/usr/local/certificates/cert
        - --validating-webhook-key=/usr/local/certificates/key
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.0@sha256:56633bd00dab33d92ba14c6e709126a762d54a75a6e72437adefeaaca0abb069
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /wait-shutdown
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: controller
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        - containerPort: 8443
          name: webhook
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 100m
            memory: 90Mi
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          runAsUser: 101
        volumeMounts:
        - mountPath: /usr/local/certificates/
          name: webhook-cert
          readOnly: true
      dnsPolicy: ClusterFirst
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 300
      volumes:
      - name: webhook-cert
        secret:
          secretName: ingress-nginx-admission
  3. A second load balancer was allocated on Digital Ocean:

     ingress-nginx   nginx-ingress-j1111cc5555   LoadBalancer   10.245.0.84   138.68.119.239   80:31619/TCP,443:32174/TCP   1h

  4. However, the pod crashed:

    
    -------------------------------------------------------------------------------
    NGINX Ingress controller
    Release:       v0.34.0
    Build:         v20200709-ingress-nginx-2.10.0-28-g8693cdb89
    Repository:    https://github.com/kubernetes/ingress-nginx
    nginx version: nginx/1.19.1

I0719 12:01:24.893687       7 flags.go:205] Watching for Ingress class: nginx-j1111cc5555
W0719 12:01:24.893768       7 flags.go:208] Only Ingresses with class "nginx-j1111cc5555" will be processed by this Ingress controller
W0719 12:01:24.894416       7 flags.go:250] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
W0719 12:01:24.894581       7 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0719 12:01:24.894852       7 main.go:231] Creating API client for https://10.245.0.1:443
I0719 12:01:24.907009       7 main.go:275] Running in Kubernetes cluster version v1.17 (v1.17.5) - git (clean) commit e0fccafd69541e3750d460ba0f9743b90336f24f - platform linux/amd64
F0719 12:01:24.913986       7 main.go:100] No namespace with name j1111cc5555 found: namespaces "j1111cc5555" is forbidden: User "system:serviceaccount:ingress-nginx:ingress-nginx" cannot get resource "namespaces" in API group "" in the namespace "j1111cc5555"



## Expected behavior  
I would expect to have the ingress controller able to manage the second allocated IP address.

## Your environment
 * Version of the Ingress Controller - release version or a specific commit: v20200709-ingress-nginx-2.10.0-28-g8693cdb89
 * Version of Kubernetes: v1.17.5
 * Kubernetes platform: Digital Ocean
 * Using NGINX or NGINX Plus: nginx/1.19.1

frgomes commented 4 years ago

Maybe duplicate of kubernetes/ingress-nginx#5758 ?

frgomes commented 4 years ago

As found on #5758, I've double checked that the role mentions permission to access namespaces:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/version: 0.34.0
    helm.sh/chart: ingress-nginx-2.11.0
  name: ingress-nginx
  namespace: ingress-nginx
rules:
- apiGroups:
  - ''
  resources:
  - namespaces
  verbs:
  - get
- apiGroups:
aledbf commented 4 years ago

We've followed the instructions (slightly adapted) from https://kubernetes.github.io/ingress-nginx/examples/static-ip/ and ended up with an error

The example is for GCP. There is no support for static IP addresses in DO https://www.digitalocean.com/docs/kubernetes/how-to/configure-load-balancers/

aledbf commented 4 years ago

I would expect to have the ingress controller able to manage the second allocated IP address.

Not sure what you mean by that. The ingress controller does not allocate any IP address. It only uses the Kubernetes IP address from the Service referenced in the --publish-service flag to update the Ingress status field.
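In other words, the controller copies the address already assigned to the Service named in --publish-service into the status of the Ingresses it manages. A sketch of the result (the Ingress name and IP here are illustrative, not taken from the cluster above):

```yaml
# Hypothetical sketch: after the controller syncs, each Ingress it manages
# reports the external IP of the Service named in --publish-service.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: example          # illustrative name
  namespace: j1111cc5555
status:
  loadBalancer:
    ingress:
    - ip: 138.68.119.239 # copied from the LoadBalancer Service, not allocated by the controller
```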

frgomes commented 4 years ago

@aledbf

The example is for GCP. There is no support for static IP addresses in DO https://www.digitalocean.com/docs/kubernetes/how-to/configure-load-balancers/

Right. I've changed the YAML so that it asks for a load balancer, obtaining a dynamically allocated IP address. This is working fine as usual.

The ingress controller does not allocate any IP address.

Yes, correct. The service allocates the IP address. This is happening as expected, as shown in step 3. And yes, the YAML file name is a bit misleading: I'm not wiring up a static IP, really, but requesting a load balancer as usual (dynamic IP).

$ kubectl apply -f modified-static-ip-svc.yaml 
$ kubectl get svc -A | grep nginx-ingress-j1111cc5555
ingress-nginx   nginx-ingress-j1111cc5555             LoadBalancer   10.245.0.84      138.68.119.239   80:31619/TCP,443:32174/TCP   3d

Not sure what you mean by that.

Rephrasing what I would expect: a successful deployment, with the pod starting up as normal, instead of the fatal error message shown in step 4.

I0719 12:01:24.893687       7 flags.go:205] Watching for Ingress class: nginx-j1111cc5555
W0719 12:01:24.893768       7 flags.go:208] Only Ingresses with class "nginx-j1111cc5555" will be processed by this Ingress controller
W0719 12:01:24.894416       7 flags.go:250] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
W0719 12:01:24.894581       7 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0719 12:01:24.894852       7 main.go:231] Creating API client for https://10.245.0.1:443
I0719 12:01:24.907009       7 main.go:275] Running in Kubernetes cluster version v1.17 (v1.17.5) - git (clean) commit e0fccafd69541e3750d460ba0f9743b90336f24f - platform linux/amd64
F0719 12:01:24.913986       7 main.go:100] No namespace with name j1111cc5555 found: namespaces "j1111cc5555" is forbidden: User "system:serviceaccount:ingress-nginx:ingress-nginx" cannot get resource "namespaces" in API group "" in the namespace "j1111cc5555"

I guess I'm missing something related to RBAC, perhaps? I've found #5758, which I suppose is related, if not a duplicate. I've checked the role configuration and in principle it is just fine; I'm showing it above.

I'm stuck. Any ideas, please? Thanks a lot for your help :-)
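One way to narrow this down is to ask the API server directly whether the controller's ServiceAccount may perform the failing operation (a diagnostic sketch; the ServiceAccount and namespace names are taken from the error message above, and these commands obviously need access to the cluster in question):

```shell
# "no" here confirms that no (Cluster)Role/RoleBinding grants the access;
# a namespaced Role in ingress-nginx cannot help, since the controller
# needs to read objects in the *watched* namespace j1111cc5555.
kubectl auth can-i get namespaces \
  --as=system:serviceaccount:ingress-nginx:ingress-nginx \
  -n j1111cc5555

kubectl auth can-i get services \
  --as=system:serviceaccount:ingress-nginx:ingress-nginx \
  -n j1111cc5555
```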

aledbf commented 4 years ago

No namespace with name j1111cc5555 found: namespaces "j1111cc5555" is forbidden: User "system:serviceaccount:ingress-nginx:ingress-nginx" cannot get resource "namespaces" in API group "" in the namespace "j1111cc5555"

How are you installing the ingress controller? It looks like some RBAC rule is missing.

frgomes commented 4 years ago

@aledbf: I'm installing an ingress controller as usual in namespace ingress-nginx. Then I'm applying the two YAML files above, which I expected to deploy another load balancer (which works) and a second ingress controller. I will double-check everything tonight, run some additional tests and variants, and let you know tomorrow.

frgomes commented 4 years ago

@aledbf : I'm clueless.

  1. I've reinstalled the cluster from scratch and, as before, the (default) ingress controller (on namespace ingress-nginx) works fine as expected.

  2. Then I've undeployed the ingress controller and added --watch-namespace=ingress-nginx so that it will not react to deployments involving other namespaces.

  3. I've restarted the ingress controller and it works as expected for deployments involving namespace ingress-nginx.

  4. Then I've deployed an additional service which allocates a load balancer on Digital Ocean, stealing ideas from here: https://kubernetes.github.io/ingress-nginx/examples/static-ip/. It works as expected, allocating an external IP.

# This is the backend service
apiVersion: v1
kind: Service
metadata:
  namespace: j1111cc5555
  name: nginx-ingress-j1111cc5555
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: 'true'
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  ports:
  - port: 80
    name: http
    targetPort: 80
  - port: 443
    name: https
    targetPort: 443
  selector:
    # Selects nginx-ingress-controller pods
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  5. Then, finally, I've deployed a second ingress controller, once again stealing ideas from here: https://kubernetes.github.io/ingress-nginx/examples/static-ip/.

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: j1111cc5555
  name: nginx-ingress-controller
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/part-of: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/part-of: ingress-nginx
    spec:
      # hostNetwork makes it possible to use ipv6 and to preserve the source IP correctly regardless of docker configuration
      # however, it is not a hard dependency of the nginx-ingress-controller itself and it may cause issues if port 10254 already is taken on the host
      # that said, since hostPort is broken on CNI (https://github.com/kubernetes/kubernetes/issues/31307) we have to use hostNetwork where CNI is used
      # like with kubeadm
      # hostNetwork: true
      terminationGracePeriodSeconds: 60
      containers:
      - image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1@sha256:0e072dddd1f7f8fc8909a2ca6f65e76c5f0d2fcfb8be47935ae3457e8bbceb20
        name: nginx-ingress-controller
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 1
        ports:
        - containerPort: 80
          hostPort: 80
        - containerPort: 443
          hostPort: 443
        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        args:
        - /nginx-ingress-controller
        - --publish-service=$(POD_NAMESPACE)/nginx-$(POD_NAMESPACE)
        - --ingress-class=nginx-$(POD_NAMESPACE)
        - --watch-namespace=$(POD_NAMESPACE)
        - --v=10

The pod fails to start, and I'm clueless about the root cause of the error, since the log hides the underlying problem:

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v0.34.1
  Build:         v20200715-ingress-nginx-2.11.0-8-gda5fa45e2
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.1

-------------------------------------------------------------------------------

W0726 17:10:20.306600       7 flags.go:250] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
W0726 17:10:20.306718       7 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0726 17:10:20.307294       7 main.go:231] Creating API client for https://10.245.0.1:443
I0726 17:10:20.318651       7 main.go:275] Running in Kubernetes cluster version v1.18 (v1.18.3) - git (clean) commit 2e7996e3e2712684bc73f0dec0200d64eec7fe40 - platform linux/amd64

...

I0726 18:01:21.901838       7 round_trippers.go:443] GET https://10.245.0.1:443/api/v1/namespaces/j1111cc5555/services/nginx-j1111cc5555 403 Forbidden in 1 milliseconds
I0726 18:01:21.901892       7 round_trippers.go:449] Response Headers:
I0726 18:01:21.901897       7 round_trippers.go:452]     Audit-Id: 9c8ea9e0-d14c-4fe0-8587-f78ce192df8f
I0726 18:01:21.901901       7 round_trippers.go:452]     Cache-Control: no-cache, private
I0726 18:01:21.901906       7 round_trippers.go:452]     Content-Type: application/json
I0726 18:01:21.901910       7 round_trippers.go:452]     X-Content-Type-Options: nosniff
I0726 18:01:21.901913       7 round_trippers.go:452]     Content-Length: 350
I0726 18:01:21.901918       7 round_trippers.go:452]     Date: Sun, 26 Jul 2020 18:01:21 GMT
I0726 18:01:21.902214       7 request.go:1068] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services \"nginx-j1111cc5555\" is forbidden: User \"system:serviceaccount:j1111cc5555:default\" cannot get resource \"services\" in API group \"\" in the namespace \"j1111cc5555\"","reason":"Forbidden","details":{"name":"nginx-j1111cc5555","kind":"services"},"code":403}
F0726 18:01:21.902820       7 main.go:93] ✖ The cluster seems to be running with a restrictive Authorization mode and the Ingress controller does not have the required permissions to operate normally.

Is this log useful for understanding the root cause of the issue? I mean: I know the problem is missing RBAC rules, but the point is: in a scenario with multiple ingress controllers, is it possible to have a second controller deployed under a distinct namespace? If so, which Role(s) and RoleBinding(s) should be configured? Is there an example configuration somewhere?
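For what it's worth, the pattern implied by the 403 above would be a Role inside the watched namespace, bound to the ServiceAccount the pod actually runs under (j1111cc5555:default, per the log). A minimal sketch, with illustrative names and a verb list that is likely incomplete for a full controller:

```yaml
# Hypothetical sketch: grant the controller's ServiceAccount read access
# to the objects it watches, inside the watched namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ingress-nginx-watcher     # illustrative name
  namespace: j1111cc5555
rules:
- apiGroups: ['']
  resources: [namespaces, services, endpoints, pods, secrets, configmaps]
  verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ingress-nginx-watcher     # illustrative name
  namespace: j1111cc5555
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ingress-nginx-watcher
subjects:
- kind: ServiceAccount
  name: default                   # the account from the 403 message above
  namespace: j1111cc5555
```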

Thanks a lot,

aledbf commented 4 years ago

@frgomes I am not sure how you are modifying the manifests. Please check what I did:

#### Generate yaml manifests

cat << EOF | helm template j1111cc5555 charts/ingress-nginx --namespace j1111cc5555 --values - | hack/add-namespace.py j1111cc5555 > j1111cc5555.yaml
controller:
  service:
    type: LoadBalancer
    externalTrafficPolicy: Local
    annotations:
      service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: "true"
  config:
    use-proxy-protocol: "true"
  scope:
    enabled: true
EOF

The manifest is here https://gist.github.com/aledbf/43bb16be33daf39c5ecdce46d067bee2

#### Check the differences

diff -u deploy/static/provider/do/deploy.yaml j1111cc5555.yaml

#### Install ingress-nginx in namespace j1111cc5555

kubectl apply -f j1111cc5555.yaml

sleep 90

kubectl logs -f -n j1111cc5555 ingress-nginx-controller-785f6c5d9b-xw2jt
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v0.34.1
  Build:         v20200715-ingress-nginx-2.11.0-8-gda5fa45e2
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.1

-------------------------------------------------------------------------------

I0726 19:01:19.931968       7 flags.go:205] Watching for Ingress class: nginx
W0726 19:01:19.932362       7 flags.go:250] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
W0726 19:01:19.932402       7 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0726 19:01:19.932582       7 main.go:231] Creating API client for https://10.96.0.1:443
I0726 19:01:19.937145       7 main.go:275] Running in Kubernetes cluster version v1.18 (v1.18.2) - git (clean) commit 52c56ce7a8272c798dbc29846288d7cd9fbae032 - platform linux/amd64
I0726 19:01:20.093603       7 main.go:105] SSL fake certificate created /etc/ingress-controller/ssl/default-fake-certificate.pem
I0726 19:01:20.094175       7 main.go:113] Enabling new Ingress features available since Kubernetes v1.18
W0726 19:01:20.095826       7 main.go:125] No IngressClass resource with name nginx found. Only annotation will be used.
I0726 19:01:20.098716       7 ssl.go:528] loading tls certificate from certificate path /usr/local/certificates/cert and key path /usr/local/certificates/key
I0726 19:01:20.124618       7 nginx.go:263] Starting NGINX Ingress controller
I0726 19:01:20.127011       7 event.go:278] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"j1111cc5555", Name:"ingress-nginx-controller", UID:"73c3d9f6-c8ab-47df-b86a-6c91f99d5db7", APIVersion:"v1", ResourceVersion:"726", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap j1111cc5555/ingress-nginx-controller
I0726 19:01:21.326770       7 nginx.go:307] Starting NGINX process
I0726 19:01:21.327281       7 leaderelection.go:242] attempting to acquire leader lease  j1111cc5555/ingress-controller-leader-nginx...
I0726 19:01:21.330233       7 nginx.go:327] Starting validation webhook on :8443 with keys /usr/local/certificates/cert /usr/local/certificates/key
I0726 19:01:21.330773       7 controller.go:141] Configuration changes detected, backend reload required.
I0726 19:01:21.345808       7 leaderelection.go:252] successfully acquired lease j1111cc5555/ingress-controller-leader-nginx
I0726 19:01:21.346015       7 status.go:86] new leader elected: ingress-nginx-controller-785f6c5d9b-xw2jt
I0726 19:01:21.403559       7 controller.go:157] Backend successfully reloaded.
I0726 19:01:21.403618       7 controller.go:166] Initial sync, sleeping for 1 second.
frgomes commented 4 years ago

@aledbf : Thanks a lot for your kindness and prompt response :+1:

I think there's some misunderstanding on my part. Let me explain:

Some time ago I was able to create multiple ingress controllers. I did almost exactly what you did, in fact. So, if I know how to do that, and if I was successful before... why the hell am I asking here in this issue?

Well, I was trying to reduce the memory footprint involved in our architecture, something that https://kubernetes.github.io/ingress-nginx/examples/static-ip/ made me believe could be possible. Wrong assumption, or just a total misunderstanding on my part.

Better if we close this issue here and move the conversation to another issue, so that people landing here can still benefit from the discussion. More info: #5939

Thanks a lot. :100: