nginxinc / kubernetes-ingress

NGINX and NGINX Plus Ingress Controllers for Kubernetes
https://docs.nginx.com/nginx-ingress-controller
Apache License 2.0

3.6.0/3.6.1 nginx ingress controller not working || dial unix /var/lib/nginx/nginx-config-version.sock: connect: no such file or directory #5981

Open reddyblokesh opened 1 week ago

reddyblokesh commented 1 week ago

Describe the bug

To Reproduce Steps to reproduce the behavior:

  1. Deploy 3.6.0/3.6.1 (nginx-plus ingress controller)
  2. View the logs with k logs -f
  3. See error: Get "http://config-version/configVersion": dial unix /var/lib/nginx/nginx-config-version.sock: connect: no such file or directory
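To tell this failure apart from other startup problems, the log output can be searched for the socket message. The following is a minimal sketch: the function name `has_config_version_socket_error` is hypothetical, and the sample line is copied from this report. In a cluster the input would come from `kubectl logs` for the affected pod rather than from a variable.

```shell
# Hypothetical helper: succeeds when the log text on stdin contains the
# socket error from this report (fixed-string match, no regex).
has_config_version_socket_error() {
  grep -qF 'dial unix /var/lib/nginx/nginx-config-version.sock: connect: no such file or directory'
}

# Sample line copied from the error in step 3.
sample='Get "http://config-version/configVersion": dial unix /var/lib/nginx/nginx-config-version.sock: connect: no such file or directory'

if printf '%s\n' "$sample" | has_config_version_socket_error; then
  echo "socket error present"
fi
```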

Expected behavior No errors should be logged; NGINX should be reloaded instead.

Your environment

github-actions[bot] commented 1 week ago

Hi @reddyblokesh thanks for reporting!

Be sure to check out the docs and the Contributing Guidelines while you wait for a human to take a look at this :slightly_smiling_face:

Cheers!

reddyblokesh commented 1 week ago

related to https://github.com/nginxinc/kubernetes-ingress/issues/2421

alnhk commented 1 week ago

Additional information: We pre-built an image using the steps below:

haywoodsh commented 1 week ago

Hi, I noticed that you were trying to use FIPS in 3.6.x. We are having some issues with the FIPS image, as stated in our release notes (https://docs.nginx.com/nginx-ingress-controller/releases/#361) and release logs (https://github.com/nginxinc/kubernetes-ingress/releases/tag/v3.6.1). Can you try to pull the 3.6.1 FIPS image from our registry directly and see if it works for you? If our image works for you but you would like to use your customized version, you could try building a new image with ours as the base, like this:

FROM <the published 3.6.1 image>

USER root
RUN ln -svf /dev/stdout /var/log/nginx/access.log \
    && ln -svf /dev/stderr /var/log/nginx/error.log
USER 101

alnhk commented 1 week ago

Hello @haywoodsh: yes, we are trying to use FIPS in 3.6.x and even 3.5.2. To clarify: after building an image with make alpine-image-plus-fips PREFIX=nginx-ingress TARGET=container TAG=3.6.1_${sha1}, we just add the steps below, correct?

FROM <the published 3.6.1 image>

USER root
RUN ln -svf /dev/stdout /var/log/nginx/access.log \
    && ln -svf /dev/stderr /var/log/nginx/error.log
USER 101

alnhk commented 1 week ago

Never mind, @haywoodsh: understood, and it is now working. So the issue is with building an image directly from GitHub (https://github.com/nginxinc/kubernetes-ingress with tag 3.6.0 or 3.6.1); pulling an image from the private NGINX registry instead works. Please let us know when this will be fixed in the repository.

alnhk commented 1 week ago

Update: We observe that the 3.6.0 image from the private NGINX registry works; however, 3.6.1 does not.

reddyblokesh commented 1 week ago

As per the update in https://github.com/nginxinc/kubernetes-ingress/issues/5981, we were told that building a FIPS-enabled image directly from the repository (https://github.com/nginxinc/kubernetes-ingress) breaks things, and that we should pull the FIPS image from the private NGINX registry (private-registry.nginx.com) using a Dockerfile: add a "FROM" line that pulls the image from private-registry.nginx.com, then add the customization, in particular the stdout/stderr links for /var/log/nginx. Based on our verification, 3.6.0 works; however, 3.6.1 does not. We tried both building an image from the repository and pulling an image from private-registry.nginx.com.

haywoodsh commented 1 week ago

I managed to deploy the 3.6.1 FIPS image private-registry.nginx.com/nginx-ic/nginx-plus-ingress:3.6.1-alpine-fips on my local cluster. Below are my deployment specifications and logs. Could you share yours as well? Additionally, have you tried to deploy the published image directly, rather than as a base image for customization?

 kubectl describe deployment -n nginx-ingress nginx-ingress
Name:                   nginx-ingress
Namespace:              nginx-ingress
CreationTimestamp:      Tue, 09 Jul 2024 11:51:08 +0100
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 3
Selector:               app=nginx-ingress
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=nginx-ingress
                    app.kubernetes.io/name=nginx-ingress
  Annotations:      prometheus.io/port: 9113
                    prometheus.io/scheme: http
                    prometheus.io/scrape: true
  Service Account:  nginx-ingress
  Containers:
   nginx-plus-ingress:
    Image:       private-registry.nginx.com/nginx-ic/nginx-plus-ingress:3.6.1-alpine-fips
    Ports:       80/TCP, 443/TCP, 8081/TCP, 9113/TCP, 9114/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      -nginx-plus
      -nginx-configmaps=$(POD_NAMESPACE)/nginx-config
    Requests:
      cpu:      100m
      memory:   128Mi
    Readiness:  http-get http://:readiness-port/nginx-ready delay=0s timeout=1s period=1s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:   (v1:metadata.namespace)
      POD_NAME:        (v1:metadata.name)
    Mounts:           <none>
  Volumes:            <none>
  Node-Selectors:     <none>
  Tolerations:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  nginx-ingress-7479879679 (0/0 replicas created), nginx-ingress-9fd5547c8 (0/0 replicas created)
NewReplicaSet:   nginx-ingress-79c6b5f9b5 (1/1 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  7m40s  deployment-controller  Scaled up replica set nginx-ingress-7479879679 to 1
  Normal  ScalingReplicaSet  5m6s   deployment-controller  Scaled up replica set nginx-ingress-9fd5547c8 to 1
  Normal  ScalingReplicaSet  5m5s   deployment-controller  Scaled down replica set nginx-ingress-7479879679 to 0 from 1
  Normal  ScalingReplicaSet  3m40s  deployment-controller  Scaled up replica set nginx-ingress-79c6b5f9b5 to 1
  Normal  ScalingReplicaSet  3m39s  deployment-controller  Scaled down replica set nginx-ingress-9fd5547c8 to 0 from 1

kubectl logs -f -n nginx-ingress nginx-ingress-79c6b5f9b5-d4jvq
NGINX Ingress Controller Version=3.6.1 Commit=aec5debf08c140a8d5d97f3fc596061aa756e9b0 Date=2024-07-04T08:41:26Z DirtyState=false Arch=linux/arm64 Go=go1.22.5
I0709 10:55:08.538374       1 flags.go:321] Starting with flags: ["-nginx-plus" "-nginx-configmaps=nginx-ingress/nginx-config"]
I0709 10:55:08.542246       1 main.go:292] Kubernetes version: 1.28.8
I0709 10:55:08.546405       1 main.go:437] Using nginx version: nginx/1.25.5 (nginx-plus-r32)
I0709 10:55:08.556058       1 main.go:868] Pod label updated: nginx-ingress-79c6b5f9b5-d4jvq
2024/07/09 10:55:08 [notice] 18#18: using the "epoll" event method
2024/07/09 10:55:08 [notice] 18#18: OpenSSL FIPS Mode is enabled
2024/07/09 10:55:08 [notice] 18#18: nginx/1.25.5 (nginx-plus-r32)
2024/07/09 10:55:08 [notice] 18#18: built by gcc 13.2.1 20231014 (Alpine 13.2.1_git20231014) 
2024/07/09 10:55:08 [notice] 18#18: OS: Linux 6.6.31-linuxkit
2024/07/09 10:55:08 [notice] 18#18: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2024/07/09 10:55:08 [notice] 18#18: start worker processes
2024/07/09 10:55:08 [notice] 18#18: start worker process 19
2024/07/09 10:55:08 [notice] 18#18: start worker process 20
2024/07/09 10:55:08 [notice] 18#18: start worker process 21
2024/07/09 10:55:08 [notice] 18#18: start worker process 22
2024/07/09 10:55:08 [notice] 18#18: start worker process 23

alnhk commented 1 week ago

Hello @haywoodsh: here is what we did:

FROM private-registry.nginx.com/nginx-ic/nginx-plus-ingress:3.6.1-alpine-fips
USER root
RUN ln -svf /dev/stdout /var/log/nginx/access.log \
    && ln -svf /dev/stderr /var/log/nginx/error.log
USER 101

And we encountered the error below.

Verify.go:85] Unable to fetch version: error getting client: Get "http://config-version/configVersion": dial unix /var/lib/nginx/nginx-config-version.sock: connect: no such file or directory

Regarding the Helm chart: we used the 3.6.1 Helm chart (without CRDs) with an "overlay", as we did with 3.5.2, which is currently working. Today we tried 3.6.0 and it works well. The same config we used for 3.5.2 and 3.6.0 did not work for 3.6.1 (Helm chart 3.6.1 + published image). Below is a piece of the log.

NGINX Ingress Controller Version=3.6.1  Commit=67ef4d92fae250fc916f4de5bd667db76551958e Date=2024-06-26T08:09:33Z DirtyState=true Arch=linux/amd64 Go=go1.22.4
I0708 06:49:21.793291       1 flags.go:321] Starting with flags: ["-nginx-plus=true" "-nginx-reload-timeout=60000" "-enable-app-protect=false" "-enable-app-protect-dos=false" "-nginx-configmaps=nginx-dev/nginx-ingress-dev" "-default-server-tls-secret=nginx-dev/nginx-ingress-dev-default-server-tls" "-ingress-class=nginx-dev" "-watch-namespace=nginx-dev" "-health-status=true" "-health-status-uri=/_nginx-health" "-nginx-debug=false" "-v=1" "-nginx-status=true" "-nginx-status-port=8080" "-nginx-status-allow-cidrs=127.0.0.1" "-report-ingress-status" "-external-service=nginx-ingress-dev-controller" "-enable-leader-election=true" "-leader-election-lock-name=nginx-ingress-leader" "-enable-prometheus-metrics=true" "-prometheus-metrics-listen-port=9113" "-prometheus-tls-secret=" "-enable-service-insight=false" "-service-insight-listen-port=9114" "-service-insight-tls-secret=" "-enable-custom-resources=false" "-enable-snippets=true" "-include-year=false" "-disable-ipv6=false" "-ready-status=true" "-ready-status-port=8081" "-enable-latency-metrics=true" "-ssl-dynamic-reload=true" "-enable-telemetry-reporting=false" "-weight-changes-dynamic-reload=true"]
I0708 06:49:21.793362       1 flags.go:337] Namespaces watched: [nginx-dev]
I0708 06:49:21.804208       1 main.go:292] Kubernetes version: 1.26.13
I0708 06:49:21.817067       1 main.go:437] Using nginx version: nginx/1.25.5 (nginx-plus-r32)
I0708 06:49:22.032810       1 main.go:868] Pod label updated: nginx-ingress-dev-controller-f448847fd-cb999
I0708 06:49:22.032810       1 Verify.go:85] Unable to fetch version: error getting client: Get "http://config-version/configVersion": dial unix /var/lib/nginx/nginx-config-version.sock: connect: no such file or directory
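
The first line of each controller log is a startup banner that records the commit, build date, and DirtyState of the binary. The banner in the log above shows Commit=67ef4d92..., Date=2024-06-26 and DirtyState=true, while the maintainer's log earlier in this thread shows Commit=aec5debf..., Date=2024-07-04 and DirtyState=false. Whether that difference matters for this error is not established here, but comparing the fields is a quick way to check that two pods run the same build. A minimal sketch, assuming the banner keeps the `Key=Value` layout shown in these logs; `banner_field` is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of one Key=Value field from a
# startup banner read on stdin. $1 is the field name, e.g. Commit.
banner_field() {
  tr -s ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# Banner copied from the failing pod's log in this thread.
banner='NGINX Ingress Controller Version=3.6.1  Commit=67ef4d92fae250fc916f4de5bd667db76551958e Date=2024-06-26T08:09:33Z DirtyState=true Arch=linux/amd64 Go=go1.22.4'

printf 'commit: %s\n' "$(printf '%s\n' "$banner" | banner_field Commit)"
printf 'dirty:  %s\n' "$(printf '%s\n' "$banner" | banner_field DirtyState)"
```

In a cluster, the banner would come from the first line of `kubectl logs` for each pod being compared.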

brianehlert commented 1 week ago

The NGINX team has been taking a deeper look at this, and it is related to an OpenSSL update that was first picked up in Alpine. As noted, paid customers have access to container images that are built by the NGINX Ingress Controller team.
These images have been specifically modified to remove the breaking OpenSSL change and are available.

https://github.com/alpinelinux/docker-alpine/issues/406

According to https://github.com/openssl/openssl/issues/24826 it appears that a fix is coming from OpenSSL

haywoodsh commented 1 week ago

I was not able to reproduce the error. I built nginx-plus-ingress:3.6.1-alpine-fips-custom-log from our official image private-registry.nginx.com/nginx-ic/nginx-plus-ingress:3.6.1-alpine-fips with the log changes, and it runs fine on my local cluster. @alnhk Did the unmodified official image work for you, or did it display the same error message as well?

Name:             nginx-ingress-78c84496bb-n48bl
Namespace:        nginx-ingress
Priority:         0
Service Account:  nginx-ingress
Node:             k3d-gitops-server-0/172.18.0.3
Start Time:       Wed, 10 Jul 2024 12:13:20 +0100
Labels:           app=nginx-ingress
                  app.kubernetes.io/name=nginx-ingress
                  app.kubernetes.io/version=3.6.1
                  app.nginx.org/version=1.25.5-nginx-plus-r32
                  pod-template-hash=78c84496bb
Annotations:      prometheus.io/port: 9113
                  prometheus.io/scheme: http
                  prometheus.io/scrape: true
Status:           Running
SeccompProfile:   RuntimeDefault
IP:               10.42.0.182
IPs:
  IP:           10.42.0.182
Controlled By:  ReplicaSet/nginx-ingress-78c84496bb
Containers:
  nginx-plus-ingress:
    Container ID:  containerd://97da4384bfcdac52af09de58188cd11432031244a8c5f8d0828dd94f7ac72552
    Image:         private-registry.nginx.com/nginx-ic/nginx-plus-ingress:3.6.1-alpine-fips-custom-log
    Image ID:      sha256:668e91fce4ab70509ba1f20e4aeedd5d5c39c3679333df49782bc158b8edf945
    Ports:         80/TCP, 443/TCP, 8081/TCP, 9113/TCP, 9114/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      -nginx-plus
      -nginx-configmaps=$(POD_NAMESPACE)/nginx-config
      -nginx-reload-timeout=60000
      -weight-changes-dynamic-reload=true
      -enable-oidc
    State:          Running
      Started:      Wed, 10 Jul 2024 12:13:21 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   128Mi
    Readiness:  http-get http://:readiness-port/nginx-ready delay=0s timeout=1s period=1s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:  nginx-ingress (v1:metadata.namespace)
      POD_NAME:       nginx-ingress-78c84496bb-n48bl (v1:metadata.name)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t65c5 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-api-access-t65c5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  9m28s  default-scheduler  Successfully assigned nginx-ingress/nginx-ingress-78c84496bb-n48bl to k3d-gitops-server-0
  Normal  Pulled     9m28s  kubelet            Container image "private-registry.nginx.com/nginx-ic/nginx-plus-ingress:3.6.1-alpine-fips-custom-log" already present on machine
  Normal  Created    9m28s  kubelet            Created container nginx-plus-ingress
  Normal  Started    9m28s  kubelet            Started container nginx-plus-ingress
NGINX Ingress Controller Version=3.6.1 Commit=aec5debf08c140a8d5d97f3fc596061aa756e9b0 Date=2024-07-04T08:41:26Z DirtyState=false Arch=linux/arm64 Go=go1.22.5
I0710 13:02:24.193447       1 flags.go:321] Starting with flags: ["-nginx-plus" "-nginx-configmaps=nginx-ingress/nginx-config" "-nginx-reload-timeout=60000" "-weight-changes-dynamic-reload=true"]
I0710 13:02:24.200757       1 main.go:292] Kubernetes version: 1.28.8
I0710 13:02:24.231366       1 main.go:437] Using nginx version: nginx/1.25.5 (nginx-plus-r32)
I0710 13:02:24.242747       1 main.go:868] Pod label updated: nginx-ingress-5c45cd989b-svlvp
2024/07/10 13:02:24 [notice] 19#19: using the "epoll" event method
2024/07/10 13:02:24 [notice] 19#19: OpenSSL FIPS Mode is enabled
2024/07/10 13:02:24 [notice] 19#19: nginx/1.25.5 (nginx-plus-r32)
2024/07/10 13:02:24 [notice] 19#19: built by gcc 13.2.1 20231014 (Alpine 13.2.1_git20231014) 
2024/07/10 13:02:24 [notice] 19#19: OS: Linux 6.6.31-linuxkit
2024/07/10 13:02:24 [notice] 19#19: getrlimit(RLIMIT_NOFILE): 1048576:1048576

alnhk commented 1 week ago

Hello @haywoodsh: regarding "Did the unmodified official image work for you", we tried this and got the result below. Anyway, we will wait for the confirmation mentioned by @brianehlert: https://github.com/nginxinc/kubernetes-ingress/issues/5981#issuecomment-2217743753

vepatel commented 1 week ago

@alnhk yes this means the published official one is working as expected, which you can use if that works for you 👍🏼 @haywoodsh also had the modified one working as you see in the log above. https://github.com/openssl/openssl/issues/24826#issuecomment-2220416732

alnhk commented 5 days ago

@vepatel @haywoodsh: After spending some time researching how and why it works for you but not for us, we observe that after removing the last line, USER 101, it works perfectly. And after it is deployed, we can see that user "101" is being used in the k8s pod.

⎈|example.com:acme-example-com)]# k exec -it nginx-ingress-dev-controller-64cdd8d858-nhmkp -- whoami
101
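
Put together, the variant reported as working here is the earlier Dockerfile minus its final `USER 101` line. The following is a minimal sketch that writes it to a temporary directory; the output path and the image tag in the build comment are placeholders. That the pod still runs as user 101 presumably comes from the deployment's security context rather than from the image, which is an assumption not confirmed in this thread.

```shell
# Write the Dockerfile variant reported to work: same as the earlier
# snippet in this thread, but without the trailing "USER 101" line.
workdir="$(mktemp -d)"
cat > "$workdir/Dockerfile" <<'EOF'
FROM private-registry.nginx.com/nginx-ic/nginx-plus-ingress:3.6.1-alpine-fips
USER root
RUN ln -svf /dev/stdout /var/log/nginx/access.log \
    && ln -svf /dev/stderr /var/log/nginx/error.log
EOF

echo "Dockerfile written to $workdir/Dockerfile"
# To build it (requires Docker and registry access; the tag is a placeholder):
#   docker build -t nginx-plus-ingress:3.6.1-alpine-fips-custom-log "$workdir"
```

After deploying, the same `k exec ... -- whoami` check shown above can be used to confirm which user the container actually runs as.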

vepatel commented 5 days ago

Interesting. @reddyblokesh, let us know if the above works for you. We'll keep this open for visibility until the OpenSSL issue is resolved. @shaun-nx