kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Pod IP Address/Name is not visible into ingress-nginx access-logs #11914

Closed Pilotindream closed 1 week ago

Pilotindream commented 2 weeks ago

What happened:

Hello guys! My team is running ingress-nginx version 1.11.2 (chart version 4.11.2) in an EKS cluster. Sometimes we receive 5xx errors from services, and we would like to know whether it's possible to get the IP address/name of the pod that responded with that code. Below you can see our controller config, in particular log-format-upstream:

controller:
      allowSnippetAnnotations: true
      nginxDebug: true
      logLevel: 3
      ingressClassResource:
        name: public-nginx
        controllerValue: "k8s.io/public-ingress-nginx"
      config:
        use-forwarded-headers: true
        log-format-escape-json: true
        enable-brotli: true
        hide-headers: "x-envoy-decorator-operation"
        log-format-upstream: '{"msec": "$msec", "connection": "$connection",
                "connection_requests": "$connection_requests", "pid": "$pid",
                "request_id": "$request_id", "request_length": "$request_length",
                "remote_addr": "$remote_addr", "remote_user": "$remote_user",
                "remote_port": "$remote_port", "time_local": "$time_local",
                "time_iso8601": "$time_iso8601", "request": "$request", "request_uri":
                "$request_uri", "args": "$args", "status": "$status", "body_bytes_sent":
                "$body_bytes_sent",  "bytes_sent": "$bytes_sent", "http_referer":
                "$http_referer", "http_user_agent": "$http_user_agent",
                "http_x_forwarded_for": "$http_x_forwarded_for", "http_host":
                "$http_host", "server_name": "$server_name", "request_time":
                "$request_time", "upstream": "$upstream_addr", "upstream_connect_time":
                "$upstream_connect_time", "upstream_header_time": "$upstream_header_time",
                "upstream_response_time": "$upstream_response_time",
                "upstream_response_length": "$upstream_response_length",
                "upstream_cache_status": "$upstream_cache_status", "ssl_protocol":
                "$ssl_protocol", "ssl_cipher": "$ssl_cipher", "scheme": "$scheme",
                "request_method": "$request_method", "server_protocol":
                "$server_protocol", "pipe": "$pipe", "gzip_ratio": "$gzip_ratio",
                "http_cf_ray": "$http_cf_ray", "http_cf_ipcountry": "$http_cf_ipcountry"}'

When it comes to 2xx response codes, I can see the pod IP address in the "upstream" field of the ingress-nginx logs, as expected. However, when I intentionally broke a service so that it started throwing 503 responses, the "upstream" field in the log is empty:

{"msec": "1724931972.946", "connection": "92169", "connection_requests": "3", "pid": "25", "request_id": "dee825cceb66df3f63fee9edb7ab9879", "request_length": "849", "remote_addr": "116.203.159.143", "remote_user": "", "remote_port": "", "time_local": "29/Aug/2024:11:46:12 +0000", "time_iso8601": "2024-08-29T11:46:12+00:00", "request": "GET / HTTP/1.1", "request_uri": "/", "args": "", "status": "503", "body_bytes_sent": "592", "bytes_sent": "747", "http_referer": "", "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36 OPR/112.0.0.0", "http_x_forwarded_for": "116.203.159.143", "http_host": "kyverno.demo.k8sbrain.com", "server_name": "kyverno.demo.k8sbrain.com", "request_time": "0.000", "upstream": "", "upstream_connect_time": "", "upstream_header_time": "", "upstream_response_time": "", "upstream_response_length": "", "upstream_cache_status": "", "ssl_protocol": "", "ssl_cipher": "", "scheme": "http", "request_method": "GET", "server_protocol": "HTTP/1.1", "pipe": ".", "gzip_ratio": "", "http_cf_ray": "", "http_cf_ipcountry": ""}

What I already tried: I have already enabled error-log-level: "debug", as well as controller.nginxDebug = true/false and controller.logLevel = 1 to 3. I found documentation here: https://docs.nginx.com/nginx-ingress-controller/troubleshooting/troubleshoot-common/
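For reference, this is roughly how those settings sit in the values file (a sketch; the keys are the chart/ConfigMap options named above, the values are illustrative):

controller:
  nginxDebug: true        # run nginx with the debug binary
  logLevel: 3             # verbosity of the controller process itself
  config:
    error-log-level: "debug"   # nginx error_log level via the ConfigMap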

However, I still can't see the IP address/name of the pod that responded with 503. After enabling both error-log-level and nginxDebug, I only started to receive messages related to nginx's internal workings (some messages from Lua). Example of debug-mode logs for the service I intentionally broke to get a 503 response:

2024/08/29 12:34:16 [debug] 26#26: *1096 http script var: "kyverno.demo.k8sbrain.com"
2024/08/29 12:34:16 [debug] 26#26: *1096 http script set $best_http_host
2024/08/29 12:34:16 [debug] 26#26: *1096 http script complex value
2024/08/29 12:34:16 [debug] 26#26: *1096 http script var: "80"
2024/08/29 12:34:16 [debug] 26#26: *1096 http script set $pass_port
2024/08/29 12:34:16 [debug] 26#26: *1096 http script value: ""
2024/08/29 12:34:16 [debug] 26#26: *1096 http script set $proxy_alternative_upstream_name
2024/08/29 12:34:16 [debug] 26#26: *1096 rewrite phase: 4
2024/08/29 12:34:16 [debug] 26#26: *1096 lua rewrite handler, uri:"/" c:1
2024/08/29 12:34:16 [debug] 26#26: *1096 code cache lookup (key='rewrite_by_lua_nhli_0b5957293aadfec043f3de317a2f03cb', ref=8)
2024/08/29 12:34:16 [debug] 26#26: *1096 code cache hit (key='rewrite_by_lua_nhli_0b5957293aadfec043f3de317a2f03cb', ref=8)
2024/08/29 12:34:16 [debug] 26#26: *1096 lua creating new thread
2024/08/29 12:34:16 [debug] 26#26: *1096 lua reusing cached lua thread 00000515BDB1CBA0 (ref 2)
2024/08/29 12:34:16 [debug] 26#26: *1096 lua reset ctx
2024/08/29 12:34:16 [debug] 26#26: *1096 http lua finalize threads
2024/08/29 12:34:16 [debug] 26#26: *1096 add cleanup: 0000FFFF8D70D0D0
2024/08/29 12:34:16 [debug] 26#26: *1096 lua run thread, top:0 c:1
2024/08/29 12:34:16 [debug] 26#26: *1096 add cleanup: 0000FFFF8D70D1B0
2024/08/29 12:34:16 [debug] 26#26: *1096 lua exit with code 503

So my question: is it possible to configure ingress-nginx so that the IP address/name of the pod that responded with a 5xx becomes visible? Thanks in advance for your help!

What you expected to happen: I expected to see the IP address of the pod that responded with the 5xx error in the "upstream" field.

How ingress-nginx was installed: via the community Helm chart (version 4.11.2)

ingress-nginx version: 1.11.2

k8s-ci-robot commented 2 weeks ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

strongjz commented 2 weeks ago

Are there endpoints for this service you're trying to access? Is it the same service that is giving 200, and randomly, you're getting 503?

/triage needs-information

Pilotindream commented 2 weeks ago

Are there endpoints for this service you're trying to access? Is it the same service that is giving 200, and randomly, you're getting 503?

/triage needs-information

No, the service that randomly responds with 5xx is on our production cluster, and the developers complain that they cannot see the IP of the pod that responded with the 5xx. In my question I gave the example of a kyverno "dummy" service that I intentionally broke so that it responds with 5xx; I did this to confirm that the 503 response shows up in the ingress-nginx logs, yet the upstream field is empty even with error-log-level: debug and nginxDebug enabled.

So, in general, I am looking for a way to see the IP address of the pod that responds with any 5xx in my ingress-nginx logs.

longwuyuan commented 2 weeks ago

/remove-kind bug
/kind support

The controller or the default-backend or the custom-backend will return 503
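One way to at least identify which backend the request was routed to, even when the upstream address is empty, is to add the controller's own upstream-name variables to log-format-upstream (a minimal sketch, trimmed to a few fields; $proxy_upstream_name and $proxy_alternative_upstream_name are part of the controller's default access-log format, and the rest of the JSON format above would be kept as-is):

controller:
  config:
    log-format-upstream: '{"status": "$status", "upstream": "$upstream_addr",
            "proxy_upstream_name": "$proxy_upstream_name",
            "proxy_alternative_upstream_name": "$proxy_alternative_upstream_name"}'

$proxy_upstream_name records the namespace-service-port the controller selected, which can still point at the intended backend when $upstream_addr stays empty because the 503 was generated locally.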

Pilotindream commented 2 weeks ago

/remove-kind bug
/kind support

The controller or the default-backend or the custom-backend will return 503

So am I right that when I see a 503 in the ingress-nginx pod logs but the upstream field is empty, it means that the controller, the default-backend, or the custom-backend responded with the 5xx?

Could you maybe suggest a way to test a case where the pod itself responds with a 5xx error, so that I can confirm that the upstream field in the access logs shows the pod IP address?

longwuyuan commented 2 weeks ago

I did this:

A backend pod may respond with 503 when the webserver in the backend pod has an internal server problem with the application.

Pilotindream commented 2 weeks ago

@longwuyuan, so have you tried this approach, and were you able to see the 503 error with a non-empty "upstream" field, i.e. see your backend pod IP? Because I tried scaling one of my deployments to 0, and when I curl it via the ingress I see a 503 in the controller pod logs, but the "upstream" field is empty:

"msec": "1724931972.946", "connection": "92169", "connection_requests": "3", "pid": "25", "request_id": "dee825cceb66df3f63fee9edb7ab9879", "request_length": "849", "remote_addr": "116.203.159.143", "remote_user": "", "remote_port": "", "time_local": "29/Aug/2024:11:46:12 +0000", "time_iso8601": "2024-08-29T11:46:12+00:00", "request": "GET / HTTP/1.1", "request_uri": "/", "args": "", "status": "503", "body_bytes_sent": "592", "bytes_sent": "747", "http_referer": "", "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36 OPR/112.0.0.0", "http_x_forwarded_for": "116.203.159.143", "http_host": "kyverno.demo.k8sbrain.com", "server_name": "kyverno.demo.k8sbrain.com", "request_time": "0.000", "upstream": "", "upstream_connect_time": "", "upstream_header_time": "", "upstream_response_time": "", "upstream_response_length": "", "upstream_cache_status": "", "ssl_protocol": "", "ssl_cipher": "", "scheme": "http", "request_method": "GET", "server_protocol": "HTTP/1.1", "pipe": ".", "gzip_ratio": "", "http_cf_ray": "", "http_cf_ipcountry": ""}

Also, I tried setting an invalid path in my livenessProbe, as well as specifying an invalid image, but the result was the same: "upstream" is empty.

longwuyuan commented 2 weeks ago

I understand your requirement, and it's possible to make some guesses as to your goal.

But your comments don't acknowledge the obvious. For example, it's obvious in my test that the 503 was returned by the controller pod. So it should be obvious that a normal logging configuration will not print the IP address of the controller pod itself in its own logs.

The controller logging follows nginx standards. Since your comments hint that you are very particular about a backend workload pod sending a 503 back to the client via the reverse proxy in the controller, you should design your own tests and configs on the backend pod. Typically you can brute-force all responses from a backend pod to be 503, just as an experimental test.
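For example, a throwaway backend that answers every request with 503 could look something like this (a minimal sketch; the always-503 names, the nginx:1.25 image and the ports are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: always-503
spec:
  replicas: 1
  selector:
    matchLabels:
      app: always-503
  template:
    metadata:
      labels:
        app: always-503
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
        volumeMounts:
        - name: conf
          mountPath: /etc/nginx/conf.d
      volumes:
      - name: conf
        configMap:
          name: always-503-conf
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: always-503-conf
data:
  default.conf: |
    server {
      listen 80;
      # every request gets a 503 generated inside the pod itself
      location / { return 503; }
    }
---
apiVersion: v1
kind: Service
metadata:
  name: always-503
spec:
  selector:
    app: always-503
  ports:
  - port: 80
    targetPort: 80

Pointing an Ingress at this Service and curling it should log a 503 whose "upstream" field contains the pod IP and port, in contrast to the empty field seen when the controller generates the error itself.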

There is not much to troubleshoot or fix/change in the controller code here. Let me know if you have data showing otherwise and then more discussions can occur based on the data you provide.

If you have a Tomcat inside a backend pod, fronted by an nginx inside the same pod, and you inject a fault into the Tomcat while the nginx in the pod keeps working, that results in an internal server error, which typically is a 5xx response code.

Pilotindream commented 2 weeks ago

@longwuyuan, thanks for the replies! I have one last question. When I see in the controller access logs that the "$upstream" value is empty, does it mean that the response came from the controller and the request was not sent to the backend service?

longwuyuan commented 2 weeks ago

That is my opinion, but let us not rely on my opinion. You can check the technical definition and description of the variable you have configured for display in the logs. That is, if that variable is coded to get the IP address of the controller pod itself, or if the logging process is configured and coded to print its own external-facing interface IP address when the variable $upstream is interpolated, then you have proof. (But I am not a developer, nor do I understand such deep nginx code even vaguely; I do wonder how the $upstream variable could interpolate its own IP address.)

longwuyuan commented 2 weeks ago

Correction, on this:

"$upstream" value is empty in access logs, does it mean that it is a response from the controller and the request was not sent to the backend service?

The controller has to attempt to reverse-proxy and forward the request to the backend. The controller, and particularly the reverse-proxy nginx inside it, cannot arbitrarily decide on a 503 on its own. Only when there is no endpoint available to forward the request to can the controller conclude that an error response has to be returned to the client.
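To illustrate the no-endpoint case: a Service whose selector matches no pods (the same situation you create by scaling a Deployment to 0) has no endpoints, so any Ingress routing to it gets a locally generated 503 with an empty "upstream" field (a sketch; names are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: no-endpoints-demo
spec:
  selector:
    app: does-not-exist   # no pod carries this label
  ports:
  - port: 80
    targetPort: 80

Here `kubectl get endpoints no-endpoints-demo` shows `<none>`, which matches the empty "upstream" in the access log for requests routed to it.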

longwuyuan commented 2 weeks ago

Please close the issue, if it is resolved.

ZJfans commented 2 weeks ago

You should check whether flow control (rate limiting) is configured. Limit-related functions usually return 503. For example, if a certain URI is limited to 5 accesses per minute, nginx will return 503 for the 6th request, and that request will not be routed to the backend, so there is no upstream_addr and no upstream_response_time. You can check error.log to see whether there is 503-related information at 29/Aug/2024:11:46:12; the log level should be warning.
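For example, a rate limit configured through ingress-nginx annotations would produce exactly this pattern: requests over the limit are rejected by nginx locally (503 by default) and never reach a backend pod. A sketch, where the host, the service name and the 5-per-minute value are illustrative:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: limited-app
  annotations:
    # requests beyond 5 per minute are answered by nginx itself with 503,
    # so $upstream_addr and $upstream_response_time stay empty
    nginx.ingress.kubernetes.io/limit-rpm: "5"
spec:
  ingressClassName: public-nginx
  rules:
  - host: example.demo.k8sbrain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80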

longwuyuan commented 1 week ago

@Pilotindream I will close this issue for now as it is only adding to the tally of open issues without actually tracking any action item of the project. thanks

/close

k8s-ci-robot commented 1 week ago

@longwuyuan: Closing this issue.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/11914#issuecomment-2334992280):

> @Pilotindream I will close this issue for now as it is only adding to the tally of open issues without actually tracking any action item of the project. thanks
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.