kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0
17.36k stars 8.23k forks source link

Support `grpc_read_timeout` and `grpc_send_timeout` in annotations #11250

Closed Anddd7 closed 4 months ago

Anddd7 commented 6 months ago

What do you want to happen?

We want to configure the grpc_read_timeout and grpc_send_timeout with annotations on ingress.

Since the server-snippet has some potential risk, we have to migrate all snippets to common annotations. https://github.com/search?q=repo%3Akubernetes%2Fingress-nginx+path%3A%2F%5Einternal%5C%2Fingress%5C%2Fannotations%5C%2F%2F+parser.AnnotationRiskCritical&type=code

Is there currently another issue associated with this?

https://github.com/kubernetes/ingress-nginx/issues/2475, but it's inactive and closed.

Does it require a particular kubernetes version?

No

Solutions

1. Add new annotations, e.g.

...
annotations:
  nginx.ingress.kubernetes.io/grpc_read_timeout: 3600
  nginx.ingress.kubernetes.io/grpc_send_timeout: 3600
...

2. Set grpc_read_timeout and grpc_send_timeout with the same value in proxy_read_timeout

Similar like nginx-ingress-controller, just set grpc with proxy settings by default

Anddd7 commented 6 months ago

i'd like to contribute if it's ready to implement

longwuyuan commented 6 months ago

/triage accepted

longwuyuan commented 6 months ago

/assign

Anddd7 commented 6 months ago

@longwuyuan yes, i have read the docs and read to create a PR.

I refer to exsiting proxy settings to create 3 annotations for grpc timeout, please take a look, https://github.com/kubernetes/ingress-nginx/pull/11258

I'll refine the documentation later.

the upstream nginx is already support this, https://nginx.org/en/docs/http/ngx_http_grpc_module.html#grpc_connect_timeout.

strongjz commented 5 months ago

/priority backlog

github-actions[bot] commented 4 months ago

This is stale, but we won't close it automatically, just bare in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.

flphvlck commented 1 month ago

Hi @strongjz , maybe this should be noted as breaking change in release notes for 1.11 Because we are using:

      nginx.ingress.kubernetes.io/configuration-snippet: |
        grpc_next_upstream_tries X;
        grpc_read_timeout Xs;
        grpc_send_timeout Xs;

and after upgrade from 1.10 to 1.11 we ended up with these errors:

Error: exit status 1
2024/08/19 12:10:49 [emerg] 60#60: "grpc_read_timeout" directive is duplicate in /tmp/nginx/nginx-cfg658058193:2367
nginx: [emerg] "grpc_read_timeout" directive is duplicate in /tmp/nginx/nginx-cfg658058193:2367
nginx: configuration file /tmp/nginx/nginx-cfg658058193 test failed

In the /tmp/nginx/nginx-cfg658058193 there really are duplicated directives for grpc (default values from annotations and our values from configuration-snippet

            # Grpc settings
            grpc_connect_timeout                    5s;
            grpc_send_timeout                       60s;
            grpc_read_timeout                       60s;

            grpc_next_upstream error timeout http_502 http_503 http_504 non_idempotent;
            grpc_next_upstream_tries X;
            grpc_read_timeout Xs;
            grpc_send_timeout Xs;

We will switch from snippet to annotations to fix this, but I think it's better to know about this from release notes.

Anddd7 commented 1 month ago

So we may need a 'merge' mechanism to ensure the annotations and snippets can work together, if there is a duplicate config. Then, we can migrate the snippets to annotations. (if we create more annotations)

flphvlck commented 1 month ago

Yeah, that's another option how to deal with it (except "breaking change" note) I'm in the beggining of the upgrade process in our clusters and I will investigate how big problem is this for us. But it will deffinitely make the whole process harder

Possible merge mechanism - just don't apply default values from annotations when GRPC is used?

Anddd7 commented 1 month ago

Yeah, that's another option how to deal with it (except "breaking change" note) I'm in the beggining of the upgrade process in our clusters and I will investigate how big problem is this for us. But it will deffinitely make the whole process harder

Possible merge mechanism - just don't apply default values from annotations when GRPC is used?

ok, the solution 1 would be safer -- add new annotations, it doesn't break the existing "configurations", you can add new annotations and remove snippets at the same time.

how do you think? @tao12345666333 @longwuyuan

sepich commented 1 month ago

Sorry guys, but that is a really breaking change not mentioned in any changelogs. Could you please at least add this to Release and https://github.com/kubernetes/ingress-nginx/tree/main/changelog ? What happens on upgrade now is that Ingress completely stops serving any sites, as it has some typo in some single config. Is it possible to have nginx_ingress_controller_config_last_reload_successful as a readinessCheck, so that new version with broken config cannot even be rolled out?

flphvlck commented 1 month ago

@sepich it's because of duplicities in nginx.conf - in case you set grpc timeouts in nginx.ingress.kubernetes.io/configuration-snippet

sepich commented 1 month ago

Thank you, I know why it happend. Question is how should I know it before the upgrade? No any mentions to "check all your snippets or everything would be down" in changelogs.

flphvlck commented 1 month ago

I raised issue #11866

flphvlck commented 1 month ago

ok, the solution 1 would be safer -- add new annotations, it doesn't break the existing "configurations", you can add new annotations and remove snippets at the same time.

@Anddd7 that would be absolutely marvellous

jon-rei commented 2 weeks ago

We are in the process of upgrading our clusters to the latest version and have encountered this problem. For each cluster we have about 50 ingress resources that have grpc_read_timeout configured as a snippet.

Is there a recommended strategy for handling the upgrade to 1.11 for a large number of ingress resources?

flphvlck commented 2 weeks ago

@jon-rei we are also in the process of upgrading I have to change settings for the 3 affected options manually in each ingress resource which uses GRPC

worse thing is that the default for grpc_connect_timeout was changed from 60s to 5s - as I understood this change it's not unusual that you don't even know that you rely on default value