Envoy x-envoy-upstream-rq-timeout-ms header on individual request being stripped out

jaygorrell commented 5 years ago

Bug description Making requests to a service with no VirtualService (or with a VirtualService with no timeout configured) that includes x-envoy-upstream-rq-timeout-ms does not take effect. The service responds successfully after its normal amount of time.

This should work, per https://istio.io/docs/tasks/traffic-management/request-timeouts/

One more thing to note about timeouts in Istio is that in addition to overriding them in route rules, as you did in this task, they can also be overridden on a per-request basis if the application adds an x-envoy-upstream-rq-timeout-ms header on outbound requests. In the header, the timeout is specified in milliseconds instead of seconds.

Relevant cluster dump info for the route

"route": {
  "cluster": "outbound|80||route-happy-service.default.svc.cluster.local",
  "timeout": "0s",
  "retry_policy": {
    "retry_on": "connect-failure,refused-stream,unavailable,cancelled,resource-exhausted,retriable-status-codes",
    "num_retries": 2,
    "retry_host_predicate": [
    {
      "name": "envoy.retry_host_predicates.previous_hosts"
    }
    ],
    "host_selection_retry_max_attempts": "3",
    "retriable_status_codes": [
      503
    ]
  },
  "max_grpc_timeout": "0s"
},

Relevant debug logs that show two extra headers I sent (x-my-test and x-envoy-upstream-rq-timeout-ms) were received differently:

[2019-06-17 17:10:29.394][27][debug][router] [external/envoy/source/common/router/router.cc:381] [C32248][S5641947549702591261] router decoding headers:
':authority', 'route-happy-service.default'
':path', '/uta'
':method', 'POST'
':scheme', 'https'
'user-agent', 'curl/7.52.1'
'accept', '*/*'
'content-type', 'application/json'
'x-my-test', 'true'
'content-length', '1019'
'x-forwarded-proto', 'http'
'x-request-id', 'a7a9cd5b-66e2-438a-8436-ab62ae2c1af4'
'x-envoy-decorator-operation', 'route-happy-service.default.svc.cluster.local:80/*'
'x-istio-attributes', 'CkkKF2Rlc3RpbmF0aW9uLnNlcnZpY2UudWlkEi4SLGlzdGlvOi8vZGVmYXVsdC9zZXJ2aWNlcy9yb3V0ZS1oYXBweS1zZXJ2aWNlCksKGGRlc3RpbmF0aW9uLnNlcnZpY2UuaG9zdBIvEi1yb3V0ZS1oYXBweS1zZXJ2aWNlLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwKKgodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USCRIHZGVmYXVsdAoxChhkZXN0aW5hdGlvbi5zZXJ2aWNlLm5hbWUSFRITcm91dGUtaGFwcHktc2VydmljZQpJCgpzb3VyY2UudWlkEjsSOWt1YmVybmV0ZXM6Ly9yb3V0ZS1oYXBweS1zZXJ2aWNlLTU3OTY5ODU1NmMtbHJoOXYuZGVmYXVsdA=='
'x-b3-traceid', '9518ddd2fc589ea0d881425e569c43e7'
'x-b3-spanid', 'd881425e569c43e7'
'x-b3-sampled', '0'

[2019-06-17 17:10:29.394][27][debug][router] [external/envoy/source/common/router/router.cc:381] [C32248][S5641947549702591261] router decoding headers:
':authority', 'route-happy-service.default'
':path', '/uta'
':method', 'POST'
':scheme', 'https'
'user-agent', 'curl/7.52.1'
'accept', '*/*'
'content-type', 'application/json'
'x-my-test', 'true'
'content-length', '1019'
'x-forwarded-proto', 'http'
'x-request-id', 'a7a9cd5b-66e2-438a-8436-ab62ae2c1af4'
'x-envoy-decorator-operation', 'route-happy-service.default.svc.cluster.local:80/*'
'x-istio-attributes', 'CkkKF2Rlc3RpbmF0aW9uLnNlcnZpY2UudWlkEi4SLGlzdGlvOi8vZGVmYXVsdC9zZXJ2aWNlcy9yb3V0ZS1oYXBweS1zZXJ2aWNlCksKGGRlc3RpbmF0aW9uLnNlcnZpY2UuaG9zdBIvEi1yb3V0ZS1oYXBweS1zZXJ2aWNlLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwKKgodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USCRIHZGVmYXVsdAoxChhkZXN0aW5hdGlvbi5zZXJ2aWNlLm5hbWUSFRITcm91dGUtaGFwcHktc2VydmljZQpJCgpzb3VyY2UudWlkEjsSOWt1YmVybmV0ZXM6Ly9yb3V0ZS1oYXBweS1zZXJ2aWNlLTU3OTY5ODU1NmMtbHJoOXYuZGVmYXVsdA=='
'x-b3-traceid', '9518ddd2fc589ea0d881425e569c43e7'
'x-b3-spanid', 'd881425e569c43e7'
'x-b3-sampled', '0'

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure [ X ] Docs [ ] Installation [ X ] Networking [ ] Performance and Scalability [ ] Policies and Telemetry [ ] Security [ ] Test and Release [ ] User Experience [ ] Developer Infrastrcture

Expected behavior A request to a service with x-envoy-upstream-rq-timeout-ms should timeout after the desired amount of time, especially when a VirtualService is not setting a value itself.

Steps to reproduce the bug

Deploy a service that takes at least a second to respond.
Make a request with x-envoy-upstream-rq-timeout-ms: 100.
See that the service responds normally after it's normal amount of time.

Version (include the output of istioctl version --remote and kubectl version) istio 1.1.4 / k8s 1.11.8

How was Istio installed? Helm w/mTLS

rcaballeromx commented 5 years ago

@jaygorrell Could there be an application level conflict causing the issue? Please check our updated docs on the topic: https://istio.io/docs/concepts/traffic-management/#compatibility-with-application-level-fault-handling

jaygorrell commented 5 years ago

@rcaballeromx Wouldn't that only apply to things like retries where it compounds the value? In the case of a timeout, it doesn't matter if both are handling it since it's a short-circuit.

Either way, no application fault handling here, and having a 100ms timeout allow a 30+ second response come through is a good proof.

bageljp commented 5 years ago

I have exact same problems. If it is via istio-ilbgateway , this header will be overwritten and it will take effect. But, It does not overwrite and take effect via mesh, and the x-envoy-upstream-rq-timeout-ms header is not output to the log. Then it occurs not only x-envoy-upstream-rq-timeout-ms but also x-envoy-max-retries header. Is there any update on this issue?

Version (include the output of istioctl version --remote and kubectl version) istio 1.2.2 / k8s v1.13.6-gke.13

How was Istio installed? Helm template w/mTLS

incfly commented 4 years ago

@istio/wg-networking-maintainers does this sound a real issue or intended behavior? Note the mentioned release is 1.1.

incfly commented 4 years ago

Re-open per https://github.com/istio/istio/issues/19111

Dev25 commented 4 years ago

As per a recent post on discuss.istio.io it looks like this is now protected by a pilot env flag PILOT_SIDECAR_USE_REMOTE_ADDRESS and was changed for Istio 1.2+

I've confirmed that setting that flag now allows workloads to tweak envoy behaviour via headers.

Dev25 commented 4 years ago

Also FWIW enabling PILOT_SIDECAR_USE_REMOTE_ADDRESS does cause other side affects then if you just wanted the ability to use request header overrides from inside the mesh

e.g. X-Forwarded-Proto https://github.com/istio/istio/issues/15124 https://github.com/istio/istio/issues/7964

gustavovalverde commented 4 years ago

@Dev25 @jaygorrell did you guys finally managed to solve this?

With this setup: nginx-ingress --> ingress-gateway --> VirtualService --> Pod I'm always getting an 15s timeout which I don't find a way to change.

This didn't happened without the nginx-ingress. Seems like the additional hops cause this headers to be stripped out.

bageljp commented 4 years ago

The x-envoy-upstream-rq-timeout-ms header appears to work between containers if you use respect_expected_rq_timeout of envoy. https://www.envoyproxy.io/docs/envoy/latest/api-v2/config/filter/http/router/v2/router.proto

In istio it seems to use EnvoyFilter for this setting, but I don't know how to set this up.

istio / istio

Envoy x-envoy-upstream-rq-timeout-ms header on individual request being stripped out #14923