istio / istio

Connect, secure, control, and observe services.
https://istio.io
Apache License 2.0

Inflight TCP Connections to downstream closed when updating istio configmap #53539

Status: Open. Ajay-Ravichandran opened this issue 5 days ago.

Ajay-Ravichandran commented 5 days ago

Bug Description

We are moving to JSON-based access logs, and the Istio proxy sidecar is injected into all our containers. When we apply the ConfigMap below, we see in-flight TCP connections to external servers being closed for a few seconds. After that, things work fine once new connections are created.

apiVersion: v1
data:
  mesh: |-
    accessLogFile: /dev/stdout
    accessLogEncoding: JSON
    accessLogFormat: |
      {
        "i_start_time": "%START_TIME%",
        "i_method": "%REQ(:METHOD)%",
        "i_path": "%REQ(x-request-path?:PATH)%",
        "i_protocol": "%PROTOCOL%",
        "i_response_code": "%RESPONSE_CODE%",
        "i_response_flags": "%RESPONSE_FLAGS%",
        "i_response_code_details": "%RESPONSE_CODE_DETAILS%",
        "i_connection_termination_details": "%CONNECTION_TERMINATION_DETAILS%",
        "i_upstream_transport_failure_reason": "%UPSTREAM_TRANSPORT_FAILURE_REASON%",
        "i_bytes_received": "%BYTES_RECEIVED%",
        "i_bytes_sent": "%BYTES_SENT%",
        "i_upstream_bytes_received": "%UPSTREAM_WIRE_BYTES_RECEIVED%",
        "i_upstream_bytes_sent": "%UPSTREAM_WIRE_BYTES_SENT%",
        "i_duration": "%DURATION%",
        "i_upstream_duration": "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%",
        "i_x_forwarded_for": "%REQ(X-FORWARDED-FOR)%",
        "i_user_agent": "%REQ(USER-AGENT)%",
        "i_x_request_id": "%REQ(X-REQUEST-ID)%",
        "i_x_trace_id": "%REQ(TRACEPARENT)%",
        "i_authority": "%REQ(:AUTHORITY)%",
        "i_upstream_host": "%UPSTREAM_HOST%",
        "i_upstream_cluster": "%UPSTREAM_CLUSTER%",
        "i_upstream_local_address": "%UPSTREAM_LOCAL_ADDRESS%",
        "i_route_name": "%ROUTE_NAME%",
        "i_downstream_local_address": "%DOWNSTREAM_LOCAL_ADDRESS%",
        "i_downstream_remote_address": "%DOWNSTREAM_REMOTE_ADDRESS%",
        "i_requested_server_name": "%REQUESTED_SERVER_NAME%",
        "i_retry_count": "%UPSTREAM_REQUEST_ATTEMPT_COUNT%"
      }
    defaultConfig:
      discoveryAddress: istiod.istio-system.svc:15012
      proxyMetadata: {}
      tracing:
        zipkin:
          address: zipkin.istio-system:9411
    enablePrometheusMerge: true
    rootNamespace: istio-system
    trustDomain: cluster.local
  meshNetworks: 'networks: {}'
kind: ConfigMap
metadata:
  labels:
    install.operator.istio.io/owning-resource: unknown
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio.io/rev: default
    release: istio
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.16.1
  name: istio
  namespace: istio-system
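One thing worth checking before applying a template like this: with `accessLogEncoding: JSON`, the `accessLogFormat` value is parsed as a JSON object, and a stray comma after the last field (easy to introduce when editing a long template) makes it invalid JSON. Below is a minimal Python sketch for validating the template offline, assuming it has been copied out of the ConfigMap as a string. `check_json_log_format` and the operator regex are hypothetical helpers, not part of Istio, and how Istio itself reacts to an invalid template is not covered in this thread.

```python
import json
import re

# Abbreviated copies of the accessLogFormat template (three fields only,
# for illustration). The second keeps a comma after the last field.
VALID_FORMAT = """
{
  "i_start_time": "%START_TIME%",
  "i_method": "%REQ(:METHOD)%",
  "i_retry_count": "%UPSTREAM_REQUEST_ATTEMPT_COUNT%"
}
"""

TRAILING_COMMA_FORMAT = """
{
  "i_start_time": "%START_TIME%",
  "i_retry_count": "%UPSTREAM_REQUEST_ATTEMPT_COUNT%",
}
"""

# Rough pattern for Envoy command operators such as %START_TIME%,
# %REQ(:METHOD)% or %REQ(x-request-path?:PATH)%. Not a full grammar.
OPERATOR = re.compile(r"%[A-Z_]+(?:\([^)]*\))?(?::\d+)?%")


def check_json_log_format(fmt: str) -> list[str]:
    """Parse a JSON access log template and return the operators it uses.

    Raises ValueError if the template is not a JSON object of string values.
    """
    try:
        fields = json.loads(fmt)
    except json.JSONDecodeError as err:
        raise ValueError(f"accessLogFormat is not valid JSON: {err}") from err
    if not isinstance(fields, dict):
        raise ValueError("accessLogFormat must be a JSON object")
    operators = []
    for key, value in fields.items():
        if not isinstance(value, str):
            raise ValueError(f"field {key!r} should map to a string")
        # Collect every command operator referenced by this field.
        operators.extend(OPERATOR.findall(value))
    return operators


if __name__ == "__main__":
    print(check_json_log_format(VALID_FORMAT))
    try:
        check_json_log_format(TRAILING_COMMA_FORMAT)
    except ValueError as err:
        print(err)  # the trailing comma is rejected by the JSON parser
```

Running the full template through this check catches syntax slips before istiod pushes the change to every sidecar.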

Checking the proxy's debug logs, we could see Envoy closing the connections:

2024-10-15T06:14:04.709343Z debug   envoy connection    [C27205] closing data_to_write=0 type=0
2024-10-15T06:14:04.709347Z debug   envoy connection    [C27205] closing socket: 1
2024-10-15T06:14:04.709363Z debug   envoy pool  [C27205] client disconnected, failure reason: 
2024-10-15T06:14:04.709373Z debug   envoy pool  invoking idle callbacks - is_draining_for_deletion_=false
2024-10-15T06:14:04.709381Z debug   envoy pool  [C27205] destroying stream: 0 remaining
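Because the disruption lasts only a few seconds, it can help to pull the `closing socket` events out of the proxy's debug log and see how many connections were closed and over what time window. This is a minimal Python sketch, assuming the line layout shown above (timestamp, level, `envoy`, logger name, optional `[C<id>]` tag, message); `closed_connections` and `close_window` are hypothetical helpers, not part of Istio or Envoy.

```python
import re
from datetime import datetime

# Matches proxy debug lines such as:
#   2024-10-15T06:14:04.709347Z debug   envoy connection    [C27205] closing socket: 1
# The [C<id>] connection tag is optional (some pool lines omit it).
LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>\w+)\s+envoy\s+(?P<logger>\w+)\s+"
    r"(?:\[C(?P<conn>\d+)\]\s+)?(?P<msg>.*)$"
)


def closed_connections(lines):
    """Map connection id -> timestamp of its first 'closing socket' line."""
    closed = {}
    for line in lines:
        m = LINE.match(line.strip())
        if m is None or m["conn"] is None:
            continue  # not an Envoy debug line, or not tied to one connection
        if m["msg"].startswith("closing socket"):
            ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
            closed.setdefault(int(m["conn"]), ts)
    return closed


def close_window(closed):
    """Return (number of closed connections, seconds from first to last close)."""
    if not closed:
        return 0, 0.0
    times = sorted(closed.values())
    return len(times), (times[-1] - times[0]).total_seconds()


SAMPLE = """\
2024-10-15T06:14:04.709343Z debug   envoy connection    [C27205] closing data_to_write=0 type=0
2024-10-15T06:14:04.709347Z debug   envoy connection    [C27205] closing socket: 1
2024-10-15T06:14:04.709363Z debug   envoy pool  [C27205] client disconnected, failure reason:
2024-10-15T06:14:04.709373Z debug   envoy pool  invoking idle callbacks - is_draining_for_deletion_=false
2024-10-15T06:14:04.709381Z debug   envoy pool  [C27205] destroying stream: 0 remaining
"""

if __name__ == "__main__":
    closed = closed_connections(SAMPLE.splitlines())
    print(closed)
    print(close_window(closed))
```

Run against a longer capture taken while the ConfigMap is applied, this gives a rough count of affected connections and the length of the disruption.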

It looks like the connections are not terminated gracefully when istiod pushes the configuration change.

Version

Istio operator version: 1.16.1

Additional Information

No response

howardjohn commented 4 days ago

This is basically https://github.com/envoyproxy/envoy/issues/35109. tl;dr: Envoy will drain connections on (some) configuration updates.
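A rough way to see why this particular ConfigMap change affects every connection: when a listener's filter chains change, Envoy can keep connections on chains that are unchanged, but connections on changed chains are drained. Assuming Istio renders the mesh-wide access log settings into the network filter of each filter chain, changing `accessLogEncoding`/`accessLogFormat` changes all of them at once. The Python below is a toy model of that rule under those assumptions, not Envoy's implementation; the chain names, dict keys, and helper functions are hypothetical.

```python
import hashlib
import json


def fingerprint(chain: dict) -> str:
    """Stable hash of a filter chain's whole config (key order ignored)."""
    return hashlib.sha256(json.dumps(chain, sort_keys=True).encode()).hexdigest()


def chains_to_drain(old_chains: list[dict], new_chains: list[dict]) -> list[str]:
    """Names of old filter chains that have no identical chain after the update.

    Toy rule: a chain's connections survive an update only if the new
    listener still contains exactly the same chain config; connections on
    every other chain are drained.
    """
    surviving = {fingerprint(c) for c in new_chains}
    return [c["name"] for c in old_chains if fingerprint(c) not in surviving]


def with_access_log(chain: dict, log_format: str) -> dict:
    """Copy of a chain with its (simplified) access log setting replaced."""
    return {**chain, "access_log": log_format}


# Hypothetical listener with two filter chains; keys are simplified stand-ins
# for the real Envoy config, not actual field names.
OLD = [
    {"name": "outbound_tcp_external", "filter": "tcp_proxy", "access_log": "TEXT"},
    {"name": "outbound_http", "filter": "http_connection_manager", "access_log": "TEXT"},
]

if __name__ == "__main__":
    # A mesh-wide access log change touches every chain, so everything drains.
    print(chains_to_drain(OLD, [with_access_log(c, "JSON") for c in OLD]))
    # Re-pushing an identical config drains nothing.
    print(chains_to_drain(OLD, OLD))
```

Envoy's draining documentation lists only some network filters (for example the HTTP connection manager) as supporting graceful draining. A plain TCP proxy has no protocol-level way to ask the client to reconnect, so drained TCP connections end up being closed, which is consistent with the in-flight TCP connections reported here.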

Ajay-Ravichandran commented 4 days ago

Thanks for the confirmation, @howardjohn. However, that issue seems to have been closed due to inactivity. Is there any workaround?

hzxuzhonghu commented 2 days ago

It seems no one in the Envoy community is willing to support that, maybe because the requirement is too customized: there is no general way to define the factors that should influence it.