omni52 closed this issue 4 months ago
Having the actual logs and the DestinationRule would be pretty useful for resolving this.
Hi @howardjohn, no problem. I searched Loki for the period in which we discovered the issue. The logs on the east-west gateway looked like this:
2024-03-23 23:59:47.417
2024-03-23T22:59:47.417470Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 417 successful, 0 rejected; lds updates: 0 successful, 419 rejected
2024-03-23 23:59:47.187
2024-03-23T22:59:47.187699Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 417 successful, 0 rejected; lds updates: 0 successful, 419 rejected
2024-03-23 23:59:47.009
2024-03-23T22:59:47.009448Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 415 successful, 0 rejected; lds updates: 0 successful, 417 rejected
2024-03-23 23:59:34.925
2024-03-23T22:59:34.924871Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 417 successful, 0 rejected; lds updates: 0 successful, 419 rejected
2024-03-23 23:59:32.424
2024-03-23T22:59:32.424731Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 417 successful, 0 rejected; lds updates: 0 successful, 419 rejected
2024-03-23 23:59:32.197
2024-03-23T22:59:32.196816Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 417 successful, 0 rejected; lds updates: 0 successful, 419 rejected
2024-03-23 23:59:32.023
2024-03-23T22:59:32.023010Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 415 successful, 0 rejected; lds updates: 0 successful, 417 rejected
2024-03-23 23:59:17.420
2024-03-23T22:59:17.420138Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 417 successful, 0 rejected; lds updates: 0 successful, 419 rejected
2024-03-23 23:59:17.169
2024-03-23T22:59:17.169451Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 417 successful, 0 rejected; lds updates: 0 successful, 419 rejected
2024-03-23 23:59:17.029
2024-03-23T22:59:17.029038Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 415 successful, 0 rejected; lds updates: 0 successful, 417 rejected
2024-03-23 23:59:14.800
2024-03-23T22:59:14.800518Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 415 successful, 0 rejected; lds updates: 0 successful, 417 rejected
2024-03-23 23:59:02.406
2024-03-23T22:59:02.406210Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 417 successful, 0 rejected; lds updates: 0 successful, 419 rejected
2024-03-23 23:59:02.149
2024-03-23T22:59:02.148698Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 417 successful, 0 rejected; lds updates: 0 successful, 419 rejected
2024-03-23 23:59:02.018
2024-03-23T22:59:02.018430Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 415 successful, 0 rejected; lds updates: 0 successful, 417 rejected
2024-03-23 23:58:47.409
2024-03-23T22:58:47.409347Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 417 successful, 0 rejected; lds updates: 0 successful, 419 rejected
2024-03-23 23:58:47.145
2024-03-23T22:58:47.145026Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 417 successful, 0 rejected; lds updates: 0 successful, 419 rejected
2024-03-23 23:58:47.017
2024-03-23T22:58:47.017729Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 415 successful, 0 rejected; lds updates: 0 successful, 417 rejected
and sometimes something like this pops up:
2024-03-23T23:01:19.668119Z warning envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:138 gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) 0.0.0.0_15443: error adding listener '0.0.0.0:15443': filter chain '' has the same matching rules defined as ''
A DestinationRule that triggers the failure looks like this:
spec:
  exportTo:
  - '*'
  host: '*.foobar.svc.cluster.local'
  subsets:
  - labels:
      topology.istio.io/cluster: cluster-a
    name: cluster-a
  - labels:
      topology.istio.io/cluster: cluster-a
    name: cluster-a
Feel free to ask if you need more information. Greetings, Uli
A few more questions: you have multiple versions, so is this from Istiod 1.20?
Is 15443 an AUTO_PASSTHROUGH gateway? What is the Gateway spec?
Hi, no problem. The gateway is defined as you suspected:
spec:
  selector:
    istio: eastwestgateway
  servers:
  - hosts:
    - '*.local'
    port:
      name: tls
      number: 15443
      protocol: TLS
    tls:
      mode: AUTO_PASSTHROUGH
We run three istiod (pilot) versions in parallel: 1.18.6, 1.19.8 and 1.20.4. The Pod belonging to the gateway runs 1.20.4 and is connected to an istiod of the same version (pilot 1.20.4).
We use the official distroless images for all of them, except for 1.18.6, for debugging legacy issues.
We are currently in the upgrade process (1.19.9, 1.20.5 and 1.21.1), so maybe I can retest the issue on 1.21.1 in the next few days. However, I did not find anything about this issue in the latest release notes, so I'm fairly sure the behaviour still exists.
The behaviour still exists in 1.21.1, except that the east-west gateway pod no longer hangs during the restart process. It now ends up in CrashLoopBackOff, and the last log lines are just some info and warning messages:
2024-04-12T05:42:12.525412Z warn Envoy proxy is NOT ready: config received from XDS server, but was rejected: cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2024-04-12T05:42:12.611462Z info Agent draining Proxy for termination
2024-04-12T05:42:12.611449Z info Status server has successfully terminated
2024-04-12T05:42:12.616383Z info Graceful termination period is 5s, starting...
2024-04-12T05:42:17.618282Z info Graceful termination period complete, terminating remaining proxies.
2024-04-12T05:42:17.618354Z warn Aborting proxy
2024-04-12T05:42:17.618508Z info Envoy aborted normally
2024-04-12T05:42:17.618519Z warn Aborted proxy instance
2024-04-12T05:42:17.618526Z info Agent has successfully terminated
The change probably comes from the new startupProbe, so it terminates if it cannot start for a while (I think 10 min).
I can reproduce this only if I delete my validation webhook. This is supposed to be rejected by validation; the subset config is illegal.
I think we can close this; the issue seems to be fixed in 1.22. Thanks a lot.
Added validation checks to reject DestinationRules with duplicate subset names.
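As a rough illustration of what such a check amounts to, here is a minimal sketch in Python. It is not Istio's actual validation code (which is written in Go); it assumes the DestinationRule has already been parsed into a dict, for example by a YAML loader, and the helper name find_duplicate_subset_names is hypothetical.

```python
from collections import Counter


def find_duplicate_subset_names(destination_rule: dict) -> list[str]:
    """Return subset names that occur more than once, in first-seen order.

    Hypothetical helper for illustration only; not part of any Istio API.
    """
    subsets = destination_rule.get("spec", {}).get("subsets") or []
    counts = Counter(subset.get("name") for subset in subsets)
    return [name for name, count in counts.items() if count > 1]


# The DestinationRule posted earlier in this thread, as a parsed dict.
rule = {
    "spec": {
        "exportTo": ["*"],
        "host": "*.foobar.svc.cluster.local",
        "subsets": [
            {"name": "cluster-a", "labels": {"topology.istio.io/cluster": "cluster-a"}},
            {"name": "cluster-a", "labels": {"topology.istio.io/cluster": "cluster-a"}},
        ],
    }
}

duplicates = find_duplicate_subset_names(rule)
if duplicates:
    print(f"rejected: duplicate subset names {duplicates}")
```

A check like this could be run in CI against manifests before they are applied, which would catch the problem even in a cluster where the validation webhook has been removed.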
Is this the right place to submit this?
Bug Description
Environment: Istio Multicluster Setup
Issue: When operating an Istio multicluster setup and creating DestinationRules with subsets that reference the topology.istio.io/cluster label, inadvertently setting duplicate values for the topology.istio.io/cluster label causes the East West Gateway to crash after a period. The crash is accompanied by a cryptic error message indicating that config from the corresponding istiod was received "but was rejected". Attempts to redeploy the East West Gateway fail to start until the problematic DestinationRule is removed. The root cause of this behavior is not immediately obvious, making it difficult for operators to diagnose and rectify the issue.
Symptoms:
- The problematic DestinationRule must be removed to resume normal operation.
- The envoy_lds_update_rejected and envoy_cds_update_rejected metrics for job=istio-eastwestgateway indicate the presence of the problematic artifact.
Expected Behavior: Istio should proactively detect and prevent the creation or updating of DestinationRules with duplicate topology.istio.io/cluster label values, thereby avoiding crashes and restart failures of the East West Gateway.
Steps to Reproduce:
- Create DestinationRules with subsets referencing the topology.istio.io/cluster label.
- Set duplicate values for the topology.istio.io/cluster label in these rules.
Suggested Resolution: Implement validation checks at the time of DestinationRule creation or update to identify and prevent the use of duplicate topology.istio.io/cluster label values. Enhance error messaging to clearly identify the cause of rejections related to DestinationRule configuration issues.
Additional Context: This issue is critical as it not only disrupts the normal operation of the East West Gateway but also hampers the ability to quickly diagnose and resolve the configuration error. Providing a more robust validation mechanism and clearer error messages will significantly improve the operator experience and the stability of Istio multicluster setups.
Version
Additional Information
No response