grpc / grpc-node

gRPC for Node.js
https://grpc.io
Apache License 2.0
4.43k stars 640 forks source link

gRPC deadline when hitting a server using ring_hash LB #2807

Closed ABujalance closed 2 days ago

ABujalance commented 1 month ago

Problem description

After 1 or 2 successful requests to a server with ring_hash load balancing, all subsequent requests for a considerable amount of time end up in a deadline error.

Reproduction steps

Sorry if I can't provide specific repositories, as this is an issue happening inside private repositories in my company. If nothing can be extracted from the logs, we can try to recreate a whole environment where this happens that can be tried by anyone.

We have a graphQL server (gqlServer) that we use to make requests to different gRPC servers within our internal network (although different clusters). There's one in particular that we always request from in every request to check for authentication and authorization details (grpcIAMServer). Up until now, we never had any issues with the connections but a recent change in the load balancing policy on the grpcIAMServer caused our gqlServer to start getting constant deadline errors from the grpcIAMServer.

This is the policy change

loadBalancing:
policy: ring_hash
ringHash:
  headers:
    - "grpcIAMServer.ring_hash"

After this change, the errors started happening. They go like this:

After we reverted the ring_hash change, the gqlServer started working normally again. This is not reproducible in grpc-go and that's why I'm coming here with the issue.

Environment

Additional context

All grpc requests to grpcIAMServer are done using a client package, so they are a bit different to the ones done to the rest of grpc servers that we fetch information from.

In comments below you can see the logs in the gqlServer for the first, second and third request that we are sending to the grpcIAMServer (I removed the IPs or other information that could be sensible)

ABujalance commented 1 month ago

Server startup and first request:

D 2024-07-30T07:32:55.340Z | index | Loading @grpc/grpc-js version 1.9.4
D 2024-07-30T07:32:55.367Z | index | Loading @grpc/grpc-js version 1.9.4
Enabling inline tracing for this subgraph. To disable, use ApolloServerPluginInlineTraceDisabled.
🚀 Server ready at http://localhost:4000/graphql
D 2024-07-30T07:32:56.456Z | resolving_load_balancer | dns:otel-collector.otel:4317 IDLE -> IDLE
D 2024-07-30T07:32:56.456Z | connectivity_state | (1) dns:otel-collector.otel:4317 IDLE -> IDLE
D 2024-07-30T07:32:56.457Z | dns_resolver | Resolver constructed for target dns:otel-collector.otel:4317
D 2024-07-30T07:32:56.464Z | channel | (1) dns:otel-collector.otel:4317 Channel constructed with options {
  "grpc.default_compression_algorithm": 0
}
D 2024-07-30T07:32:56.464Z | channel_stacktrace | (1) Channel constructed 
    at new InternalChannel (/app/node_modules/@opentelemetry/otlp-grpc-exporter-base/node_modules/@grpc/grpc-js/build/src/internal-channel.js:237:23)
    at new ChannelImplementation (/app/node_modules/@opentelemetry/otlp-grpc-exporter-base/node_modules/@grpc/grpc-js/build/src/channel.js:35:32)
    at new Client (/app/node_modules/@opentelemetry/otlp-grpc-exporter-base/node_modules/@grpc/grpc-js/build/src/client.js:65:36)
    at new ServiceClientImpl (/app/node_modules/@opentelemetry/otlp-grpc-exporter-base/node_modules/@grpc/grpc-js/build/src/make-client.js:58:5)
    at onInit (/app/node_modules/@opentelemetry/otlp-grpc-exporter-base/build/src/util.js:36:28)
    at Immediate._onImmediate (/app/node_modules/@opentelemetry/otlp-grpc-exporter-base/build/src/OTLPGRPCExporterNodeBase.js:59:13)
    at process.processImmediate (node:internal/timers:483:21)
D 2024-07-30T07:33:10.521Z | v1.10.1 1 | http_filter | Registered filter with type URL type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
D 2024-07-30T07:33:10.522Z | v1.10.1 1 | http_filter | Registered filter with type URL type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
D 2024-07-30T07:33:10.524Z | v1.10.1 1 | resolving_load_balancer | xds:///grpcIAMServer:5000 IDLE -> IDLE
D 2024-07-30T07:33:10.525Z | v1.10.1 1 | connectivity_state | (1) xds:///grpcIAMServer:5000 IDLE -> IDLE
D 2024-07-30T07:33:10.526Z | v1.10.1 1 | channel | (1) xds:///grpcIAMServer:5000 Channel constructed with options {
  "grpc.keepalive_time_ms": 180000,
  "grpc.keepalive_timeout_ms": 20000,
  "grpc.keepalive_permit_without_calls": 1
}
D 2024-07-30T07:33:10.526Z | v1.10.1 1 | channel_stacktrace | (1) Channel constructed 
    at new InternalChannel (/app/node_modules/@grpc/grpc-js/build/src/internal-channel.js:248:23)
    at new ChannelImplementation (/app/node_modules/@grpc/grpc-js/build/src/channel.js:35:32)
    at new Client (/app/node_modules/@grpc/grpc-js/build/src/client.js:65:36)
    at new GrpcTransport (/app/node_modules/@protobuf-ts/grpc-transport/build/commonjs/grpc-transport.js:11:23)
    at createTransport (/app/node_modules/@utilitywarehouse/iam-node/dist/grpc/utils.js:46:16)
D 2024-07-30T07:33:11.293Z | v1.10.1 1 | load_balancing_call | [25] ended with status: code=0 details=""
D 2024-07-30T07:33:11.293Z | v1.10.1 1 | retrying_call | [24] Received status from child [25]
D 2024-07-30T07:33:11.293Z | v1.10.1 1 | retrying_call | [24] state=COMMITTED handling status with progress PROCESSED from child [25] in state ACTIVE
D 2024-07-30T07:33:11.293Z | v1.10.1 1 | retrying_call | [24] ended with status: code=0 details=""
D 2024-07-30T07:33:11.293Z | v1.10.1 1 | resolving_call | [20] Received status
D 2024-07-30T07:33:11.294Z | v1.10.1 1 | resolving_call | [20] ended with status: code=0 details=""
D 2024-07-30T07:33:11.294Z | v1.10.1 1 | subchannel_call | [32] HTTP/2 stream closed with code 0
D 2024-07-30T07:33:11.311Z | v1.10.1 1 | subchannel_call | [35] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:11.311Z | v1.10.1 1 | load_balancing_call | [31] Received metadata
D 2024-07-30T07:33:11.311Z | v1.10.1 1 | retrying_call | [30] Received metadata from child [31]
D 2024-07-30T07:33:11.311Z | v1.10.1 1 | retrying_call | [30] Committing call [31] at index 0
D 2024-07-30T07:33:11.312Z | v1.10.1 1 | resolving_call | [23] Received metadata
D 2024-07-30T07:33:11.312Z | v1.10.1 1 | subchannel_call | [35] receive HTTP/2 data frame of length 111
D 2024-07-30T07:33:11.312Z | v1.10.1 1 | subchannel_call | [35] parsed message of length 111
D 2024-07-30T07:33:11.313Z | v1.10.1 1 | subchannel_call | [35] pushing to reader message of length 111
D 2024-07-30T07:33:11.313Z | v1.10.1 1 | load_balancing_call | [31] Received message
D 2024-07-30T07:33:11.313Z | v1.10.1 1 | retrying_call | [30] Received message from child [31]
D 2024-07-30T07:33:11.313Z | v1.10.1 1 | resolving_call | [23] Received message
D 2024-07-30T07:33:11.314Z | v1.10.1 1 | resolving_call | [23] Finished filtering received message
D 2024-07-30T07:33:11.315Z | v1.10.1 1 | subchannel_call | [35] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:11.316Z | v1.10.1 1 | subchannel_call | [35] received status code 0 from server
D 2024-07-30T07:33:11.316Z | v1.10.1 1 | subchannel_call | [35] received status details string "" from server
D 2024-07-30T07:33:11.317Z | v1.10.1 1 | subchannel_call | [35] close http2 stream with code 0
D 2024-07-30T07:33:11.317Z | v1.10.1 1 | subchannel_call | [35] ended with status: code=0 details=""
D 2024-07-30T07:33:11.318Z | v1.10.1 1 | load_balancing_call | [31] Received status
D 2024-07-30T07:33:11.318Z | v1.10.1 1 | load_balancing_call | [31] ended with status: code=0 details=""
D 2024-07-30T07:33:11.318Z | v1.10.1 1 | retrying_call | [30] Received status from child [31]
D 2024-07-30T07:33:11.318Z | v1.10.1 1 | retrying_call | [30] state=COMMITTED handling status with progress PROCESSED from child [31] in state ACTIVE
D 2024-07-30T07:33:11.319Z | v1.10.1 1 | retrying_call | [30] ended with status: code=0 details=""
D 2024-07-30T07:33:11.319Z | v1.10.1 1 | resolving_call | [23] Received status
D 2024-07-30T07:33:11.319Z | v1.10.1 1 | resolving_call | [23] ended with status: code=0 details=""
D 2024-07-30T07:33:11.320Z | v1.10.1 1 | subchannel_call | [35] HTTP/2 stream closed with code 0
D 2024-07-30T07:33:11.359Z | v1.10.1 1 | subchannel_call | [34] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:11.359Z | v1.10.1 1 | load_balancing_call | [29] Received metadata
D 2024-07-30T07:33:11.360Z | v1.10.1 1 | retrying_call | [28] Received metadata from child [29]
D 2024-07-30T07:33:11.360Z | v1.10.1 1 | retrying_call | [28] Committing call [29] at index 0
D 2024-07-30T07:33:11.360Z | v1.10.1 1 | resolving_call | [22] Received metadata
D 2024-07-30T07:33:11.361Z | v1.10.1 1 | subchannel_call | [34] receive HTTP/2 data frame of length 111
D 2024-07-30T07:33:11.361Z | v1.10.1 1 | subchannel_call | [34] parsed message of length 111
D 2024-07-30T07:33:11.361Z | v1.10.1 1 | subchannel_call | [34] pushing to reader message of length 111
D 2024-07-30T07:33:11.362Z | v1.10.1 1 | load_balancing_call | [29] Received message
D 2024-07-30T07:33:11.362Z | v1.10.1 1 | retrying_call | [28] Received message from child [29]
D 2024-07-30T07:33:11.363Z | v1.10.1 1 | resolving_call | [22] Received message
D 2024-07-30T07:33:11.363Z | v1.10.1 1 | resolving_call | [22] Finished filtering received message
D 2024-07-30T07:33:11.364Z | v1.10.1 1 | subchannel_call | [34] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:11.365Z | v1.10.1 1 | subchannel_call | [34] received status code 0 from server
D 2024-07-30T07:33:11.365Z | v1.10.1 1 | subchannel_call | [34] received status details string "" from server
D 2024-07-30T07:33:11.365Z | v1.10.1 1 | subchannel_call | [34] close http2 stream with code 0
D 2024-07-30T07:33:11.366Z | v1.10.1 1 | subchannel_call | [34] ended with status: code=0 details=""
D 2024-07-30T07:33:11.366Z | v1.10.1 1 | load_balancing_call | [29] Received status
D 2024-07-30T07:33:11.367Z | v1.10.1 1 | load_balancing_call | [29] ended with status: code=0 details=""
D 2024-07-30T07:33:11.367Z | v1.10.1 1 | retrying_call | [28] Received status from child [29]
D 2024-07-30T07:33:11.367Z | v1.10.1 1 | retrying_call | [28] state=COMMITTED handling status with progress PROCESSED from child [29] in state ACTIVE
D 2024-07-30T07:33:11.367Z | v1.10.1 1 | retrying_call | [28] ended with status: code=0 details=""
D 2024-07-30T07:33:11.368Z | v1.10.1 1 | resolving_call | [22] Received status
D 2024-07-30T07:33:11.368Z | v1.10.1 1 | resolving_call | [22] ended with status: code=0 details=""
D 2024-07-30T07:33:11.368Z | v1.10.1 1 | subchannel_call | [34] HTTP/2 stream closed with code 0
D 2024-07-30T07:33:11.373Z | v1.10.1 1 | subchannel_call | [33] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:11.373Z | v1.10.1 1 | load_balancing_call | [27] Received metadata
D 2024-07-30T07:33:11.374Z | v1.10.1 1 | retrying_call | [26] Received metadata from child [27]
D 2024-07-30T07:33:11.374Z | v1.10.1 1 | retrying_call | [26] Committing call [27] at index 0
D 2024-07-30T07:33:11.374Z | v1.10.1 1 | resolving_call | [21] Received metadata
D 2024-07-30T07:33:11.374Z | v1.10.1 1 | subchannel_call | [33] receive HTTP/2 data frame of length 101
D 2024-07-30T07:33:11.375Z | v1.10.1 1 | subchannel_call | [33] parsed message of length 101
D 2024-07-30T07:33:11.375Z | v1.10.1 1 | subchannel_call | [33] pushing to reader message of length 101
D 2024-07-30T07:33:11.375Z | v1.10.1 1 | load_balancing_call | [27] Received message
D 2024-07-30T07:33:11.375Z | v1.10.1 1 | retrying_call | [26] Received message from child [27]
D 2024-07-30T07:33:11.376Z | v1.10.1 1 | resolving_call | [21] Received message
D 2024-07-30T07:33:11.376Z | v1.10.1 1 | resolving_call | [21] Finished filtering received message
D 2024-07-30T07:33:11.378Z | v1.10.1 1 | subchannel_call | [33] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:11.379Z | v1.10.1 1 | subchannel_call | [33] received status code 0 from server
D 2024-07-30T07:33:11.379Z | v1.10.1 1 | subchannel_call | [33] received status details string "" from server
D 2024-07-30T07:33:11.379Z | v1.10.1 1 | subchannel_call | [33] close http2 stream with code 0
D 2024-07-30T07:33:11.380Z | v1.10.1 1 | subchannel_call | [33] ended with status: code=0 details=""
D 2024-07-30T07:33:11.380Z | v1.10.1 1 | load_balancing_call | [27] Received status
D 2024-07-30T07:33:11.381Z | v1.10.1 1 | load_balancing_call | [27] ended with status: code=0 details=""
D 2024-07-30T07:33:11.381Z | v1.10.1 1 | retrying_call | [26] Received status from child [27]
D 2024-07-30T07:33:11.381Z | v1.10.1 1 | retrying_call | [26] state=COMMITTED handling status with progress PROCESSED from child [27] in state ACTIVE
D 2024-07-30T07:33:11.381Z | v1.10.1 1 | retrying_call | [26] ended with status: code=0 details=""
D 2024-07-30T07:33:11.382Z | v1.10.1 1 | resolving_call | [21] Received status
D 2024-07-30T07:33:11.382Z | v1.10.1 1 | resolving_call | [21] ended with status: code=0 details=""
D 2024-07-30T07:33:11.382Z | v1.10.1 1 | subchannel_call | [33] HTTP/2 stream closed with code 0
D 2024-07-30T07:33:11.610Z | resolving_call | [0] Created
D 2024-07-30T07:33:11.610Z | resolving_call | [0] Deadline: 2024-07-30T07:33:21.608Z
D 2024-07-30T07:33:11.611Z | resolving_call | [0] Deadline will be reached in 9998ms
D 2024-07-30T07:33:11.611Z | resolving_call | [0] start called
D 2024-07-30T07:33:11.613Z | dns_resolver | Looking up DNS hostname otel-collector.otel
D 2024-07-30T07:33:11.613Z | resolving_load_balancer | dns:otel-collector.otel:4317 IDLE -> CONNECTING
D 2024-07-30T07:33:11.613Z | connectivity_state | (1) dns:otel-collector.otel:4317 IDLE -> CONNECTING
D 2024-07-30T07:33:11.613Z | channel | (1) dns:otel-collector.otel:4317 callRefTimer.ref | configSelectionQueue.length=1 pickQueue.length=0
D 2024-07-30T07:33:11.614Z | resolving_call | [0] startRead called
D 2024-07-30T07:33:11.618Z | resolving_call | [0] write() called with message of length 988
D 2024-07-30T07:33:11.618Z | resolving_call | [0] halfClose called
D 2024-07-30T07:33:11.620Z | dns_resolver | Resolved addresses for target dns:otel-collector.otel:4317: [resolved-dns-ip1:4317,resolved-dns-ip2:4317,resolved-dns-ip3:4317]
D 2024-07-30T07:33:11.623Z | subchannel | (2) resolved-dns-ip1:4317 Subchannel constructed with options {
  "grpc.default_compression_algorithm": 0
}
D 2024-07-30T07:33:11.623Z | subchannel_refcount | (2) resolved-dns-ip1:4317 refcount 0 -> 1
D 2024-07-30T07:33:11.624Z | subchannel | (3) resolved-dns-ip2:4317 Subchannel constructed with options {
  "grpc.default_compression_algorithm": 0
}
D 2024-07-30T07:33:11.624Z | subchannel_refcount | (3) resolved-dns-ip2:4317 refcount 0 -> 1
D 2024-07-30T07:33:11.624Z | subchannel | (4) resolved-dns-ip3:4317 Subchannel constructed with options {
  "grpc.default_compression_algorithm": 0
}
D 2024-07-30T07:33:11.624Z | subchannel_refcount | (4) resolved-dns-ip3:4317 refcount 0 -> 1
D 2024-07-30T07:33:11.624Z | subchannel_refcount | (2) resolved-dns-ip1:4317 refcount 1 -> 2
D 2024-07-30T07:33:11.625Z | subchannel_refcount | (3) resolved-dns-ip2:4317 refcount 1 -> 2
D 2024-07-30T07:33:11.625Z | subchannel_refcount | (4) resolved-dns-ip3:4317 refcount 1 -> 2
D 2024-07-30T07:33:11.626Z | pick_first | Start connecting to subchannel with address resolved-dns-ip1:4317
D 2024-07-30T07:33:11.626Z | pick_first | IDLE -> CONNECTING
D 2024-07-30T07:33:11.627Z | resolving_load_balancer | dns:otel-collector.otel:4317 CONNECTING -> CONNECTING
D 2024-07-30T07:33:11.627Z | channel | (1) dns:otel-collector.otel:4317 callRefTimer.unref | configSelectionQueue.length=1 pickQueue.length=0
D 2024-07-30T07:33:11.627Z | connectivity_state | (1) dns:otel-collector.otel:4317 CONNECTING -> CONNECTING
D 2024-07-30T07:33:11.629Z | subchannel | (2) resolved-dns-ip1:4317 IDLE -> CONNECTING
D 2024-07-30T07:33:11.632Z | transport | dns:otel-collector.otel:4317 creating HTTP/2 session to resolved-dns-ip1:4317
D 2024-07-30T07:33:11.633Z | channel | (1) dns:otel-collector.otel:4317 createRetryingCall [1] method="/opentelemetry.proto.collector.trace.v1.TraceService/Export"
D 2024-07-30T07:33:11.633Z | resolving_call | [0] Created child [1]
D 2024-07-30T07:33:11.634Z | retrying_call | [1] start called
D 2024-07-30T07:33:11.634Z | channel | (1) dns:otel-collector.otel:4317 createLoadBalancingCall [2] method="/opentelemetry.proto.collector.trace.v1.TraceService/Export"
D 2024-07-30T07:33:11.635Z | retrying_call | [1] Created child call [2] for attempt 1
D 2024-07-30T07:33:11.635Z | load_balancing_call | [2] start called
D 2024-07-30T07:33:11.636Z | load_balancing_call | [2] Pick called
D 2024-07-30T07:33:11.636Z | load_balancing_call | [2] Pick result: QUEUE subchannel: null status: undefined undefined
D 2024-07-30T07:33:11.637Z | channel | (1) dns:otel-collector.otel:4317 callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
D 2024-07-30T07:33:11.637Z | retrying_call | [1] startRead called
D 2024-07-30T07:33:11.637Z | load_balancing_call | [2] startRead called
D 2024-07-30T07:33:11.639Z | retrying_call | [1] write() called with message of length 993
D 2024-07-30T07:33:11.639Z | load_balancing_call | [2] write() called with message of length 993
D 2024-07-30T07:33:11.639Z | retrying_call | [1] halfClose called
D 2024-07-30T07:33:11.642Z | subchannel | (2) resolved-dns-ip1:4317 CONNECTING -> READY
D 2024-07-30T07:33:11.642Z | pick_first | Pick subchannel with address resolved-dns-ip1:4317
D 2024-07-30T07:33:11.642Z | subchannel_refcount | (2) resolved-dns-ip1:4317 refcount 2 -> 3
D 2024-07-30T07:33:11.642Z | subchannel_refcount | (2) resolved-dns-ip1:4317 refcount 3 -> 2
D 2024-07-30T07:33:11.643Z | subchannel_refcount | (3) resolved-dns-ip2:4317 refcount 2 -> 1
D 2024-07-30T07:33:11.643Z | subchannel_refcount | (4) resolved-dns-ip3:4317 refcount 2 -> 1
D 2024-07-30T07:33:11.644Z | pick_first | CONNECTING -> READY
D 2024-07-30T07:33:11.644Z | resolving_load_balancer | dns:otel-collector.otel:4317 CONNECTING -> READY
D 2024-07-30T07:33:11.644Z | channel | (1) dns:otel-collector.otel:4317 callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
D 2024-07-30T07:33:11.644Z | load_balancing_call | [2] Pick called
D 2024-07-30T07:33:11.644Z | load_balancing_call | [2] Pick result: COMPLETE subchannel: (2) resolved-dns-ip1:4317 status: undefined undefined
D 2024-07-30T07:33:11.644Z | connectivity_state | (1) dns:otel-collector.otel:4317 CONNECTING -> READY
D 2024-07-30T07:33:11.647Z | transport_flowctrl | (5) resolved-dns-ip1:4317 local window size: 65535 remote window size: 65535
D 2024-07-30T07:33:11.647Z | transport_internals | (5) resolved-dns-ip1:4317 session.closed=false session.destroyed=false session.socket.destroyed=false
D 2024-07-30T07:33:11.648Z | load_balancing_call | [2] Created child call [3]
D 2024-07-30T07:33:11.649Z | subchannel_call | [3] write() called with message of length 993
D 2024-07-30T07:33:11.649Z | subchannel_call | [3] sending data chunk of length 993
D 2024-07-30T07:33:11.650Z | load_balancing_call | [2] halfClose called
D 2024-07-30T07:33:11.650Z | subchannel_call | [3] end() called
D 2024-07-30T07:33:11.650Z | subchannel_call | [3] calling end() on HTTP/2 stream
D 2024-07-30T07:33:11.651Z | transport | (5) resolved-dns-ip1:4317 local settings acknowledged by remote: {"headerTableSize":4096,"enablePush":true,"initialWindowSize":65535,"maxFrameSize":16384,"maxConcurrentStreams":4294967295,"maxHeaderListSize":4294967295,"maxHeaderSize":4294967295,"enableConnectProtocol":false}
D 2024-07-30T07:33:11.652Z | subchannel_call | [3] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:11.653Z | load_balancing_call | [2] Received metadata
D 2024-07-30T07:33:11.653Z | retrying_call | [1] Received metadata from child [2]
D 2024-07-30T07:33:11.653Z | retrying_call | [1] Committing call [2] at index 0
D 2024-07-30T07:33:11.654Z | resolving_call | [0] Received metadata
D 2024-07-30T07:33:11.654Z | subchannel_call | [3] receive HTTP/2 data frame of length 7
D 2024-07-30T07:33:11.655Z | subchannel_call | [3] parsed message of length 7
D 2024-07-30T07:33:11.655Z | subchannel_call | [3] pushing to reader message of length 7
D 2024-07-30T07:33:11.656Z | load_balancing_call | [2] Received message
D 2024-07-30T07:33:11.656Z | retrying_call | [1] Received message from child [2]
D 2024-07-30T07:33:11.656Z | resolving_call | [0] Received message
D 2024-07-30T07:33:11.657Z | resolving_call | [0] Finished filtering received message
D 2024-07-30T07:33:11.659Z | subchannel_call | [3] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:11.659Z | subchannel_call | [3] received status code 0 from server
D 2024-07-30T07:33:11.659Z | subchannel_call | [3] received status details string "" from server
D 2024-07-30T07:33:11.660Z | subchannel_call | [3] close http2 stream with code 0
D 2024-07-30T07:33:11.660Z | subchannel_call | [3] ended with status: code=0 details=""
D 2024-07-30T07:33:11.661Z | load_balancing_call | [2] Received status
D 2024-07-30T07:33:11.661Z | load_balancing_call | [2] ended with status: code=0 details=""
D 2024-07-30T07:33:11.661Z | retrying_call | [1] Received status from child [2]
D 2024-07-30T07:33:11.662Z | retrying_call | [1] state=COMMITTED handling status with progress PROCESSED from child [2] in state ACTIVE
D 2024-07-30T07:33:11.662Z | retrying_call | [1] ended with status: code=0 details=""
D 2024-07-30T07:33:11.663Z | resolving_call | [0] Received status
D 2024-07-30T07:33:11.663Z | resolving_call | [0] ended with status: code=0 details=""
D 2024-07-30T07:33:11.664Z | subchannel_call | [3] HTTP/2 stream closed with code 0
D 2024-07-30T07:33:11.874Z | v1.10.1 1 | resolving_load_balancer | xds:///grpcIAMServer:5000 READY -> READY
D 2024-07-30T07:33:11.874Z | v1.10.1 1 | connectivity_state | (1) xds:///grpcIAMServer:5000 READY -> READY
ABujalance commented 1 month ago

The second request:

D 2024-07-30T07:33:17.115Z | v1.10.1 1 | priority | Assigning child cluster=secondaryGRPCAfterIAm.8090, child_number=0 endpoint list ([secondaryGRPCAfterIAm-IP-1:8090] path={region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-42h6s}),([secondaryGRPCAfterIAm-IP-2:8090] path={region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-cz4k9}),([secondaryGRPCAfterIAm-IP-3:8090] path={region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-czgp8}),([secondaryGRPCAfterIAm-IP-4:8090] path={region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-m6h8z}),([secondaryGRPCAfterIAm-IP-5:8090] path={region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-llxqq})
D 2024-07-30T07:33:17.115Z | v1.10.1 1 | priority | Trying priority 0 child cluster=secondaryGRPCAfterIAm.8090, child_number=0
D 2024-07-30T07:33:17.115Z | v1.10.1 1 | priority | Starting failover timer for child cluster=secondaryGRPCAfterIAm.8090, child_number=0
D 2024-07-30T07:33:17.115Z | v1.10.1 1 | outlier_detection | Adding map entry for [secondaryGRPCAfterIAm-IP-1:8090]
D 2024-07-30T07:33:17.115Z | v1.10.1 1 | outlier_detection | Adding map entry for [secondaryGRPCAfterIAm-IP-2:8090]
D 2024-07-30T07:33:17.116Z | v1.10.1 1 | outlier_detection | Adding map entry for [secondaryGRPCAfterIAm-IP-3:8090]
D 2024-07-30T07:33:17.116Z | v1.10.1 1 | outlier_detection | Adding map entry for [secondaryGRPCAfterIAm-IP-4:8090]
D 2024-07-30T07:33:17.116Z | v1.10.1 1 | outlier_detection | Adding map entry for [secondaryGRPCAfterIAm-IP-5:8090]
D 2024-07-30T07:33:17.116Z | v1.10.1 1 | xds_cluster_impl | Received update with config: {
  "cluster": "secondaryGRPCAfterIAm.8090",
  "dropCategories": [],
  "childPolicy": {
    "childPolicy": [
      {
        "round_robin": {}
      }
    ]
  },
  "edsServiceName": "",
  "maxConcurrentRequests": 1024
}
D 2024-07-30T07:33:17.116Z | v1.10.1 1 | weighted_target | Assigning target {region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-42h6s} address list ([secondaryGRPCAfterIAm-IP-1:8090] path=)
D 2024-07-30T07:33:17.116Z | v1.10.1 1 | round_robin | Connect to endpoint list [secondaryGRPCAfterIAm-IP-1:8090]
D 2024-07-30T07:33:17.117Z | v1.10.1 1 | subchannel_refcount | (13) secondaryGRPCAfterIAm-IP-1:8090 refcount 2 -> 3
D 2024-07-30T07:33:17.117Z | v1.10.1 1 | pick_first | Pick subchannel with address secondaryGRPCAfterIAm-IP-1:8090
D 2024-07-30T07:33:17.117Z | v1.10.1 1 | subchannel_refcount | (13) secondaryGRPCAfterIAm-IP-1:8090 refcount 3 -> 4
D 2024-07-30T07:33:17.117Z | v1.10.1 1 | subchannel_refcount | (13) secondaryGRPCAfterIAm-IP-1:8090 refcount 4 -> 3
D 2024-07-30T07:33:17.118Z | v1.10.1 1 | pick_first | IDLE -> READY
D 2024-07-30T07:33:17.118Z | v1.10.1 1 | round_robin | IDLE -> READY
D 2024-07-30T07:33:17.118Z | v1.10.1 1 | weighted_target | Target {region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-42h6s} IDLE -> READY
D 2024-07-30T07:33:17.118Z | v1.10.1 1 | weighted_target | Assigning target {region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-cz4k9} address list ([secondaryGRPCAfterIAm-IP-2:8090] path=)
D 2024-07-30T07:33:17.118Z | v1.10.1 1 | round_robin | Connect to endpoint list [secondaryGRPCAfterIAm-IP-2:8090]
D 2024-07-30T07:33:17.118Z | v1.10.1 1 | subchannel_refcount | (14) secondaryGRPCAfterIAm-IP-2:8090 refcount 2 -> 3
D 2024-07-30T07:33:17.118Z | v1.10.1 1 | pick_first | Pick subchannel with address secondaryGRPCAfterIAm-IP-2:8090
D 2024-07-30T07:33:17.118Z | v1.10.1 1 | subchannel_refcount | (14) secondaryGRPCAfterIAm-IP-2:8090 refcount 3 -> 4
D 2024-07-30T07:33:17.118Z | v1.10.1 1 | subchannel_refcount | (14) secondaryGRPCAfterIAm-IP-2:8090 refcount 4 -> 3
D 2024-07-30T07:33:17.118Z | v1.10.1 1 | pick_first | IDLE -> READY
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | round_robin | IDLE -> READY
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | weighted_target | Target {region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-cz4k9} IDLE -> READY
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | weighted_target | Assigning target {region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-czgp8} address list ([secondaryGRPCAfterIAm-IP-3:8090] path=)
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | round_robin | Connect to endpoint list [secondaryGRPCAfterIAm-IP-3:8090]
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | subchannel_refcount | (15) secondaryGRPCAfterIAm-IP-3:8090 refcount 2 -> 3
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | pick_first | Pick subchannel with address secondaryGRPCAfterIAm-IP-3:8090
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | subchannel_refcount | (15) secondaryGRPCAfterIAm-IP-3:8090 refcount 3 -> 4
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | subchannel_refcount | (15) secondaryGRPCAfterIAm-IP-3:8090 refcount 4 -> 3
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | pick_first | IDLE -> READY
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | round_robin | IDLE -> READY
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | weighted_target | Target {region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-czgp8} IDLE -> READY
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | weighted_target | Assigning target {region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-m6h8z} address list ([secondaryGRPCAfterIAm-IP-4:8090] path=)
D 2024-07-30T07:33:17.119Z | v1.10.1 1 | round_robin | Connect to endpoint list [secondaryGRPCAfterIAm-IP-4:8090]
D 2024-07-30T07:33:17.120Z | v1.10.1 1 | subchannel_refcount | (16) secondaryGRPCAfterIAm-IP-4:8090 refcount 2 -> 3
D 2024-07-30T07:33:17.120Z | v1.10.1 1 | pick_first | Pick subchannel with address secondaryGRPCAfterIAm-IP-4:8090
D 2024-07-30T07:33:17.120Z | v1.10.1 1 | subchannel_refcount | (16) secondaryGRPCAfterIAm-IP-4:8090 refcount 3 -> 4
D 2024-07-30T07:33:17.120Z | v1.10.1 1 | subchannel_refcount | (16) secondaryGRPCAfterIAm-IP-4:8090 refcount 4 -> 3
D 2024-07-30T07:33:17.120Z | v1.10.1 1 | pick_first | IDLE -> READY
D 2024-07-30T07:33:17.120Z | v1.10.1 1 | round_robin | IDLE -> READY
D 2024-07-30T07:33:17.120Z | v1.10.1 1 | weighted_target | Target {region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-m6h8z} IDLE -> READY
D 2024-07-30T07:33:17.120Z | v1.10.1 1 | weighted_target | Assigning target {region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-llxqq} address list ([secondaryGRPCAfterIAm-IP-5:8090] path=)
D 2024-07-30T07:33:17.120Z | v1.10.1 1 | round_robin | Connect to endpoint list [secondaryGRPCAfterIAm-IP-5:8090]
D 2024-07-30T07:33:17.120Z | v1.10.1 1 | subchannel_refcount | (17) secondaryGRPCAfterIAm-IP-5:8090 refcount 2 -> 3
D 2024-07-30T07:33:17.120Z | v1.10.1 1 | pick_first | Pick subchannel with address secondaryGRPCAfterIAm-IP-5:8090
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | subchannel_refcount | (17) secondaryGRPCAfterIAm-IP-5:8090 refcount 3 -> 4
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | subchannel_refcount | (17) secondaryGRPCAfterIAm-IP-5:8090 refcount 4 -> 3
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | pick_first | IDLE -> READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | round_robin | IDLE -> READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | weighted_target | Target {region=,zone=merit,sub_zone=secondaryGRPCAfterIAm-grpc-5877f48847-llxqq} IDLE -> READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | weighted_target | Transitioning to READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | priority | Child cluster=secondaryGRPCAfterIAm.8090, child_number=0 IDLE -> READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | priority | Child cluster=secondaryGRPCAfterIAm.8090, child_number=0 transitioning to READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | priority | Trying priority 0 child cluster=secondaryGRPCAfterIAm.8090, child_number=0
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | priority | Transitioning to READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | xds_cluster_manager | Child secondaryGRPCAfterIAm.8090 CONNECTING -> READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | resolving_load_balancer | xds:///secondaryGRPCAfterIAm:8090 CONNECTING -> READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | connectivity_state | (25) xds:///secondaryGRPCAfterIAm:8090 CONNECTING -> READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | outlier_detection | Counting disabled. Cancelling timer.
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | priority | Transitioning to READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | xds_cluster_manager | Child secondaryGRPCAfterIAm.8090 READY -> READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | resolving_load_balancer | xds:///secondaryGRPCAfterIAm:8090 READY -> READY
D 2024-07-30T07:33:17.121Z | v1.10.1 1 | connectivity_state | (25) xds:///secondaryGRPCAfterIAm:8090 READY -> READY
D 2024-07-30T07:33:17.122Z | v1.10.1 1 | channel | (25) xds:///secondaryGRPCAfterIAm:8090 createRetryingCall [41] method="/meter_reads.v2.MeterReadsService/GetAccountServices"
D 2024-07-30T07:33:17.122Z | v1.10.1 1 | resolving_call | [40] Created child [41]
D 2024-07-30T07:33:17.122Z | v1.10.1 1 | retrying_call | [41] start called
D 2024-07-30T07:33:17.122Z | v1.10.1 1 | channel | (25) xds:///secondaryGRPCAfterIAm:8090 createLoadBalancingCall [42] method="/meter_reads.v2.MeterReadsService/GetAccountServices"
D 2024-07-30T07:33:17.122Z | v1.10.1 1 | retrying_call | [41] Created child call [42] for attempt 1
D 2024-07-30T07:33:17.122Z | v1.10.1 1 | load_balancing_call | [42] start called
D 2024-07-30T07:33:17.122Z | v1.10.1 1 | load_balancing_call | [42] Pick called
D 2024-07-30T07:33:17.122Z | v1.10.1 1 | load_balancing_call | [42] Pick result: COMPLETE subchannel: (17) secondaryGRPCAfterIAm-IP-5:8090 status: undefined undefined
D 2024-07-30T07:33:17.122Z | v1.10.1 1 | retrying_call | [41] startRead called
D 2024-07-30T07:33:17.122Z | v1.10.1 1 | load_balancing_call | [42] startRead called
D 2024-07-30T07:33:17.123Z | v1.10.1 1 | transport_flowctrl | (21) secondaryGRPCAfterIAm-IP-5:8090 local window size: 65535 remote window size: 65535
D 2024-07-30T07:33:17.123Z | v1.10.1 1 | transport_internals | (21) secondaryGRPCAfterIAm-IP-5:8090 session.closed=false session.destroyed=false session.socket.destroyed=false
D 2024-07-30T07:33:17.123Z | v1.10.1 1 | load_balancing_call | [42] Created child call [43]
D 2024-07-30T07:33:17.123Z | v1.10.1 1 | retrying_call | [41] write() called with message of length 43
D 2024-07-30T07:33:17.123Z | v1.10.1 1 | load_balancing_call | [42] write() called with message of length 43
D 2024-07-30T07:33:17.191Z | v1.10.1 1 | subchannel_call | [43] write() called with message of length 43
D 2024-07-30T07:33:17.191Z | v1.10.1 1 | subchannel_call | [43] sending data chunk of length 43
D 2024-07-30T07:33:17.192Z | v1.10.1 1 | retrying_call | [41] halfClose called
D 2024-07-30T07:33:17.192Z | v1.10.1 1 | load_balancing_call | [42] halfClose called
D 2024-07-30T07:33:17.192Z | v1.10.1 1 | subchannel_call | [43] end() called
D 2024-07-30T07:33:17.193Z | v1.10.1 1 | subchannel_call | [43] calling end() on HTTP/2 stream
D 2024-07-30T07:33:17.391Z | v1.10.1 1 | subchannel_call | [43] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:17.392Z | v1.10.1 1 | load_balancing_call | [42] Received metadata
D 2024-07-30T07:33:17.392Z | v1.10.1 1 | retrying_call | [41] Received metadata from child [42]
D 2024-07-30T07:33:17.392Z | v1.10.1 1 | retrying_call | [41] Committing call [42] at index 0
D 2024-07-30T07:33:17.392Z | v1.10.1 1 | resolving_call | [40] Received metadata
D 2024-07-30T07:33:17.393Z | v1.10.1 1 | subchannel_call | [43] receive HTTP/2 data frame of length 342
D 2024-07-30T07:33:17.393Z | v1.10.1 1 | subchannel_call | [43] parsed message of length 342
D 2024-07-30T07:33:17.393Z | v1.10.1 1 | subchannel_call | [43] pushing to reader message of length 342
D 2024-07-30T07:33:17.394Z | v1.10.1 1 | load_balancing_call | [42] Received message
D 2024-07-30T07:33:17.394Z | v1.10.1 1 | retrying_call | [41] Received message from child [42]
D 2024-07-30T07:33:17.394Z | v1.10.1 1 | resolving_call | [40] Received message
D 2024-07-30T07:33:17.394Z | v1.10.1 1 | resolving_call | [40] Finished filtering received message
D 2024-07-30T07:33:17.397Z | v1.10.1 1 | subchannel_call | [43] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:17.397Z | v1.10.1 1 | subchannel_call | [43] received status code 0 from server
D 2024-07-30T07:33:17.398Z | v1.10.1 1 | subchannel_call | [43] received status details string "" from server
D 2024-07-30T07:33:17.398Z | v1.10.1 1 | subchannel_call | [43] close http2 stream with code 0
D 2024-07-30T07:33:17.398Z | v1.10.1 1 | subchannel_call | [43] ended with status: code=0 details=""
D 2024-07-30T07:33:17.399Z | v1.10.1 1 | load_balancing_call | [42] Received status
D 2024-07-30T07:33:17.399Z | v1.10.1 1 | load_balancing_call | [42] ended with status: code=0 details=""
D 2024-07-30T07:33:17.400Z | v1.10.1 1 | retrying_call | [41] Received status from child [42]
D 2024-07-30T07:33:17.400Z | v1.10.1 1 | retrying_call | [41] state=COMMITTED handling status with progress PROCESSED from child [42] in state ACTIVE
D 2024-07-30T07:33:17.400Z | v1.10.1 1 | retrying_call | [41] ended with status: code=0 details=""
D 2024-07-30T07:33:17.401Z | v1.10.1 1 | resolving_call | [40] Received status
D 2024-07-30T07:33:17.401Z | v1.10.1 1 | resolving_call | [40] ended with status: code=0 details=""
D 2024-07-30T07:33:17.401Z | v1.10.1 1 | subchannel_call | [43] HTTP/2 stream closed with code 0
D 2024-07-30T07:33:17.404Z | v1.10.1 1 | channel | (25) xds:///secondaryGRPCAfterIAm:8090 createResolvingCall [44] method="/meter_reads.v2.MeterReadsService/GetServiceMeters", deadline=Infinity
D 2024-07-30T07:33:17.405Z | v1.10.1 1 | resolving_call | [44] Created
D 2024-07-30T07:33:17.405Z | v1.10.1 1 | resolving_call | [44] Deadline: Infinity
D 2024-07-30T07:33:17.405Z | v1.10.1 1 | resolving_call | [44] start called
D 2024-07-30T07:33:17.407Z | v1.10.1 1 | resolving_call | [44] startRead called
D 2024-07-30T07:33:17.407Z | v1.10.1 1 | resolving_call | [44] write() called with message of length 78
D 2024-07-30T07:33:17.407Z | v1.10.1 1 | resolving_call | [44] halfClose called
D 2024-07-30T07:33:17.408Z | v1.10.1 1 | channel | (25) xds:///secondaryGRPCAfterIAm:8090 createRetryingCall [45] method="/meter_reads.v2.MeterReadsService/GetServiceMeters"
D 2024-07-30T07:33:17.408Z | v1.10.1 1 | resolving_call | [44] Created child [45]
D 2024-07-30T07:33:17.408Z | v1.10.1 1 | retrying_call | [45] start called
D 2024-07-30T07:33:17.408Z | v1.10.1 1 | channel | (25) xds:///secondaryGRPCAfterIAm:8090 createLoadBalancingCall [46] method="/meter_reads.v2.MeterReadsService/GetServiceMeters"
D 2024-07-30T07:33:17.409Z | v1.10.1 1 | retrying_call | [45] Created child call [46] for attempt 1
D 2024-07-30T07:33:17.409Z | v1.10.1 1 | load_balancing_call | [46] start called
D 2024-07-30T07:33:17.409Z | v1.10.1 1 | load_balancing_call | [46] Pick called
D 2024-07-30T07:33:17.410Z | v1.10.1 1 | load_balancing_call | [46] Pick result: COMPLETE subchannel: (13) secondaryGRPCAfterIAm-IP-1:8090 status: undefined undefined
D 2024-07-30T07:33:17.410Z | v1.10.1 1 | retrying_call | [45] startRead called
D 2024-07-30T07:33:17.410Z | v1.10.1 1 | load_balancing_call | [46] startRead called
D 2024-07-30T07:33:17.411Z | v1.10.1 1 | transport_flowctrl | (18) secondaryGRPCAfterIAm-IP-1:8090 local window size: 64901 remote window size: 65535
D 2024-07-30T07:33:17.412Z | v1.10.1 1 | transport_internals | (18) secondaryGRPCAfterIAm-IP-1:8090 session.closed=false session.destroyed=false session.socket.destroyed=false
D 2024-07-30T07:33:17.412Z | v1.10.1 1 | load_balancing_call | [46] Created child call [47]
D 2024-07-30T07:33:17.412Z | v1.10.1 1 | retrying_call | [45] write() called with message of length 83
D 2024-07-30T07:33:17.413Z | v1.10.1 1 | load_balancing_call | [46] write() called with message of length 83
D 2024-07-30T07:33:17.413Z | v1.10.1 1 | subchannel_call | [47] write() called with message of length 83
D 2024-07-30T07:33:17.413Z | v1.10.1 1 | subchannel_call | [47] sending data chunk of length 83
D 2024-07-30T07:33:17.413Z | v1.10.1 1 | retrying_call | [45] halfClose called
D 2024-07-30T07:33:17.414Z | v1.10.1 1 | load_balancing_call | [46] halfClose called
D 2024-07-30T07:33:17.415Z | v1.10.1 1 | subchannel_call | [47] end() called
D 2024-07-30T07:33:17.415Z | v1.10.1 1 | subchannel_call | [47] calling end() on HTTP/2 stream
D 2024-07-30T07:33:17.603Z | v1.10.1 1 | subchannel_call | [65] calling end() on HTTP/2 stream
D 2024-07-30T07:33:17.604Z | v1.10.1 1 | load_balancing_call | [63] halfClose called
D 2024-07-30T07:33:17.604Z | v1.10.1 1 | subchannel_call | [67] end() called
D 2024-07-30T07:33:17.604Z | v1.10.1 1 | subchannel_call | [67] calling end() on HTTP/2 stream
D 2024-07-30T07:33:17.604Z | v1.10.1 1 | load_balancing_call | [61] halfClose called
D 2024-07-30T07:33:17.604Z | v1.10.1 1 | subchannel_call | [66] end() called
D 2024-07-30T07:33:17.604Z | v1.10.1 1 | subchannel_call | [66] calling end() on HTTP/2 stream
D 2024-07-30T07:33:17.631Z | v1.10.1 1 | subchannel_call | [66] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:17.631Z | v1.10.1 1 | load_balancing_call | [61] Received metadata
D 2024-07-30T07:33:17.632Z | v1.10.1 1 | retrying_call | [60] Received metadata from child [61]
D 2024-07-30T07:33:17.632Z | v1.10.1 1 | retrying_call | [60] Committing call [61] at index 0
D 2024-07-30T07:33:17.632Z | v1.10.1 1 | resolving_call | [54] Received metadata
D 2024-07-30T07:33:17.632Z | v1.10.1 1 | subchannel_call | [66] receive HTTP/2 data frame of length 111
D 2024-07-30T07:33:17.632Z | v1.10.1 1 | subchannel_call | [66] parsed message of length 111
D 2024-07-30T07:33:17.632Z | v1.10.1 1 | subchannel_call | [66] pushing to reader message of length 111
D 2024-07-30T07:33:17.633Z | v1.10.1 1 | load_balancing_call | [61] Received message
D 2024-07-30T07:33:17.633Z | v1.10.1 1 | retrying_call | [60] Received message from child [61]
D 2024-07-30T07:33:17.633Z | v1.10.1 1 | resolving_call | [54] Received message
D 2024-07-30T07:33:17.633Z | v1.10.1 1 | resolving_call | [54] Finished filtering received message
D 2024-07-30T07:33:17.634Z | v1.10.1 1 | subchannel_call | [66] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:17.634Z | v1.10.1 1 | subchannel_call | [66] received status code 0 from server
D 2024-07-30T07:33:17.634Z | v1.10.1 1 | subchannel_call | [66] received status details string "" from server
D 2024-07-30T07:33:17.634Z | v1.10.1 1 | subchannel_call | [66] close http2 stream with code 0
D 2024-07-30T07:33:17.635Z | v1.10.1 1 | subchannel_call | [66] ended with status: code=0 details=""
D 2024-07-30T07:33:17.635Z | v1.10.1 1 | load_balancing_call | [61] Received status
D 2024-07-30T07:33:17.635Z | v1.10.1 1 | load_balancing_call | [61] ended with status: code=0 details=""
D 2024-07-30T07:33:17.635Z | v1.10.1 1 | retrying_call | [60] Received status from child [61]
D 2024-07-30T07:33:17.635Z | v1.10.1 1 | retrying_call | [60] state=COMMITTED handling status with progress PROCESSED from child [61] in state ACTIVE
D 2024-07-30T07:33:17.635Z | v1.10.1 1 | retrying_call | [60] ended with status: code=0 details=""
D 2024-07-30T07:33:17.636Z | v1.10.1 1 | resolving_call | [54] Received status
D 2024-07-30T07:33:17.636Z | v1.10.1 1 | resolving_call | [54] ended with status: code=0 details=""
D 2024-07-30T07:33:17.636Z | v1.10.1 1 | subchannel_call | [66] HTTP/2 stream closed with code 0
D 2024-07-30T07:33:17.638Z | v1.10.1 1 | subchannel_call | [64] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:17.638Z | v1.10.1 1 | load_balancing_call | [57] Received metadata
D 2024-07-30T07:33:17.638Z | v1.10.1 1 | retrying_call | [56] Received metadata from child [57]
D 2024-07-30T07:33:17.639Z | v1.10.1 1 | retrying_call | [56] Committing call [57] at index 0
D 2024-07-30T07:33:17.639Z | v1.10.1 1 | resolving_call | [52] Received metadata
D 2024-07-30T07:33:17.639Z | v1.10.1 1 | subchannel_call | [64] receive HTTP/2 data frame of length 101
D 2024-07-30T07:33:17.639Z | v1.10.1 1 | subchannel_call | [64] parsed message of length 101
D 2024-07-30T07:33:17.639Z | v1.10.1 1 | subchannel_call | [64] pushing to reader message of length 101
D 2024-07-30T07:33:17.640Z | v1.10.1 1 | load_balancing_call | [57] Received message
D 2024-07-30T07:33:17.640Z | v1.10.1 1 | retrying_call | [56] Received message from child [57]
D 2024-07-30T07:33:17.640Z | v1.10.1 1 | resolving_call | [52] Received message
D 2024-07-30T07:33:17.640Z | v1.10.1 1 | resolving_call | [52] Finished filtering received message
D 2024-07-30T07:33:17.641Z | v1.10.1 1 | subchannel_call | [64] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:17.642Z | v1.10.1 1 | subchannel_call | [64] received status code 0 from server
D 2024-07-30T07:33:17.642Z | v1.10.1 1 | subchannel_call | [64] received status details string "" from server
D 2024-07-30T07:33:17.642Z | v1.10.1 1 | subchannel_call | [64] close http2 stream with code 0
D 2024-07-30T07:33:17.642Z | v1.10.1 1 | subchannel_call | [64] ended with status: code=0 details=""
D 2024-07-30T07:33:17.643Z | v1.10.1 1 | load_balancing_call | [57] Received status
D 2024-07-30T07:33:17.643Z | v1.10.1 1 | load_balancing_call | [57] ended with status: code=0 details=""
D 2024-07-30T07:33:17.643Z | v1.10.1 1 | retrying_call | [56] Received status from child [57]
D 2024-07-30T07:33:17.643Z | v1.10.1 1 | retrying_call | [56] state=COMMITTED handling status with progress PROCESSED from child [57] in state ACTIVE
D 2024-07-30T07:33:17.643Z | v1.10.1 1 | retrying_call | [56] ended with status: code=0 details=""
D 2024-07-30T07:33:17.644Z | v1.10.1 1 | resolving_call | [52] Received status
D 2024-07-30T07:33:17.644Z | v1.10.1 1 | resolving_call | [52] ended with status: code=0 details=""
D 2024-07-30T07:33:17.644Z | v1.10.1 1 | subchannel_call | [64] HTTP/2 stream closed with code 0
D 2024-07-30T07:33:17.645Z | v1.10.1 1 | subchannel_call | [67] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:17.645Z | v1.10.1 1 | load_balancing_call | [63] Received metadata
D 2024-07-30T07:33:17.646Z | v1.10.1 1 | retrying_call | [62] Received metadata from child [63]
D 2024-07-30T07:33:17.646Z | v1.10.1 1 | retrying_call | [62] Committing call [63] at index 0
D 2024-07-30T07:33:17.646Z | v1.10.1 1 | resolving_call | [55] Received metadata
D 2024-07-30T07:33:17.646Z | v1.10.1 1 | subchannel_call | [67] receive HTTP/2 data frame of length 111
D 2024-07-30T07:33:17.646Z | v1.10.1 1 | subchannel_call | [67] parsed message of length 111
D 2024-07-30T07:33:17.646Z | v1.10.1 1 | subchannel_call | [67] pushing to reader message of length 111
D 2024-07-30T07:33:17.646Z | v1.10.1 1 | load_balancing_call | [63] Received message
D 2024-07-30T07:33:17.647Z | v1.10.1 1 | retrying_call | [62] Received message from child [63]
D 2024-07-30T07:33:17.647Z | v1.10.1 1 | resolving_call | [55] Received message
D 2024-07-30T07:33:17.647Z | v1.10.1 1 | resolving_call | [55] Finished filtering received message
D 2024-07-30T07:33:17.647Z | v1.10.1 1 | subchannel_call | [67] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:17.648Z | v1.10.1 1 | subchannel_call | [67] received status code 0 from server
D 2024-07-30T07:33:17.648Z | v1.10.1 1 | subchannel_call | [67] received status details string "" from server
D 2024-07-30T07:33:17.648Z | v1.10.1 1 | subchannel_call | [67] close http2 stream with code 0
D 2024-07-30T07:33:17.648Z | v1.10.1 1 | subchannel_call | [67] ended with status: code=0 details=""
D 2024-07-30T07:33:17.648Z | v1.10.1 1 | load_balancing_call | [63] Received status
D 2024-07-30T07:33:17.648Z | v1.10.1 1 | load_balancing_call | [63] ended with status: code=0 details=""
D 2024-07-30T07:33:17.649Z | v1.10.1 1 | retrying_call | [62] Received status from child [63]
D 2024-07-30T07:33:17.649Z | v1.10.1 1 | retrying_call | [62] state=COMMITTED handling status with progress PROCESSED from child [63] in state ACTIVE
D 2024-07-30T07:33:17.649Z | v1.10.1 1 | retrying_call | [62] ended with status: code=0 details=""
D 2024-07-30T07:33:17.649Z | v1.10.1 1 | resolving_call | [55] Received status
D 2024-07-30T07:33:17.649Z | v1.10.1 1 | resolving_call | [55] ended with status: code=0 details=""
D 2024-07-30T07:33:17.649Z | v1.10.1 1 | subchannel_call | [67] HTTP/2 stream closed with code 0
D 2024-07-30T07:33:17.704Z | v1.10.1 1 | subchannel_call | [65] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:17.705Z | v1.10.1 1 | load_balancing_call | [59] Received metadata
D 2024-07-30T07:33:17.705Z | v1.10.1 1 | retrying_call | [58] Received metadata from child [59]
D 2024-07-30T07:33:17.705Z | v1.10.1 1 | retrying_call | [58] Committing call [59] at index 0
D 2024-07-30T07:33:17.705Z | v1.10.1 1 | resolving_call | [53] Received metadata
D 2024-07-30T07:33:17.705Z | v1.10.1 1 | subchannel_call | [65] receive HTTP/2 data frame of length 101
D 2024-07-30T07:33:17.706Z | v1.10.1 1 | subchannel_call | [65] parsed message of length 101
D 2024-07-30T07:33:17.706Z | v1.10.1 1 | subchannel_call | [65] pushing to reader message of length 101
D 2024-07-30T07:33:17.706Z | v1.10.1 1 | load_balancing_call | [59] Received message
D 2024-07-30T07:33:17.706Z | v1.10.1 1 | retrying_call | [58] Received message from child [59]
D 2024-07-30T07:33:17.706Z | v1.10.1 1 | resolving_call | [53] Received message
D 2024-07-30T07:33:17.706Z | v1.10.1 1 | resolving_call | [53] Finished filtering received message
D 2024-07-30T07:33:17.707Z | v1.10.1 1 | subchannel_call | [65] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:17.707Z | v1.10.1 1 | subchannel_call | [65] received status code 0 from server
D 2024-07-30T07:33:17.707Z | v1.10.1 1 | subchannel_call | [65] received status details string "" from server
D 2024-07-30T07:33:17.707Z | v1.10.1 1 | subchannel_call | [65] close http2 stream with code 0
D 2024-07-30T07:33:17.708Z | v1.10.1 1 | subchannel_call | [65] ended with status: code=0 details=""
D 2024-07-30T07:33:17.708Z | v1.10.1 1 | load_balancing_call | [59] Received status
D 2024-07-30T07:33:17.708Z | v1.10.1 1 | load_balancing_call | [59] ended with status: code=0 details=""
D 2024-07-30T07:33:17.708Z | v1.10.1 1 | retrying_call | [58] Received status from child [59]
D 2024-07-30T07:33:17.708Z | v1.10.1 1 | retrying_call | [58] state=COMMITTED handling status with progress PROCESSED from child [59] in state ACTIVE
D 2024-07-30T07:33:17.708Z | v1.10.1 1 | retrying_call | [58] ended with status: code=0 details=""
D 2024-07-30T07:33:17.709Z | v1.10.1 1 | resolving_call | [53] Received status
D 2024-07-30T07:33:17.709Z | v1.10.1 1 | resolving_call | [53] ended with status: code=0 details=""
D 2024-07-30T07:33:17.709Z | v1.10.1 1 | subchannel_call | [65] HTTP/2 stream closed with code 0
D 2024-07-30T07:33:17.916Z | channel | (1) dns:otel-collector.otel:4317 createResolvingCall [4] method="/opentelemetry.proto.collector.trace.v1.TraceService/Export", deadline=2024-07-30T07:33:27.916Z
D 2024-07-30T07:33:17.916Z | resolving_call | [4] Created
D 2024-07-30T07:33:17.917Z | resolving_call | [4] Deadline: 2024-07-30T07:33:27.916Z
D 2024-07-30T07:33:17.917Z | resolving_call | [4] Deadline will be reached in 9999ms
D 2024-07-30T07:33:17.917Z | resolving_call | [4] start called
D 2024-07-30T07:33:17.918Z | resolving_call | [4] startRead called
D 2024-07-30T07:33:17.921Z | resolving_call | [4] write() called with message of length 988
D 2024-07-30T07:33:17.921Z | resolving_call | [4] halfClose called
D 2024-07-30T07:33:17.922Z | channel | (1) dns:otel-collector.otel:4317 createRetryingCall [5] method="/opentelemetry.proto.collector.trace.v1.TraceService/Export"
D 2024-07-30T07:33:17.922Z | resolving_call | [4] Created child [5]
D 2024-07-30T07:33:17.923Z | retrying_call | [5] start called
D 2024-07-30T07:33:17.923Z | channel | (1) dns:otel-collector.otel:4317 createLoadBalancingCall [6] method="/opentelemetry.proto.collector.trace.v1.TraceService/Export"
D 2024-07-30T07:33:17.923Z | retrying_call | [5] Created child call [6] for attempt 1
D 2024-07-30T07:33:17.924Z | load_balancing_call | [6] start called
D 2024-07-30T07:33:17.924Z | load_balancing_call | [6] Pick called
D 2024-07-30T07:33:17.924Z | load_balancing_call | [6] Pick result: COMPLETE subchannel: (2) resolved-dns-ip1:4317 status: undefined undefined
D 2024-07-30T07:33:17.924Z | retrying_call | [5] startRead called
D 2024-07-30T07:33:17.925Z | load_balancing_call | [6] startRead called
D 2024-07-30T07:33:17.926Z | transport_flowctrl | (5) resolved-dns-ip1:4317 local window size: 65528 remote window size: 65535
D 2024-07-30T07:33:17.927Z | transport_internals | (5) resolved-dns-ip1:4317 session.closed=false session.destroyed=false session.socket.destroyed=false
D 2024-07-30T07:33:17.927Z | load_balancing_call | [6] Created child call [7]
D 2024-07-30T07:33:17.927Z | retrying_call | [5] write() called with message of length 993
D 2024-07-30T07:33:17.928Z | load_balancing_call | [6] write() called with message of length 993
D 2024-07-30T07:33:17.928Z | subchannel_call | [7] write() called with message of length 993
D 2024-07-30T07:33:17.928Z | subchannel_call | [7] sending data chunk of length 993
D 2024-07-30T07:33:17.928Z | retrying_call | [5] halfClose called
D 2024-07-30T07:33:17.929Z | load_balancing_call | [6] halfClose called
D 2024-07-30T07:33:17.929Z | subchannel_call | [7] end() called
D 2024-07-30T07:33:17.930Z | subchannel_call | [7] calling end() on HTTP/2 stream
D 2024-07-30T07:33:17.931Z | subchannel_call | [7] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:17.931Z | load_balancing_call | [6] Received metadata
D 2024-07-30T07:33:17.931Z | retrying_call | [5] Received metadata from child [6]
D 2024-07-30T07:33:17.932Z | retrying_call | [5] Committing call [6] at index 0
D 2024-07-30T07:33:17.932Z | resolving_call | [4] Received metadata
D 2024-07-30T07:33:17.932Z | subchannel_call | [7] receive HTTP/2 data frame of length 7
D 2024-07-30T07:33:17.933Z | subchannel_call | [7] parsed message of length 7
D 2024-07-30T07:33:17.933Z | subchannel_call | [7] pushing to reader message of length 7
D 2024-07-30T07:33:17.933Z | load_balancing_call | [6] Received message
D 2024-07-30T07:33:17.934Z | retrying_call | [5] Received message from child [6]
D 2024-07-30T07:33:17.934Z | resolving_call | [4] Received message
D 2024-07-30T07:33:17.935Z | resolving_call | [4] Finished filtering received message
D 2024-07-30T07:33:17.935Z | subchannel_call | [7] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:17.936Z | subchannel_call | [7] received status code 0 from server
D 2024-07-30T07:33:17.936Z | subchannel_call | [7] received status details string "" from server
D 2024-07-30T07:33:17.936Z | subchannel_call | [7] close http2 stream with code 0
D 2024-07-30T07:33:17.937Z | subchannel_call | [7] ended with status: code=0 details=""
D 2024-07-30T07:33:17.937Z | load_balancing_call | [6] Received status
D 2024-07-30T07:33:17.938Z | load_balancing_call | [6] ended with status: code=0 details=""
D 2024-07-30T07:33:17.938Z | retrying_call | [5] Received status from child [6]
D 2024-07-30T07:33:17.938Z | retrying_call | [5] state=COMMITTED handling status with progress PROCESSED from child [6] in state ACTIVE
D 2024-07-30T07:33:17.938Z | retrying_call | [5] ended with status: code=0 details=""
D 2024-07-30T07:33:17.939Z | resolving_call | [4] Received status
D 2024-07-30T07:33:17.939Z | resolving_call | [4] ended with status: code=0 details=""
D 2024-07-30T07:33:17.939Z | subchannel_call | [7] HTTP/2 stream closed with code 0
ABujalance commented 1 month ago

The third and final (failing) request:

D 2024-07-30T07:33:21.622Z | subchannel_refcount | (3) resolved-dns-ip2:4317 refcount 1 -> 0
D 2024-07-30T07:33:21.622Z | subchannel_refcount | (4) resolved-dns-ip3:4317 refcount 1 -> 0
        ],
        "ignore_reresolution_requests": true
      },
      "cluster=grpcIAMServer.5000, child_number=1": {
        "config": [
          {
            "outlier_detection": {
              "child_policy": [
                {
                  "xds_cluster_impl": {
                    "cluster": "grpcIAMServer.5000",
                    "drop_categories": [],
                    "eds_service_name": "",
                    "child_policy": [
                      {
                        "ring_hash": {
                          "min_ring_size": 1024,
                          "max_ring_size": 8388608
                        }
                      }
                    ]
                  }
                }
              ]
            }
          }
        ],
        "ignore_reresolution_requests": true
      }
    },
    "priorities": [
      "cluster=grpcIAMServer.5000, child_number=0",
      "cluster=grpcIAMServer.5000, child_number=1"
    ]
  }
}
D 2024-07-30T07:33:22.299Z | v1.10.1 1 | priority | Assigning child cluster=grpcIAMServer.5000, child_number=0 endpoint list ([grpcIAMServer-childCluster01:5000] path={region=,zone=merit,sub_zone=iam-identity-api-1}),([grpcIAMServer-childCluster02:5000] path={region=,zone=merit,sub_zone=iam-identity-api-0})
D 2024-07-30T07:33:22.299Z | v1.10.1 1 | priority | Assigning child cluster=grpcIAMServer.5000, child_number=1 endpoint list ([grpcIAMServer-childCluster11:5000] path={region=,zone=eu-west-1a,sub_zone=iam-identity-api-1}),([grpcIAMServer-childCluster12:5000] path={region=,zone=eu-west-1c,sub_zone=iam-identity-api-0}),([grpcIAMServer-childCluster13:5000] path={region=,zone=europe-west2-a,sub_zone=iam-identity-api-1}),([grpcIAMServer-childCluster14:5000] path={region=,zone=europe-west2-c,sub_zone=iam-identity-api-0})
D 2024-07-30T07:33:22.299Z | v1.10.1 1 | priority | Trying priority 0 child cluster=grpcIAMServer.5000, child_number=0
D 2024-07-30T07:33:22.299Z | v1.10.1 1 | priority | Starting failover timer for child cluster=grpcIAMServer.5000, child_number=0
D 2024-07-30T07:33:22.300Z | v1.10.1 1 | outlier_detection | Adding map entry for [grpcIAMServer-childCluster01:5000]
D 2024-07-30T07:33:22.300Z | v1.10.1 1 | outlier_detection | Adding map entry for [grpcIAMServer-childCluster02:5000]
D 2024-07-30T07:33:22.300Z | v1.10.1 1 | xds_cluster_impl | Received update with config: {
  "cluster": "grpcIAMServer.5000",
  "dropCategories": [],
  "childPolicy": {
    "minRingSize": 1024,
    "maxRingSize": 8388608
  },
  "edsServiceName": "",
  "maxConcurrentRequests": 1024
}
D 2024-07-30T07:33:22.300Z | v1.10.1 1 | ring_hash | Received update with config {
  "ring_hash": {
    "min_ring_size": 1024,
    "max_ring_size": 8388608
  }
}
D 2024-07-30T07:33:22.300Z | v1.10.1 1 | outlier_detection | Counting disabled. Cancelling timer.
D 2024-07-30T07:33:22.300Z | v1.10.1 1 | priority | Transitioning to IDLE
D 2024-07-30T07:33:22.301Z | v1.10.1 1 | xds_cluster_manager | Child grpcIAMServer.5000 CONNECTING -> IDLE
D 2024-07-30T07:33:22.301Z | v1.10.1 1 | resolving_load_balancer | xds:///grpcIAMServer:5000 CONNECTING -> IDLE
D 2024-07-30T07:33:22.301Z | v1.10.1 1 | connectivity_state | (32) xds:///grpcIAMServer:5000 CONNECTING -> IDLE
D 2024-07-30T07:33:22.302Z | v1.10.1 1 | ring_hash | Creating a ring with size 1024
D 2024-07-30T07:33:22.307Z | v1.10.1 1 | ring_hash | IDLE -> IDLE
D 2024-07-30T07:33:22.307Z | v1.10.1 1 | priority | Child cluster=grpcIAMServer.5000, child_number=0 IDLE -> IDLE
D 2024-07-30T07:33:22.307Z | v1.10.1 1 | priority | Child cluster=grpcIAMServer.5000, child_number=0 transitioning to IDLE
D 2024-07-30T07:33:22.307Z | v1.10.1 1 | priority | Trying priority 0 child cluster=grpcIAMServer.5000, child_number=0
D 2024-07-30T07:33:22.307Z | v1.10.1 1 | priority | Transitioning to IDLE
D 2024-07-30T07:33:22.307Z | v1.10.1 1 | xds_cluster_manager | Child grpcIAMServer.5000 IDLE -> IDLE
D 2024-07-30T07:33:22.307Z | v1.10.1 1 | resolving_load_balancer | xds:///grpcIAMServer:5000 IDLE -> IDLE
D 2024-07-30T07:33:22.307Z | v1.10.1 1 | connectivity_state | (32) xds:///grpcIAMServer:5000 IDLE -> IDLE
D 2024-07-30T07:33:22.307Z | v1.10.1 1 | channel | (32) xds:///grpcIAMServer:5000 createRetryingCall [69] method="/iamIdentityService.IdentityService/WhoAmI"
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | resolving_call | [68] Created child [69]
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | retrying_call | [69] start called
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | channel | (32) xds:///grpcIAMServer:5000 createLoadBalancingCall [70] method="/iamIdentityService.IdentityService/WhoAmI"
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | retrying_call | [69] Created child call [70] for attempt 1
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | load_balancing_call | [70] start called
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | load_balancing_call | [70] Pick called
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | ring_hash | Pick called. Hash=8931643041187832995
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | subchannel_refcount | (11) grpcIAMServer-childCluster01:5000 refcount 2 -> 3
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | pick_first | Pick subchannel with address grpcIAMServer-childCluster01:5000
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | subchannel_refcount | (11) grpcIAMServer-childCluster01:5000 refcount 3 -> 4
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | subchannel_refcount | (11) grpcIAMServer-childCluster01:5000 refcount 4 -> 3
D 2024-07-30T07:33:22.308Z | v1.10.1 1 | pick_first | IDLE -> READY
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | ring_hash | IDLE -> READY
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | priority | Child cluster=grpcIAMServer.5000, child_number=0 IDLE -> READY
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | priority | Child cluster=grpcIAMServer.5000, child_number=0 transitioning to READY
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | priority | Trying priority 0 child cluster=grpcIAMServer.5000, child_number=0
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | priority | Transitioning to READY
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | xds_cluster_manager | Child grpcIAMServer.5000 IDLE -> READY
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | resolving_load_balancer | xds:///grpcIAMServer:5000 IDLE -> READY
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | connectivity_state | (32) xds:///grpcIAMServer:5000 IDLE -> READY
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | load_balancing_call | [70] Pick result: QUEUE subchannel: null status: undefined undefined
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | channel | (32) xds:///grpcIAMServer:5000 callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | retrying_call | [69] startRead called
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | load_balancing_call | [70] startRead called
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | retrying_call | [69] write() called with message of length 3720
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | load_balancing_call | [70] write() called with message of length 3720
D 2024-07-30T07:33:22.309Z | v1.10.1 1 | retrying_call | [69] halfClose called
D 2024-07-30T07:33:42.325Z | v1.10.1 1 | load_balancing_call | [75] Received status
D 2024-07-30T07:33:42.325Z | v1.10.1 1 | load_balancing_call | [75] ended with status: code=4 details="context deadline exceeded"
D 2024-07-30T07:33:42.325Z | v1.10.1 1 | retrying_call | [69] Received status from child [75]
D 2024-07-30T07:33:42.325Z | v1.10.1 1 | retrying_call | [69] state=RETRY handling status with progress PROCESSED from child [75] in state ACTIVE
D 2024-07-30T07:33:42.325Z | v1.10.1 1 | subchannel_call | [76] HTTP/2 stream closed with code 8
D 2024-07-30T07:33:42.326Z | v1.10.1 1 | subchannel_call | [74] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:42.326Z | v1.10.1 1 | load_balancing_call | [73] Received metadata
D 2024-07-30T07:33:42.326Z | v1.10.1 1 | retrying_call | [72] Received metadata from child [73]
D 2024-07-30T07:33:42.326Z | v1.10.1 1 | retrying_call | [72] Committing call [73] at index 0
D 2024-07-30T07:33:42.326Z | v1.10.1 1 | resolving_call | [71] Received metadata
D 2024-07-30T07:33:42.326Z | v1.10.1 1 | subchannel_call | [74] receive HTTP/2 data frame of length 2318
D 2024-07-30T07:33:42.327Z | v1.10.1 1 | subchannel_call | [74] parsed message of length 2318
D 2024-07-30T07:33:42.327Z | v1.10.1 1 | subchannel_call | [74] pushing to reader message of length 2318
D 2024-07-30T07:33:42.327Z | v1.10.1 1 | load_balancing_call | [73] Received message
D 2024-07-30T07:33:42.327Z | v1.10.1 1 | retrying_call | [72] Received message from child [73]
D 2024-07-30T07:33:42.327Z | v1.10.1 1 | resolving_call | [71] Received message
D 2024-07-30T07:33:42.327Z | v1.10.1 1 | resolving_call | [71] Finished filtering received message
D 2024-07-30T07:33:42.328Z | v1.10.1 1 | subchannel_call | [74] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:42.329Z | v1.10.1 1 | subchannel_call | [74] received status code 0 from server
D 2024-07-30T07:33:42.329Z | v1.10.1 1 | subchannel_call | [74] received status details string "" from server
D 2024-07-30T07:33:42.329Z | v1.10.1 1 | subchannel_call | [74] close http2 stream with code 0
D 2024-07-30T07:33:42.329Z | v1.10.1 1 | subchannel_call | [74] ended with status: code=0 details=""
D 2024-07-30T07:33:42.330Z | v1.10.1 1 | load_balancing_call | [73] Received status
D 2024-07-30T07:33:42.330Z | v1.10.1 1 | load_balancing_call | [73] ended with status: code=0 details=""
D 2024-07-30T07:33:42.330Z | v1.10.1 1 | retrying_call | [72] Received status from child [73]
D 2024-07-30T07:33:42.330Z | v1.10.1 1 | retrying_call | [72] state=COMMITTED handling status with progress PROCESSED from child [73] in state ACTIVE
D 2024-07-30T07:33:42.330Z | v1.10.1 1 | retrying_call | [72] ended with status: code=0 details=""
D 2024-07-30T07:33:42.330Z | v1.10.1 1 | resolving_call | [71] Received status
D 2024-07-30T07:33:42.330Z | v1.10.1 1 | resolving_call | [71] ended with status: code=0 details=""
D 2024-07-30T07:33:42.330Z | v1.10.1 1 | subchannel_call | [74] HTTP/2 stream closed with code 0
D 2024-07-30T07:33:42.336Z | v1.10.1 1 | channel | (32) xds:///grpcIAMServer:5000 createLoadBalancingCall [77] method="/iamIdentityService.IdentityService/WhoAmI"
D 2024-07-30T07:33:42.336Z | v1.10.1 1 | retrying_call | [69] Created child call [77] for attempt 3
D 2024-07-30T07:33:42.336Z | v1.10.1 1 | load_balancing_call | [77] start called
D 2024-07-30T07:33:42.336Z | v1.10.1 1 | load_balancing_call | [77] Pick called
D 2024-07-30T07:33:42.337Z | v1.10.1 1 | ring_hash | Pick called. Hash=8931643041187832995
D 2024-07-30T07:33:42.337Z | v1.10.1 1 | load_balancing_call | [77] Pick result: COMPLETE subchannel: (11) grpcIAMServer-childCluster01:5000 status: undefined undefined
D 2024-07-30T07:33:42.337Z | v1.10.1 1 | load_balancing_call | [77] startRead called
D 2024-07-30T07:33:42.337Z | v1.10.1 1 | transport_flowctrl | (12) grpcIAMServer-childCluster01:5000 local window size: 60899 remote window size: 65535
D 2024-07-30T07:33:42.337Z | v1.10.1 1 | transport_internals | (12) grpcIAMServer-childCluster01:5000 session.closed=false session.destroyed=false session.socket.destroyed=false
D 2024-07-30T07:33:42.337Z | v1.10.1 1 | load_balancing_call | [77] Created child call [78]
D 2024-07-30T07:33:42.339Z | v1.10.1 1 | subchannel_call | [78] Received server headers:
        :status: 200
        content-type: application/grpc
        grpc-status: 4
        grpc-message: context deadline exceeded

D 2024-07-30T07:33:42.339Z | v1.10.1 1 | subchannel_call | [78] Received server trailers:
        :status: 200
        content-type: application/grpc
        grpc-status: 4
        grpc-message: context deadline exceeded

D 2024-07-30T07:33:42.339Z | v1.10.1 1 | subchannel_call | [78] received status code 4 from server
D 2024-07-30T07:33:42.339Z | v1.10.1 1 | subchannel_call | [78] received status details string "context deadline exceeded" from server
D 2024-07-30T07:33:42.339Z | v1.10.1 1 | subchannel_call | [78] ended with status: code=4 details="context deadline exceeded"
D 2024-07-30T07:33:42.339Z | v1.10.1 1 | subchannel_call | [78] close http2 stream with code 8
D 2024-07-30T07:33:42.339Z | v1.10.1 1 | load_balancing_call | [77] Received status
D 2024-07-30T07:33:42.339Z | v1.10.1 1 | load_balancing_call | [77] ended with status: code=4 details="context deadline exceeded"
D 2024-07-30T07:33:42.340Z | v1.10.1 1 | retrying_call | [69] Received status from child [77]
D 2024-07-30T07:33:42.340Z | v1.10.1 1 | retrying_call | [69] state=RETRY handling status with progress PROCESSED from child [77] in state ACTIVE
D 2024-07-30T07:33:42.340Z | v1.10.1 1 | subchannel_call | [78] HTTP/2 stream closed with code 8
D 2024-07-30T07:33:42.344Z | v1.10.1 1 | channel | (32) xds:///grpcIAMServer:5000 createLoadBalancingCall [79] method="/iamIdentityService.IdentityService/WhoAmI"
D 2024-07-30T07:33:42.344Z | v1.10.1 1 | retrying_call | [69] Created child call [79] for attempt 4
D 2024-07-30T07:33:42.344Z | v1.10.1 1 | load_balancing_call | [79] start called
D 2024-07-30T07:33:42.344Z | v1.10.1 1 | load_balancing_call | [79] Pick called
D 2024-07-30T07:33:42.344Z | v1.10.1 1 | ring_hash | Pick called. Hash=8931643041187832995
D 2024-07-30T07:33:42.344Z | v1.10.1 1 | load_balancing_call | [79] Pick result: COMPLETE subchannel: (11) grpcIAMServer-childCluster01:5000 status: undefined undefined
D 2024-07-30T07:33:42.344Z | v1.10.1 1 | load_balancing_call | [79] startRead called
D 2024-07-30T07:33:42.345Z | v1.10.1 1 | transport_flowctrl | (12) grpcIAMServer-childCluster01:5000 local window size: 60899 remote window size: 65535
D 2024-07-30T07:33:42.345Z | v1.10.1 1 | transport_internals | (12) grpcIAMServer-childCluster01:5000 session.closed=false session.destroyed=false session.socket.destroyed=false
D 2024-07-30T07:33:42.345Z | v1.10.1 1 | load_balancing_call | [79] Created child call [80]
D 2024-07-30T07:33:42.346Z | v1.10.1 1 | subchannel_call | [80] Received server headers:
        :status: 200
        content-type: application/grpc
        grpc-status: 4
        grpc-message: context deadline exceeded

D 2024-07-30T07:33:42.346Z | v1.10.1 1 | subchannel_call | [80] Received server trailers:
        :status: 200
        content-type: application/grpc
        grpc-status: 4
        grpc-message: context deadline exceeded

D 2024-07-30T07:33:42.346Z | v1.10.1 1 | subchannel_call | [80] received status code 4 from server
D 2024-07-30T07:33:42.346Z | v1.10.1 1 | subchannel_call | [80] received status details string "context deadline exceeded" from server
D 2024-07-30T07:33:42.346Z | v1.10.1 1 | subchannel_call | [80] ended with status: code=4 details="context deadline exceeded"
D 2024-07-30T07:33:42.346Z | v1.10.1 1 | subchannel_call | [80] close http2 stream with code 8
D 2024-07-30T07:33:42.346Z | v1.10.1 1 | load_balancing_call | [79] Received status
D 2024-07-30T07:33:42.346Z | v1.10.1 1 | load_balancing_call | [79] ended with status: code=4 details="context deadline exceeded"
D 2024-07-30T07:33:42.346Z | v1.10.1 1 | retrying_call | [69] Received status from child [79]
D 2024-07-30T07:33:42.346Z | v1.10.1 1 | retrying_call | [69] state=RETRY handling status with progress PROCESSED from child [79] in state ACTIVE
D 2024-07-30T07:33:42.346Z | v1.10.1 1 | retrying_call | [69] ended with status: code=4 details="context deadline exceeded"
D 2024-07-30T07:33:42.347Z | v1.10.1 1 | resolving_call | [68] Received status
D 2024-07-30T07:33:42.347Z | v1.10.1 1 | subchannel_call | [80] HTTP/2 stream closed with code 8
D 2024-07-30T07:33:42.453Z | channel | (1) dns:otel-collector.otel:4317 createResolvingCall [8] method="/opentelemetry.proto.collector.trace.v1.TraceService/Export", deadline=2024-07-30T07:33:52.452Z
D 2024-07-30T07:33:42.453Z | resolving_call | [8] Created
D 2024-07-30T07:33:42.454Z | resolving_call | [8] Deadline: 2024-07-30T07:33:52.452Z
D 2024-07-30T07:33:42.454Z | resolving_call | [8] Deadline will be reached in 9998ms
D 2024-07-30T07:33:42.454Z | resolving_call | [8] start called
D 2024-07-30T07:33:42.455Z | resolving_call | [8] startRead called
D 2024-07-30T07:33:42.461Z | resolving_call | [8] write() called with message of length 868
D 2024-07-30T07:33:42.461Z | resolving_call | [8] halfClose called
D 2024-07-30T07:33:42.461Z | channel | (1) dns:otel-collector.otel:4317 createRetryingCall [9] method="/opentelemetry.proto.collector.trace.v1.TraceService/Export"
D 2024-07-30T07:33:42.462Z | resolving_call | [8] Created child [9]
D 2024-07-30T07:33:42.462Z | retrying_call | [9] start called
D 2024-07-30T07:33:42.463Z | channel | (1) dns:otel-collector.otel:4317 createLoadBalancingCall [10] method="/opentelemetry.proto.collector.trace.v1.TraceService/Export"
D 2024-07-30T07:33:42.463Z | retrying_call | [9] Created child call [10] for attempt 1
D 2024-07-30T07:33:42.464Z | load_balancing_call | [10] start called
D 2024-07-30T07:33:42.464Z | load_balancing_call | [10] Pick called
D 2024-07-30T07:33:42.464Z | load_balancing_call | [10] Pick result: COMPLETE subchannel: (2) resolved-dns-ip1:4317 status: undefined undefined
D 2024-07-30T07:33:42.464Z | retrying_call | [9] startRead called
D 2024-07-30T07:33:42.465Z | load_balancing_call | [10] startRead called
D 2024-07-30T07:33:42.466Z | transport_flowctrl | (5) resolved-dns-ip1:4317 local window size: 65521 remote window size: 65535
D 2024-07-30T07:33:42.466Z | transport_internals | (5) resolved-dns-ip1:4317 session.closed=false session.destroyed=false session.socket.destroyed=false
D 2024-07-30T07:33:42.467Z | load_balancing_call | [10] Created child call [11]
D 2024-07-30T07:33:42.468Z | retrying_call | [9] write() called with message of length 873
D 2024-07-30T07:33:42.468Z | load_balancing_call | [10] write() called with message of length 873
D 2024-07-30T07:33:42.468Z | subchannel_call | [11] write() called with message of length 873
D 2024-07-30T07:33:42.468Z | subchannel_call | [11] sending data chunk of length 873
D 2024-07-30T07:33:42.468Z | retrying_call | [9] halfClose called
D 2024-07-30T07:33:42.469Z | load_balancing_call | [10] halfClose called
D 2024-07-30T07:33:42.469Z | subchannel_call | [11] end() called
D 2024-07-30T07:33:42.469Z | subchannel_call | [11] calling end() on HTTP/2 stream
D 2024-07-30T07:33:42.471Z | subchannel_call | [11] Received server headers:
        :status: 200
        content-type: application/grpc

D 2024-07-30T07:33:42.471Z | load_balancing_call | [10] Received metadata
D 2024-07-30T07:33:42.471Z | retrying_call | [9] Received metadata from child [10]
D 2024-07-30T07:33:42.471Z | retrying_call | [9] Committing call [10] at index 0
D 2024-07-30T07:33:42.472Z | resolving_call | [8] Received metadata
D 2024-07-30T07:33:42.472Z | subchannel_call | [11] receive HTTP/2 data frame of length 7
D 2024-07-30T07:33:42.473Z | subchannel_call | [11] parsed message of length 7
D 2024-07-30T07:33:42.473Z | subchannel_call | [11] pushing to reader message of length 7
D 2024-07-30T07:33:42.473Z | load_balancing_call | [10] Received message
D 2024-07-30T07:33:42.473Z | retrying_call | [9] Received message from child [10]
D 2024-07-30T07:33:42.473Z | resolving_call | [8] Received message
D 2024-07-30T07:33:42.474Z | resolving_call | [8] Finished filtering received message
D 2024-07-30T07:33:42.475Z | subchannel_call | [11] Received server trailers:
        grpc-status: 0
        grpc-message: 

D 2024-07-30T07:33:42.475Z | subchannel_call | [11] received status code 0 from server
D 2024-07-30T07:33:42.475Z | subchannel_call | [11] received status details string "" from server
D 2024-07-30T07:33:42.475Z | subchannel_call | [11] close http2 stream with code 0
D 2024-07-30T07:33:42.476Z | subchannel_call | [11] ended with status: code=0 details=""
D 2024-07-30T07:33:42.476Z | load_balancing_call | [10] Received status
D 2024-07-30T07:33:42.477Z | load_balancing_call | [10] ended with status: code=0 details=""
D 2024-07-30T07:33:42.477Z | retrying_call | [9] Received status from child [10]
D 2024-07-30T07:33:42.477Z | retrying_call | [9] state=COMMITTED handling status with progress PROCESSED from child [10] in state ACTIVE
D 2024-07-30T07:33:42.477Z | retrying_call | [9] ended with status: code=0 details=""
D 2024-07-30T07:33:42.478Z | resolving_call | [8] Received status
D 2024-07-30T07:33:42.478Z | resolving_call | [8] ended with status: code=0 details=""
D 2024-07-30T07:33:42.478Z | subchannel_call | [11] HTTP/2 stream closed with code 0
[07:33:42.332] ERROR: RpcError: Deadline exceeded
    at Object.callback (/app/node_modules/@protobuf-ts/grpc-transport/build/commonjs/grpc-transport.js:37:27)
    at Object.onReceiveStatus (/app/node_modules/@grpc/grpc-js/build/src/client.js:192:36)
    at Object.onReceiveStatus (/app/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/app/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /app/node_modules/@grpc/grpc-js/build/src/resolving-call.js:99:78
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)

The grpcIAMServer doesn't report any errors and there's no information about the failing query there.

Please let us know if you need more logs and if there's nothing that can be extracted for here we can try to create a complete test environment where this is replicable by anyone outside the company.

My colleagues and I will be happy to answer any question and do any extra tests.

murgatroid99 commented 1 month ago

OK, the logs show that you are using @grpc/grpc-js-xds with @grpc/grpc-js, at version 1.10.1. The first change I recommend is that you update @grpc/grpc-js to the latest patch, 1.11.1, and you will also need to update @grpc/grpc-js-xds to version 1.11.x for compatibility. Some potentially relevant fixes have come out in the intervening versions.

I also see some logs from grpc-js 1.9.4, which appears to be used via OpenTelemetry. The way the log lines are interspersed in the last log indicates that it may be from a separate process. If possible, it would help to omit those, to be able to see the logs we care about more clearly.

ABujalance commented 1 month ago

Yes, we also do telemetry and you're right, the grpc-js version of the gqlServer is currently 1.9.4. The requests to the grpcIAMServer are done through a library that has a slightly newer version. We'll update everything to the latest patch and check again. I'll come back with our findings if we still run into the issue, thanks 🙇