grpc / grpc-go

The Go language implementation of gRPC. HTTP/2 based RPC
https://grpc.io
Apache License 2.0
21.09k stars 4.38k forks source link

Flaky Test: Test/ServerSideXDS_Fallback #7812

Open arjan-bal opened 1 week ago

arjan-bal commented 1 week ago

Example failure: https://github.com/grpc/grpc-go/actions/runs/11697938683/job/32577429244?pr=7742

Logs

tlogger.go:116: INFO balancer_wrapper.go:109 [core] [Channel #1431]Channel switches to new LB policy "pick_first"  (t=+15.513374ms)
        tlogger.go:116: INFO clientconn.go:838 [core] [Channel #1431 SubChannel #1432]Subchannel created  (t=+15.744814ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #1431]Channel Connectivity change to CONNECTING  (t=+15.90422ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #1431]Channel exiting idle mode  (t=+16.074447ms)
        tlogger.go:116: INFO clientconn.go:1199 [core] [Channel #1431 SubChannel #1432]Subchannel Connectivity change to CONNECTING  (t=+41.28898ms)
        tlogger.go:116: INFO clientconn.go:1199 [core] [Channel #1425 SubChannel #1426]Subchannel Connectivity change to READY  (t=+41.57952ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #1425]Channel Connectivity change to READY  (t=+41.978021ms)
        logging.go:30: nodeID "06bfc2e9-2525-42d0-a815-1e074b98dabf" requested type.googleapis.com/envoy.config.listener.v3.Listener[grpc/server?xds.resource.listening_address=127.0.0.1:34579 grpc/server?xds.resource.listening_address=127.0.0.1:36251] and known map[]. Diff [grpc/server?xds.resource.listening_address=127.0.0.1:34579 grpc/server?xds.resource.listening_address=127.0.0.1:36251]
        logging.go:30: respond type.googleapis.com/envoy.config.listener.v3.Listener[grpc/server?xds.resource.listening_address=127.0.0.1:34579 grpc/server?xds.resource.listening_address=127.0.0.1:36251] version "" with version "1"
        xds_server_integration_test.go:59: Serving mode for listener "127.0.0.1:36251" changed to "SERVING", err: <nil>
        logging.go:30: nodeID "06bfc2e9-2525-42d0-a815-1e074b98dabf" requested type.googleapis.com/envoy.config.listener.v3.Listener[grpc/server?xds.resource.listening_address=127.0.0.1:34579 grpc/server?xds.resource.listening_address=127.0.0.1:36251] and known map[grpc/server?xds.resource.listening_address=127.0.0.1:34579:{} grpc/server?xds.resource.listening_address=127.0.0.1:36251:{}]. Diff []
        logging.go:30: open watch 1 for type.googleapis.com/envoy.config.listener.v3.Listener[grpc/server?xds.resource.listening_address=127.0.0.1:34579 grpc/server?xds.resource.listening_address=127.0.0.1:36251] from nodeID "06bfc2e9-2525-42d0-a815-1e074b98dabf", version "1"
        tlogger.go:116: INFO clientconn.go:1319 [core] [Channel #1431 SubChannel #1432]Subchannel picks a new address "127.0.0.1:34293" to connect  (t=+47.[375](https://github.com/grpc/grpc-go/actions/runs/11697938683/job/32577429244?pr=7742#step:8:376)244ms)
        tlogger.go:116: INFO clientconn.go:1199 [core] [Channel #1431 SubChannel #1432]Subchannel Connectivity change to READY  (t=+49.391442ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #1431]Channel Connectivity change to READY  (t=+49.730753ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #1430]Channel exiting idle mode  (t=+51.166883ms)
        logging.go:30: nodeID "06bfc2e9-2525-42d0-a815-1e074b98dabf" requested type.googleapis.com/envoy.config.listener.v3.Listener[my-service-fallback] and known map[]. Diff [my-service-fallback]
        logging.go:30: respond type.googleapis.com/envoy.config.listener.v3.Listener[my-service-fallback] version "" with version "1"
        logging.go:30: nodeID "06bfc2e9-2525-42d0-a815-1e074b98dabf" requested type.googleapis.com/envoy.config.listener.v3.Listener[my-service-fallback] and known map[my-service-fallback:{}]. Diff []
        logging.go:30: open watch 2 for type.googleapis.com/envoy.config.listener.v3.Listener[my-service-fallback] from nodeID "06bfc2e9-2525-42d0-a815-1e074b98dabf", version "1"
        logging.go:30: nodeID "06bfc2e9-2525-42d0-a815-1e074b98dabf" requested type.googleapis.com/envoy.config.route.v3.RouteConfiguration[route-my-service-fallback] and known map[]. Diff [route-my-service-fallback]
        logging.go:30: respond type.googleapis.com/envoy.config.route.v3.RouteConfiguration[route-my-service-fallback] version "" with version "1"
        tlogger.go:116: INFO xds_resolver.go:284 [xds] [xds-resolver 0x400088c360] For Listener resource "my-service-fallback" and RouteConfiguration resource "route-my-service-fallback", generated service config: {
              "loadBalancingConfig": [
                {
                  "xds_cluster_manager_experimental": {
                    "children": {
                      "cluster:cluster-my-service-fallback": {
              "MaxRequests": null,
              "DNSHostName": "",
              "PrioritizedClusterNames": null,
              "LBPolicy": [
                {
                  "xds_wrr_locality_experimental": {
                    "childPolicy": [
                      {
                        "round_robin": {}
                      }
                    ]
                  }
                }
              ],
              "OutlierDetection": null,
              "Raw": {
                "type_url": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
                "value": "ChtjbHVzdGVyLW15LXNlcnZpY2UtZmFsbGJhY2saIwoCGgASHWVuZHBvaW50cy1teS1zZXJ2aWNlLWZhbGxiYWNrEAM="
              },
              "TelemetryLabels": {
                "csm.service_name": "unknown",
                "csm.service_namespace_name": "unknown"
              }
            }  (t=+61.700011ms)
        logging.go:30: nodeID "06bfc2e9-2525-42d0-a815-1e074b98dabf" requested type.googleapis.com/envoy.config.cluster.v3.Cluster[cluster-my-service-fallback] and known map[cluster-my-service-fallback:{}]. Diff []
        logging.go:30: open watch 4 for type.googleapis.com/envoy.config.cluster.v3.Cluster[cluster-my-service-fallback] from nodeID "06bfc2e9-2525-42d0-a815-1e074b98dabf", version "1"
        tlogger.go:116: INFO clusterresolver.go:85 [xds] [xds-cluster-resolver-lb 0x4000830c40] Created  (t=+63.031449ms)
        tlogger.go:116: INFO cdsbalancer.go:455 [xds] [cds-lb 0x400042bd40] Created child policy 0x4000830c40 of type cluster_resolver_experimental  (t=+63.18286ms)
        xds_server_integration_test.go:225: rpc EmptyCall() failed: rpc error: code = DeadlineExceeded desc = received context error while waiting for new LB policy update: context deadline exceeded
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #1430]Channel Connectivity change to SHUTDOWN  (t=+10.012759875s)
        tlogger.go:116: INFO resolver_wrapper.go:100 [core] [Channel #1430]Closing the name resolver  (t=+10.012846987s)
        tlogger.go:116: INFO balancer_wrapper.go:147 [core] [Channel #1430]ccBalancerWrapper: closing  (t=+10.012952052s)
        tlogger.go:116: INFO watch_service.go:56 [xds] [xds-resolver 0x400088c360] Canceling watch on Listener resource "my-service-fallback"  (t=+10.013212416s)
        tlogger.go:116: INFO watch_service.go:91 [xds] [xds-resolver 0x400088c360] Canceling watch on RouteConfiguration resource "route-my-service-fallback"  (t=+10.013544501s)
        tlogger.go:116: INFO clusterresolver.go:338 [xds] [xds-cluster-resolver-lb 0x4000830c40] Shutdown  (t=+10.013833187s)
        tlogger.go:116: INFO cdsbalancer.go:[380](https://github.com/grpc/grpc-go/actions/runs/11697938683/job/32577429244?pr=7742#step:8:381) [xds] [cds-lb 0x400042bd40] Shutdown  (t=+10.013900122s)
        tlogger.go:116: INFO clustermanager.go:195 [xds] [xds-cluster-manager-lb 0x40007175e0] Shutdown  (t=+10.013962618s)
        logging.go:30: nodeID "06bfc2e9-2525-42d0-a815-1e074b98dabf" requested type.googleapis.com/envoy.config.cluster.v3.Cluster[] and known map[cluster-my-service-fallback:{}]. Diff []
        logging.go:30: open watch 5 for type.googleapis.com/envoy.config.cluster.v3.Cluster[] from nodeID "06bfc2e9-2525-42d0-a815-1e074b98dabf", version "1"
        tlogger.go:116: WARNING ads_stream.go:514 [xds] [xds-client 0x4000b89080] [xds-channel 0x40009b5440] [ads-stream 0x4000ab4900] ADS stream closed: rpc error: code = Canceled desc = context canceled  (t=+10.014077852s)
easwars commented 6 days ago

Can't reproduce this locally or in forge over 10K runs. And the failing log doesn't give enough information about why the failure happened.

arjan-bal commented 6 days ago

I saw it fail only once. We can wait for a couple of weeks to see if it's worth investigating further.