grpc / grpc-go

The Go language implementation of gRPC. HTTP/2 based RPC
https://grpc.io
Apache License 2.0
21.11k stars 4.39k forks source link

Flaky test: TimerAndWatchStateOnErrorCallback #7321

Open easwars opened 5 months ago

easwars commented 5 months ago
--- FAIL: Test (1.56s)
    --- FAIL: Test/TimerAndWatchStateOnErrorCallback (0.03s)
        tlogger.go:116: INFO server.go:132 [xds-e2e] Created new snapshot cache...  (t=+252.631µs)
        tlogger.go:116: INFO server.go:685 [core] [Server #44]Server created  (t=+1.247297ms)
        tlogger.go:116: INFO server.go:158 [xds-e2e] Registered Aggregated Discovery Service (ADS)...  (t=+3.857605ms)
        tlogger.go:116: INFO server.go:177 [xds-e2e] xDS management server serving at: 127.0.0.1:4[233](https://github.com/grpc/grpc-go/actions/runs/9473148966/job/26100074724?pr=7320#step:8:234)7...  (t=+4.007795ms)
        tlogger.go:116: INFO server.go:881 [core] [Server #44 ListenSocket #45]ListenSocket created  (t=+6.05881ms)
        tlogger.go:116: INFO bootstrap.go:569 [xds] [xds-bootstrap] Bootstrap config for creating xds-client: &{xDSServers:[0x40007a07e0] cpcs:map[client-side-certificate-provider-instance:{PluginName:file_watcher Config:[123 10 34 99 101 114 116 105 102 105 99 97 116 101 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 67 108 105 101 110 116 83 105 100 101 88 68 83 51 54 49 52 48 57 55 57 49 49 47 99 101 114 116 46 112 101 109 34 44 10 34 112 114 105 118 97 116 101 95 107 101 121 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 67 108 105 101 110 116 83 105 100 101 88 68 83 51 54 49 52 48 57 55 57 49 49 47 107 101 121 46 112 101 109 34 44 10 34 99 97 95 99 101 114 116 105 102 105 99 97 116 101 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 67 108 105 101 110 116 83 105 100 101 88 68 83 51 54 49 52 48 57 55 57 49 49 47 99 97 46 112 101 109 34 44 10 34 114 101 102 114 101 115 104 95 105 110 116 101 114 118 97 108 34 58 32 34 54 48 48 115 34 10 125]} server-side-certificate-provider-instance:{PluginName:file_watcher Config:[123 10 34 99 101 114 116 105 102 105 99 97 116 101 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 83 101 114 118 101 114 83 105 100 101 88 68 83 50 57 48 50 54 50 52 50 54 50 47 99 101 114 116 46 112 101 109 34 44 10 34 112 114 105 118 97 116 101 95 107 101 121 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 83 101 114 118 101 114 83 105 100 101 88 68 83 50 57 48 50 54 50 52 50 54 50 47 107 101 121 46 112 101 109 34 44 10 34 99 97 95 99 101 114 116 105 102 105 99 97 116 101 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 83 101 114 118 101 114 83 105 100 101 88 68 83 50 57 48 50 54 50 52 50 54 50 47 99 97 46 112 101 109 34 44 10 34 114 101 102 114 101 115 104 95 105 110 116 101 114 118 97 108 34 58 32 34 54 48 48 115 34 10 125]}] serverListenerResourceNameTemplate:grpc/server?xds.resource.listening_address=%s clientDefaultListenerResourceNameTemplate:%s authorities:map[:0x4000389020] node:{ID:e4644a06-8290-4722-a2ac-94ac68668693 Cluster: Locality:{Region: Zone: SubZone:} Metadata:<nil> userAgentName:gRPC Go userAgentVersionType:{UserAgentVersion:1.66.0-dev} clientFeatures:[envoy.lb.does_not_support_overprovisioning xds.config.resource-in-sotw]} certProviderConfigs:map[client-side-certificate-provider-instance:0x400078ee40 server-side-certificate-provider-instance:0x400078ef00]}  (t=+7.663132ms)
        tlogger.go:116: INFO clientconn.go:1684 [core] original dial target is: "127.0.0.1:42337"  (t=+8.068748ms)
        tlogger.go:116: INFO clientconn.go:306 [core] [Channel #46]Channel created  (t=+8.287637ms)
        tlogger.go:116: INFO clientconn.go:188 [core] [Channel #46]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"dns", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/127.0.0.1:42337", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}  (t=+8.489964ms)
        tlogger.go:116: INFO clientconn.go:189 [core] [Channel #46]Channel authority set to "127.0.0.1:42337"  (t=+8.653599ms)
        tlogger.go:116: INFO resolver_wrapper.go:197 [core] [Channel #46]Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "127.0.0.1:42337",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Metadata": null
                }
              ],
              "Endpoints": [
                {
                  "Addresses": [
                    {
                      "Addr": "127.0.0.1:42337",
                      "ServerName": "",
                      "Attributes": null,
                      "BalancerAttributes": null,
                      "Metadata": null
                    }
                  ],
                  "Attributes": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+9.289575ms)
        tlogger.go:116: INFO balancer_wrapper.go:103 [core] [Channel #46]Channel switches to new LB policy "pick_first"  (t=+9.597169ms)
        tlogger.go:116: INFO clientconn.go:849 [core] [Channel #46 SubChannel #47]Subchannel created  (t=+9.872452ms)
        tlogger.go:116: INFO clientconn.go:536 [core] [Channel #46]Channel Connectivity change to CONNECTING  (t=+10.032371ms)
        tlogger.go:116: INFO clientconn.go:1210 [core] [Channel #46 SubChannel #47]Subchannel Connectivity change to CONNECTING  (t=+10.442045ms)
        tlogger.go:116: INFO clientconn.go:1326 [core] [Channel #46 SubChannel #47]Subchannel picks a new address "127.0.0.1:42337" to connect  (t=+10.736173ms)
        tlogger.go:116: INFO clientconn.go:1210 [core] [Channel #46 SubChannel #47]Subchannel Connectivity change to READY  (t=+13.151778ms)
        tlogger.go:116: INFO clientconn.go:536 [core] [Channel #46]Channel Connectivity change to READY  (t=+13.39[236](https://github.com/grpc/grpc-go/actions/runs/9473148966/job/26100074724?pr=7320#step:8:237)7ms)
        tlogger.go:116: INFO clientconn.go:306 [core] [Channel #46]Channel exiting idle mode  (t=+13.568766ms)
        tlogger.go:116: INFO transport.go:238 Created transport to server "127.0.0.1:42337"  (t=+14.91149ms)
        tlogger.go:116: INFO transport.go:337 ADS stream created  (t=+15.146929ms)
        tlogger.go:116: INFO simple.go:441 [xds-e2e] open watch 1 for type.googleapis.com/envoy.config.listener.v3.Listener[xdsclient-test-lds-resource] from nodeID "e4644a06-8290-4722-a2ac-94ac68668693", version ""  (t=+16.170659ms)
        tlogger.go:116: WARNING transport.go:479 ADS stream closed: EOF  (t=+18.954451ms)
        tlogger.go:116: INFO server.go:817 [core] [Server #44 ListenSocket #45]ListenSocket deleted  (t=+19.525345ms)
        tlogger.go:116: INFO clientconn.go:1210 [core] [Channel #46 SubChannel #47]Subchannel Connectivity change to IDLE  (t=+20.375501ms)
        tlogger.go:116: INFO clientconn.go:536 [core] [Channel #46]Channel Connectivity change to IDLE  (t=+20.759577ms)
        authority_test.go:170: Got 1. Want: 0
        tlogger.go:116: INFO clientconn.go:536 [core] [Channel #46]Channel Connectivity change to SHUTDOWN  (t=+22.0211ms)
        tlogger.go:116: INFO resolver_wrapper.go:100 [core] [Channel #46]Closing the name resolver  (t=+22.220993ms)
        tlogger.go:116: INFO balancer_wrapper.go:135 [core] [Channel #46]ccBalancerWrapper: closing  (t=+22.357758ms)
        tlogger.go:116: WARNING transport.go:412 Sending ADS request for resources: [], url: "type.googleapis.com/envoy.config.listener.v3.Listener", version: "", nonce: "" failed: EOF  (t=+22.578209ms)
        tlogger.go:116: INFO clientconn.go:1210 [core] [Channel #46 SubChannel #47]Subchannel Connectivity change to SHUTDOWN  (t=+22.757052ms)
        tlogger.go:116: INFO clientconn.go:1557 [core] [Channel #46 SubChannel #47]Subchannel deleted  (t=+22.910058ms)
        tlogger.go:116: INFO clientconn.go:306 [core] [Channel #46]Channel deleted  (t=+23.020063ms)
FAIL
FAIL    google.golang.org/grpc/xds/internal/xdsclient   5.375s

https://github.com/grpc/grpc-go/actions/runs/9473148966/job/26100074724?pr=7320

SpiffyEight77 commented 4 months ago

Hey @easwars, I'd like to work on this issue. Could you please assign it to me?

SpiffyEight77 commented 4 months ago

@easwars cc @aranjans Sorry to bother you at this moment.

This is my first time addressing this type of issue, and it seems difficult to reproduce locally. I've run the following commands; am I missing anything?

Details

$ go version
go version go1.22.4 darwin/arm64

$ pwd
~/Developer/SpiffyEight77/grpc-go/xds/internal/xdsclient

$ go test -race -count=100000 -failfast -run ^Test/TimerAndWatchStateOnErrorCallback$
PASS
ok      google.golang.org/grpc/xds/internal/xdsclient   429.233s
purnesh42H commented 4 months ago

@SpiffyEight77 thanks for doing go test -race. Yeah flaky tests are hard to repro. We will add more instances to this issue if encounter any in future. Meanwhile if you want you can try to trace the code to figure the root cause and fix if any

SpiffyEight77 commented 4 months ago

@SpiffyEight77 thanks for doing go test -race. Yeah flaky tests are hard to repro. We will add more instances to this issue if encounter any in future. Meanwhile if you want you can try to trace the code to figure the root cause and fix if any

@purnesh42H Thank you so much for your reply, I'll take time to read the code and figure the root cause and fix it.

purnesh42H commented 3 months ago

@SpiffyEight77 just checking if you are still working on it

SpiffyEight77 commented 3 months ago

@purnesh42H Apologies for my delayed response. I haven't been working on this issue recently.

purnesh42H commented 1 month ago

https://github.com/grpc/grpc-go/actions/runs/11366793152/job/31617975424?pr=7677

arjan-bal commented 3 weeks ago

https://github.com/grpc/grpc-go/actions/runs/11494697596/job/31992676274?pr=7751