grpc / grpc-go

The Go language implementation of gRPC. HTTP/2 based RPC
https://grpc.io
Apache License 2.0
21.07k stars 4.38k forks source link

Flaky test: TestServeAndCloseDoNotRace #7627

Open arvindbr8 opened 1 month ago

arvindbr8 commented 1 month ago

pkg xds

link: https://github.com/grpc/grpc-go/actions/runs/10810537881/job/29988003566?pr=7619

--- FAIL: Test (0.59s)
    --- FAIL: Test/ServeAndCloseDoNotRace (0.25s)
    ...
    <removed logs>
    ...

        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0005a2140] Created xds.GRPCServer  (t=+153.588724ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #95]Server created  (t=+154.986653ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0005a2480] Created xds.GRPCServer  (t=+155.563565ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #96]Server created  (t=+156.83574ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0005a2800] Created xds.GRPCServer  (t=+157.410739ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #97]Server created  (t=+158.692692ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0005a2b40] Created xds.GRPCServer  (t=+159.290173ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #98]Server created  (t=+160.656623ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0005a2e80] Created xds.GRPCServer  (t=+161.232794ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #99]Server created  (t=+162.553039ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0005a31c0] Created xds.GRPCServer  (t=+163.130632ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0000fbd00] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+163.626253ms)
        tlogger.go:116: INFO clientconn.go:1675 [core] original dial target is: "passthrough:///non-existent-management-server"  (t=+163.79105ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #100]Channel created  (t=+163.858686ms)
        tlogger.go:116: INFO clientconn.go:193 [core] [Channel #100]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"passthrough", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/non-existent-management-server", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}  (t=+163.948372ms)
        tlogger.go:116: INFO clientconn.go:194 [core] [Channel #100]Channel authority set to "non-existent-management-server"  (t=+164.007292ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0005a2140] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+164.16227ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0005a2800] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+164.236809ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0005a2b40] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+164.347354ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0005a2480] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+164.421442ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0005a2e80] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+164.495059ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0005a31c0] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+164.58166ms)
        tlogger.go:116: INFO resolver_wrapper.go:200 [core] [Channel #100]Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "non-existent-management-server",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Metadata": null
                }
              ],
              "Endpoints": [
                {
                  "Addresses": [
                    {
                      "Addr": "non-existent-management-server",
                      "ServerName": "",
                      "Attributes": null,
                      "BalancerAttributes": null,
                      "Metadata": null
                    }
                  ],
                  "Attributes": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+164.785068ms)
        tlogger.go:116: INFO balancer_wrapper.go:107 [core] [Channel #100]Channel switches to new LB policy "pick_first"  (t=+164.88831ms)
        tlogger.go:116: INFO clientconn.go:842 [core] [Channel #100 SubChannel #101]Subchannel created  (t=+164.977095ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #100]Channel Connectivity change to CONNECTING  (t=+165.056853ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #100]Channel exiting idle mode  (t=+165.136331ms)
        tlogger.go:116: INFO transport.go:245 [xds] [xds-client 0xc000958000] [passthrough:///non-existent-management-server] Created transport to server "passthrough:///non-existent-management-server"  (t=+165.210659ms)
        tlogger.go:116: INFO clientconn.go:1199 [core] [Channel #100 SubChannel #101]Subchannel Connectivity change to CONNECTING  (t=+165.348806ms)
        tlogger.go:116: INFO clientconn.go:1317 [core] [Channel #100 SubChannel #101]Subchannel picks a new address "non-existent-management-server" to connect  (t=+165.446858ms)
        tlogger.go:116: WARNING clientconn.go:1379 [core] [Channel #100 SubChannel #101]grpc: addrConn.createTransport failed to connect to {Addr: "non-existent-management-server", ServerName: "non-existent-management-server", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: address non-existent-management-server: missing port in address"  (t=+165.605563ms)
        tlogger.go:116: INFO clientconn.go:1201 [core] [Channel #100 SubChannel #101]Subchannel Connectivity change to TRANSIENT_FAILURE, last error: connection error: desc = "transport: Error while dialing: dial tcp: address non-existent-management-server: missing port in address"  (t=+165.686534ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #100]Channel Connectivity change to TRANSIENT_FAILURE  (t=+165.805425ms)
        tlogger.go:116: WARNING clientimpl_watchers.go:56 [xds] Watch registered for name "ListenerResource" of type "grpc/server?xds.resource.listening_address=127.0.0.1:35925", authority "" is not found  (t=+165.865877ms)
        tlogger.go:116: WARNING listener_wrapper.go:428 [xds] [xds-server-listener 0xc000542340] Resource "grpc/server?xds.resource.listening_address=127.0.0.1:35925" received error: the xds-client is closed after listener was closed  (t=+165.957588ms)
        tlogger.go:116: WARNING transport.go:338 [xds] [xds-client 0xc000958000] [passthrough:///non-existent-management-server] Creating new ADS stream failed: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: address non-existent-management-server: missing port in address"  (t=+166.135138ms)
        tlogger.go:116: WARNING clientimpl_watchers.go:56 [xds] Watch registered for name "ListenerResource" of type "grpc/server?xds.resource.listening_address=127.0.0.1:35925", authority "" is not found  (t=+166.202814ms)
        tlogger.go:116: WARNING listener_wrapper.go:428 [xds] [xds-server-listener 0xc000542410] Resource "grpc/server?xds.resource.listening_address=127.0.0.1:35925" received error: the xds-client is closed after listener was closed  (t=+166.267524ms)
        tlogger.go:116: WARNING clientimpl_watchers.go:56 [xds] Watch registered for name "ListenerResource" of type "grpc/server?xds.resource.listening_address=127.0.0.1:35925", authority "" is not found  (t=+166.320803ms)
        tlogger.go:116: WARNING listener_wrapper.go:428 [xds] [xds-server-listener 0xc000542750] Resource "grpc/server?xds.resource.listening_address=127.0.0.1:35925" received error: the xds-client is closed after listener was closed  (t=+166.398217ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #100]Channel Connectivity change to SHUTDOWN  (t=+166.493464ms)
        tlogger.go:116: INFO resolver_wrapper.go:100 [core] [Channel #100]Closing the name resolver  (t=+166.546583ms)
        tlogger.go:116: INFO balancer_wrapper.go:145 [core] [Channel #100]ccBalancerWrapper: closing  (t=+166.607386ms)
        tlogger.go:116: INFO clientconn.go:1199 [core] [Channel #100 SubChannel #101]Subchannel Connectivity change to SHUTDOWN  (t=+166.777602ms)
        tlogger.go:116: INFO clientconn.go:1548 [core] [Channel #100 SubChannel #101]Subchannel deleted  (t=+166.845288ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #100]Channel deleted  (t=+166.893928ms)
        tlogger.go:116: INFO clientimpl.go:100 [xds] [xds-client 0xc000958000] Shutdown  (t=+166.955433ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #102]Server created  (t=+167.959569ms)
        tlogger.go:116: INFO client_refcounted.go:81 [xds] [xds-client 0xc000958f00] Created client with name "#server" and bootstrap configuration:
             {
              "xds_servers": [
               {
                "server_uri": "passthrough:///non-existent-management-server",
                "channel_creds": [
                 {
                  "type": "insecure"
                 }
                ]
               }
              ],
              "certificate_providers": {
               "client-side-certificate-provider-instance": {
                "plugin_name": "file_watcher",
                "config": {
                 "certificate_file": "/tmp/testClientSideXDS4106081044/cert.pem",
                 "private_key_file": "/tmp/testClientSideXDS4106081044/key.pem",
                 "ca_certificate_file": "/tmp/testClientSideXDS4106081044/ca.pem",
                 "refresh_interval": "600s"
                }
               },
               "server-side-certificate-provider-instance": {
                "plugin_name": "file_watcher",
                "config": {
                 "certificate_file": "/tmp/testServerSideXDS3856636603/cert.pem",
                 "private_key_file": "/tmp/testServerSideXDS3856636603/key.pem",
                 "ca_certificate_file": "/tmp/testServerSideXDS3856636603/ca.pem",
                 "refresh_interval": "600s"
                }
               }
              },
              "server_listener_resource_name_template": "grpc/server?xds.resource.listening_address=%s",
              "client_default_listener_resource_name_template": "%s",
              "node": {
               "id": "f692b8eb-2144-4204-9b6e-a0b079[198](https://github.com/grpc/grpc-go/actions/runs/10810537881/job/29988003566?pr=7619#step:8:199)c0f",
               "locality": {}
              }
             }  (t=+168.843673ms)
        tlogger.go:116: INFO client_refcounted.go:86 [xds] xDS node ID: f692b8eb-2144-4204-9b6e-a0b079198c0f  (t=+168.937157ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0005a3900] Created xds.GRPCServer  (t=+168.986118ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #103]Server created  (t=+170.348851ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0005a3e80] Created xds.GRPCServer  (t=+171.002346ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #104]Server created  (t=+172.364278ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00084c3c0] Created xds.GRPCServer  (t=+172.949345ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #105]Server created  (t=+174.305587ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00084c780] Created xds.GRPCServer  (t=+174.912024ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #106]Server created  (t=+176.248178ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00084cb80] Created xds.GRPCServer  (t=+176.842713ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #107]Server created  (t=+178.18602ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00084cb80] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+178.538345ms)
        tlogger.go:116: INFO clientconn.go:1675 [core] original dial target is: "passthrough:///non-existent-management-server"  (t=+178.729711ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #108]Channel created  (t=+178.815891ms)
        tlogger.go:116: INFO clientconn.go:193 [core] [Channel #108]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"passthrough", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/non-existent-management-server", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}  (t=+178.899617ms)
        tlogger.go:116: INFO clientconn.go:194 [core] [Channel #108]Channel authority set to "non-existent-management-server"  (t=+178.949179ms)
        tlogger.go:116: INFO resolver_wrapper.go:200 [core] [Channel #108]Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "non-existent-management-server",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Metadata": null
                }
              ],
              "Endpoints": [
                {
                  "Addresses": [
                    {
                      "Addr": "non-existent-management-server",
                      "ServerName": "",
                      "Attributes": null,
                      "BalancerAttributes": null,
                      "Metadata": null
                    }
                  ],
                  "Attributes": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+179.203322ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0005a3900] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+179.311122ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0005a3e80] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+179.424132ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00084c3c0] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+179.521463ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00084c780] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+179.634142ms)
        tlogger.go:116: INFO balancer_wrapper.go:107 [core] [Channel #108]Channel switches to new LB policy "pick_first"  (t=+179.747042ms)
        tlogger.go:116: INFO clientconn.go:842 [core] [Channel #108 SubChannel #109]Subchannel created  (t=+179.830968ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #108]Channel Connectivity change to CONNECTING  (t=+179.897382ms)
        tlogger.go:116: INFO clientconn.go:1[199](https://github.com/grpc/grpc-go/actions/runs/10810537881/job/29988003566?pr=7619#step:8:200) [core] [Channel #108 SubChannel #109]Subchannel Connectivity change to CONNECTING  (t=+179.982059ms)
        tlogger.go:116: INFO clientconn.go:1317 [core] [Channel #108 SubChannel #109]Subchannel picks a new address "non-existent-management-server" to connect  (t=+180.040157ms)
        tlogger.go:116: WARNING clientconn.go:1379 [core] [Channel #108 SubChannel #109]grpc: addrConn.createTransport failed to connect to {Addr: "non-existent-management-server", ServerName: "non-existent-management-server", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: address non-existent-management-server: missing port in address"  (t=+180.190967ms)
        tlogger.go:116: INFO clientconn.go:1201 [core] [Channel #108 SubChannel #109]Subchannel Connectivity change to TRANSIENT_FAILURE, last error: connection error: desc = "transport: Error while dialing: dial tcp: address non-existent-management-server: missing port in address"  (t=+180.2731ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #108]Channel exiting idle mode  (t=+180.333663ms)
        tlogger.go:116: INFO transport.go:245 [xds] [xds-client 0xc000958f00] [passthrough:///non-existent-management-server] Created transport to server "passthrough:///non-existent-management-server"  (t=+180.408622ms)
        tlogger.go:116: WARNING clientimpl_watchers.go:56 [xds] Watch registered for name "ListenerResource" of type "grpc/server?xds.resource.listening_address=127.0.0.1:35925", authority "" is not found  (t=+180.507997ms)
        tlogger.go:116: WARNING listener_wrapper.go:428 [xds] [xds-server-listener 0xc0005c12b0] Resource "grpc/server?xds.resource.listening_address=127.0.0.1:35925" received error: the xds-client is closed after listener was closed  (t=+180.586693ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #108]Channel Connectivity change to TRANSIENT_FAILURE  (t=+180.672132ms)
        tlogger.go:116: WARNING transport.go:338 [xds] [xds-client 0xc000958f00] [passthrough:///non-existent-management-server] Creating new ADS stream failed: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: address non-existent-management-server: missing port in address"  (t=+180.845464ms)
        tlogger.go:116: WARNING clientimpl_watchers.go:56 [xds] Watch registered for name "ListenerResource" of type "grpc/server?xds.resource.listening_address=127.0.0.1:35925", authority "" is not found  (t=+180.900867ms)
        tlogger.go:116: WARNING listener_wrapper.go:428 [xds] [xds-server-listener 0xc0005c1450] Resource "grpc/server?xds.resource.listening_address=127.0.0.1:35925" received error: the xds-client is closed after listener was closed  (t=+180.970897ms)
        tlogger.go:116: WARNING clientimpl_watchers.go:56 [xds] Watch registered for name "ListenerResource" of type "grpc/server?xds.resource.listening_address=127.0.0.1:35925", authority "" is not found  (t=+181.040146ms)
        tlogger.go:116: WARNING listener_wrapper.go:428 [xds] [xds-server-listener 0xc0005c1860] Resource "grpc/server?xds.resource.listening_address=127.0.0.1:35925" received error: the xds-client is closed after listener was closed  (t=+181.110527ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #108]Channel Connectivity change to SHUTDOWN  (t=+181.188853ms)
        tlogger.go:116: INFO resolver_wrapper.go:100 [core] [Channel #108]Closing the name resolver  (t=+181.242372ms)
        tlogger.go:116: INFO balancer_wrapper.go:145 [core] [Channel #108]ccBalancerWrapper: closing  (t=+181.30061ms)
        tlogger.go:116: INFO clientconn.go:1199 [core] [Channel #108 SubChannel #109]Subchannel Connectivity change to SHUTDOWN  (t=+181.389045ms)
        tlogger.go:116: INFO clientconn.go:1548 [core] [Channel #108 SubChannel #109]Subchannel deleted  (t=+181.450208ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #108]Channel deleted  (t=+181.497296ms)
        tlogger.go:116: INFO clientimpl.go:100 [xds] [xds-client 0xc000958f00] Shutdown  (t=+181.556676ms)
        tlogger.go:116: INFO client_refcounted.go:81 [xds] [xds-client 0xc000959c20] Created client with name "#server" and bootstrap configuration:
             {
              "xds_servers": [
               {
                "server_uri": "passthrough:///non-existent-management-server",
                "channel_creds": [
                 {
                  "type": "insecure"
                 }
                ]
               }
              ],
              "certificate_providers": {
               "client-side-certificate-provider-instance": {
                "plugin_name": "file_watcher",
                "config": {
                 "certificate_file": "/tmp/testClientSideXDS808658363/cert.pem",
                 "private_key_file": "/tmp/testClientSideXDS808658363/key.pem",
                 "ca_certificate_file": "/tmp/testClientSideXDS808658363/ca.pem",
                 "refresh_interval": "600s"
                }
               },
               "server-side-certificate-provider-instance": {
                "plugin_name": "file_watcher",
                "config": {
                 "certificate_file": "/tmp/testServerSideXDS1611495745/cert.pem",
                 "private_key_file": "/tmp/testServerSideXDS1611495745/key.pem",
                 "ca_certificate_file": "/tmp/testServerSideXDS1611495745/ca.pem",
                 "refresh_interval": "600s"
                }
               }
              },
              "server_listener_resource_name_template": "grpc/server?xds.resource.listening_address=%s",
              "client_default_listener_resource_name_template": "%s",
              "node": {
               "id": "21640e10-c9a5-4c71-bf99-18be00c73a02",
               "locality": {}
              }
             }  (t=+182.092522ms)
        tlogger.go:116: INFO client_refcounted.go:86 [xds] xDS node ID: 21640e10-c9a5-4c71-bf99-18be00c73a02  (t=+182.184403ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00084cfc0] Created xds.GRPCServer  (t=+182.231ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #110]Server created  (t=+184.224175ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00084d740] Created xds.GRPCServer  (t=+184.866199ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #111]Server created  (t=+186.307268ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00084db00] Created xds.GRPCServer  (t=+187.049297ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #112]Server created  (t=+188.630367ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00063a000] Created xds.GRPCServer  (t=+189.339425ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #113]Server created  (t=+191.184725ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00063a380] Created xds.GRPCServer  (t=+191.834824ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #114]Server created  (t=+193.439156ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00063a780] Created xds.GRPCServer  (t=+194.14588ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #115]Server created  (t=+195.76[200](https://github.com/grpc/grpc-go/actions/runs/10810537881/job/29988003566?pr=7619#step:8:201)4ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00063ab00] Created xds.GRPCServer  (t=+196.451546ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #116]Server created  (t=+197.996438ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00063ae00] Created xds.GRPCServer  (t=+198.627171ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #117]Server created  (t=+203.682278ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00063b140] Created xds.GRPCServer  (t=+204.306198ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #118]Server created  (t=+205.675795ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00063b480] Created xds.GRPCServer  (t=+206.286069ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #119]Server created  (t=+207.731576ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00063b840] Created xds.GRPCServer  (t=+208.322855ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #120]Server created  (t=+209.698232ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc00063bc00] Created xds.GRPCServer  (t=+210.296013ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #121]Server created  (t=+211.756408ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0008c4040] Created xds.GRPCServer  (t=+212.320757ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #122]Server created  (t=+213.720269ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0008c43c0] Created xds.GRPCServer  (t=+214.481204ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #123]Server created  (t=+215.977976ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0008c4700] Created xds.GRPCServer  (t=+216.614299ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #124]Server created  (t=+217.931728ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00084cfc0] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+226.548868ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00063ae00] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+226.638074ms)
        tlogger.go:116: INFO clientconn.go:1675 [core] original dial target is: "passthrough:///non-existent-management-server"  (t=+226.847954ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #125]Channel created  (t=+226.913886ms)
        tlogger.go:116: INFO clientconn.go:193 [core] [Channel #125]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"passthrough", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/non-existent-management-server", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}  (t=+227.006839ms)
        tlogger.go:116: INFO clientconn.go:194 [core] [Channel #125]Channel authority set to "non-existent-management-server"  (t=+227.076769ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0008c4a40] Created xds.GRPCServer  (t=+227.41587ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #126]Server created  (t=+228.817094ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0008c4dc0] Created xds.GRPCServer  (t=+229.414144ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #127]Server created  (t=+230.860673ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0008c5180] Created xds.GRPCServer  (t=+231.453896ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #128]Server created  (t=+232.817932ms)
        tlogger.go:116: INFO server.go:123 [xds] [xds-server 0xc0008c5500] Created xds.GRPCServer  (t=+233.4032ms)
        tlogger.go:116: INFO server.go:684 [core] [Server #129]Server created  (t=+240.445061ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00063b140] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+242.31556ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00063b480] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+242.829926ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00063b840] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+242.972712ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0008c4040] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+243.102112ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00063bc00] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+243.216846ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0008c43c0] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+243.475967ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0008c4700] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+243.543633ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00084d740] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+243.590981ms)
        tlogger.go:116: INFO resolver_wrapper.go:200 [core] [Channel #125]Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "non-existent-management-server",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Metadata": null
                }
              ],
              "Endpoints": [
                {
                  "Addresses": [
                    {
                      "Addr": "non-existent-management-server",
                      "ServerName": "",
                      "Attributes": null,
                      "BalancerAttributes": null,
                      "Metadata": null
                    }
                  ],
                  "Attributes": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+243.865722ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0008c4dc0] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+243.969625ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0008c4a40] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+244.051377ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0008c5180] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+244.135654ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc0008c5500] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+244.226132ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00084db00] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+244.268801ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00063a000] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+244.314116ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00063a780] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+244.360241ms)
        tlogger.go:116: INFO balancer_wrapper.go:107 [core] [Channel #125]Channel switches to new LB policy "pick_first"  (t=+244.444278ms)
        tlogger.go:116: INFO clientconn.go:842 [core] [Channel #125 SubChannel #130]Subchannel created  (t=+244.53698ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #125]Channel Connectivity change to CONNECTING  (t=+244.611508ms)
        tlogger.go:116: INFO clientconn.go:1199 [core] [Channel #125 SubChannel #130]Subchannel Connectivity change to CONNECTING  (t=+244.701496ms)
        tlogger.go:116: INFO clientconn.go:1317 [core] [Channel #125 SubChannel #130]Subchannel picks a new address "non-existent-management-server" to connect  (t=+244.781825ms)
        tlogger.go:116: WARNING clientconn.go:1379 [core] [Channel #125 SubChannel #130]grpc: addrConn.createTransport failed to connect to {Addr: "non-existent-management-server", ServerName: "non-existent-management-server", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: address non-existent-management-server: missing port in address"  (t=+244.963403ms)
        tlogger.go:116: INFO clientconn.go:1[201](https://github.com/grpc/grpc-go/actions/runs/10810537881/job/29988003566?pr=7619#step:8:202) [core] [Channel #125 SubChannel #130]Subchannel Connectivity change to TRANSIENT_FAILURE, last error: connection error: desc = "transport: Error while dialing: dial tcp: address non-existent-management-server: missing port in address"  (t=+245.060553ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00063a380] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+245.105537ms)
        tlogger.go:116: INFO server.go:182 [xds] [xds-server 0xc00063ab00] Serve() passed a net.Listener on 127.0.0.1:35925  (t=+245.154818ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #125]Channel Connectivity change to TRANSIENT_FAILURE  (t=+245.255596ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #125]Channel exiting idle mode  (t=+245.310217ms)
        tlogger.go:116: INFO transport.go:245 [xds] [xds-client 0xc000959c20] [passthrough:///non-existent-management-server] Created transport to server "passthrough:///non-existent-management-server"  (t=+245.390506ms)
        tlogger.go:116: WARNING clientimpl_watchers.go:56 [xds] Watch registered for name "ListenerResource" of type "grpc/server?xds.resource.listening_address=127.0.0.1:35925", authority "" is not found  (t=+245.50021ms)
        tlogger.go:116: WARNING transport.go:338 [xds] [xds-client 0xc000959c20] [passthrough:///non-existent-management-server] Creating new ADS stream failed: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: address non-existent-management-server: missing port in address"  (t=+245.706634ms)
        tlogger.go:116: WARNING clientimpl_watchers.go:56 [xds] Watch registered for name "ListenerResource" of type "grpc/server?xds.resource.listening_address=127.0.0.1:35925", authority "" is not found  (t=+245.763941ms)
        tlogger.go:116: WARNING clientimpl_watchers.go:56 [xds] Watch registered for name "ListenerResource" of type "grpc/server?xds.resource.listening_address=127.0.0.1:35925", authority "" is not found  (t=+245.834883ms)
        tlogger.go:116: WARNING clientimpl_watchers.go:56 [xds] Watch registered for name "ListenerResource" of type "grpc/server?xds.resource.listening_address=127.0.0.1:35925", authority "" is not found  (t=+245.897118ms)
        tlogger.go:116: WARNING clientimpl_watchers.go:56 [xds] Watch registered for name "ListenerResource" of type "grpc/server?xds.resource.listening_address=127.0.0.1:35925", authority "" is not found  (t=+245.972398ms)
        tlogger.go:116: WARNING listener_wrapper.go:428 [xds] [xds-server-listener 0xc0001f7ba0] Resource "grpc/server?xds.resource.listening_address=127.0.0.1:35925" received error: the xds-client is closed after listener was closed  (t=+246.060612ms)
        tlogger.go:116: WARNING listener_wrapper.go:428 [xds] [xds-server-listener 0xc0001f7d40] Resource "grpc/server?xds.resource.listening_address=127.0.0.1:35925" received error: the xds-client is closed after listener was closed  (t=+246.1[213](https://github.com/grpc/grpc-go/actions/runs/10810537881/job/29988003566?pr=7619#step:8:214)55ms)
        tlogger.go:116: WARNING listener_wrapper.go:428 [xds] [xds-server-listener 0xc000198270] Resource "grpc/server?xds.resource.listening_address=127.0.0.1:35925" received error: the xds-client is closed after listener was closed  (t=+246.166619ms)
        tlogger.go:116: WARNING listener_wrapper.go:428 [xds] [xds-server-listener 0xc000198a90] Resource "grpc/server?xds.resource.listening_address=127.0.0.1:35925" received error: the xds-client is closed after listener was closed  (t=+246.213406ms)
        tlogger.go:116: WARNING listener_wrapper.go:428 [xds] [xds-server-listener 0xc000198f70] Resource "grpc/server?xds.resource.listening_address=127.0.0.1:35925" received error: the xds-client is closed after listener was closed  (t=+246.254753ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #125]Channel Connectivity change to SHUTDOWN  (t=+246.346894ms)
        tlogger.go:116: INFO resolver_wrapper.go:100 [core] [Channel #125]Closing the name resolver  (t=+246.394603ms)
        tlogger.go:116: INFO balancer_wrapper.go:145 [core] [Channel #125]ccBalancerWrapper: closing  (t=+246.451689ms)
        tlogger.go:116: INFO clientconn.go:1199 [core] [Channel #125 SubChannel #130]Subchannel Connectivity change to SHUTDOWN  (t=+246.540635ms)
        tlogger.go:116: INFO clientconn.go:1548 [core] [Channel #125 SubChannel #130]Subchannel deleted  (t=+246.606096ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #125]Channel deleted  (t=+246.7[217](https://github.com/grpc/grpc-go/actions/runs/10810537881/job/29988003566?pr=7619#step:8:218)81ms)
        tlogger.go:116: INFO clientimpl.go:100 [xds] [xds-client 0xc000959c20] Shutdown  (t=+246.785069ms)
        server_test.go:703: Failed to create an xDS enabled gRPC server: xDS client creation failed: xds: failed to get xDS bootstrap config: bootstrap environment variables ("GRPC_XDS_BOOTSTRAP" or "GRPC_XDS_BOOTSTRAP_CONFIG") not defined, and no fallback config set
FAIL
purnesh42H commented 1 month ago

@arvindbr8 looks like no report from 2 weeks. Should we downgrade to P2?

dfawley commented 1 month ago

We should measure the flakiness w/10k runs and assign priority accordingly. Also consider new-ness -- if the test is new we should treat with much higher priority since we don't generally want to allow flakiness.

purnesh42H commented 3 weeks ago

no failures in 10k forge runs

arjan-bal commented 2 weeks ago

https://github.com/grpc/grpc-go/actions/runs/11397970463/job/31714272864?pr=7754

aranjans commented 2 weeks ago

FAILED in 106 out of 100000 http://fusion2/49630097-747f-425d-b94e-3d6519c01e67

zasweq commented 2 weeks ago

Taking a look now, but I see that this test has flaked in G/A twice over a year and a third of it being merged, so I don't think it's super urgent. The fact it only showed up this year Sept makes me think some bootstrap parsing code changed, as that's how it specifies to create a new server. Weird that that would induce flakiness though, because the waitgroup doesn't end until after that processes. Maybe there's a race between setting bootstrap and server actually starting.

dfawley commented 2 weeks ago

100/100k = 1/1000 which is pretty low, but taking a look to see if we can find the root cause would be worth it.

dfawley commented 5 days ago

Flaked 2x on my PR in a single snapshot:

https://github.com/grpc/grpc-go/actions/runs/11634120790/job/32400709681?pr=7798 https://github.com/grpc/grpc-go/actions/runs/11634120790/job/32400708629?pr=7798

zasweq commented 2 days ago

https://github.com/grpc/grpc-go/actions/runs/11636054765/job/32406682904?pr=7797

easwars commented 2 days ago

https://github.com/grpc/grpc-go/actions/runs/11674461177/job/32507200078?pr=7804

zasweq commented 2 days ago

Will take a look at this this week once I get some time looks like it's flaking a lot more now.

zasweq commented 2 days ago

The fact the first occurence of this occurred sept of this year, over a year after it was merged makes me think it's racing with some of the xDS Client changes for fallback with respect to testing plumbing of xDS Bootstrap.

easwars commented 2 days ago

I'll take a look at this tomorrow.

arjan-bal commented 1 day ago

Failure: https://github.com/grpc/grpc-go/actions/runs/11689787872/job/32553307641

Stack traces:

panic: test timed out after 7m0s
    running tests:
        Test (7m0s)
        Test/ServeAndCloseDoNotRace (7m0s)

goroutine 369 [running]:
testing.(*M).startAlarm.func1()
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:2373 +0x265
created by time.goFunc
    /opt/hostedtoolcache/go/1.23.2/x64/src/time/sleep.go:215 +0x45

goroutine 1 [chan receive, 7 minutes]:
testing.(*T).Run(0xc00015a4e0, {0x17b7e62, 0x4}, 0x181c398)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1751 +0x851
testing.runTests.func1(0xc00015a4e0)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:[216](https://github.com/grpc/grpc-go/actions/runs/11689787872/job/32553307641#step:8:217)8 +0x86
testing.tRunner(0xc00015a4e0, 0xc00006bae0)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1690 +0x227
testing.runTests(0xc00012c2d0, {0x235b860, 0x2, 0x2}, {0x7fe8fec5c108?, 0x40?, 0x237b1c0?})
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:2166 +0x8bf
testing.(*M).Run(0xc000320000)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:2034 +0xf18
main.main()
    _testmain.go:49 +0x165

goroutine 4 [chan receive, 6 minutes]:
testing.(*T).Run(0xc00015a680, {0x14d6b6f, 0x16}, 0xc00019e440)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1751 +0x851
google.golang.org/grpc/internal/grpctest.RunSubTests(0xc00015a680, {0x195c4e0, 0x239c600})
    /home/runner/work/grpc-go/grpc-go/internal/grpctest/grpctest.go:114 +0x352
google.golang.org/grpc/xds.Test(0xc00015a680)
    /home/runner/work/grpc-go/grpc-go/xds/server_test.go:65 +0x35
testing.tRunner(0xc00015a680, 0x181c398)
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1690 +0x[227](https://github.com/grpc/grpc-go/actions/runs/11689787872/job/32553307641#step:8:228)
created by testing.(*T).Run in goroutine 1
    /opt/hostedtoolcache/go/1.23.2/x64/src/testing/testing.go:1743 +0x826

goroutine 119 [semacquire, 6 minutes]:
sync.runtime_Semacquire(0xc00034a2a8?)
    /opt/hostedtoolcache/go/1.23.2/x64/src/runtime/sema.go:71 +0x25
sync.(*WaitGroup).Wait(0xc00034a2a0)
    /opt/hostedtoolcache/go/1.23.2/x64/src/sync/waitgroup.go:118 +0xa5
google.golang.org/grpc/xds.s.TestServeAndCloseDoNotRace({{}}, 0xc00036a000)
    /home/runner/work/grpc-go/grpc-go/xds/server_test.go:715 +0x54b
google.golang.org/grpc/internal/grpctest.RunSubTests.func1(0xc00036a000)
    /home/runner/work/grpc-go/grpc-go/internal/grpctest/grpctest.go:122 +0x10e

goroutine 183 [chan receive, 6 minutes]:
google.golang.org/grpc/xds/internal/xdsclient.(*authority).watchResource(0xc0005d42a0, {0x19617a0, 0xc000827c50}, {0xc000059e80, 0x3a}, {0x195fa00, 0xc000862780})
    /home/runner/work/grpc-go/grpc-go/xds/internal/xdsclient/authority.go:483 +0x2ab
google.golang.org/grpc/xds/internal/xdsclient.(*clientImpl).WatchResource(0xc000536680, {0x19617a0, 0xc000827c50}, {0xc000059e80, 0x3a}, {0x195fa00, 0xc000862780})
    /home/runner/work/grpc-go/grpc-go/xds/internal/xdsclient/clientimpl_watchers.go:66 +0xae5
google.golang.org/grpc/xds/internal/xdsclient/xdsresource.WatchListener(...)
    /home/runner/work/grpc-go/grpc-go/xds/internal/xdsclient/xdsresource/listener_resource_type.go:185
google.golang.org/grpc/xds/internal/server.NewListenerWrapper({{0x195f6b8, 0xc00036c100}, {0xc000059e80, 0x3a}, {0x7fe8b4dacfa0, 0xc00047dc70}, 0xc000862760})
    /home/runner/work/grpc-go/grpc-go/xds/internal/server/listener_wrapper.go:102 +0xc3d
google.golang.org/grpc/xds.(*GRPCServer).Serve(0xc000802b00, {0x195f6b8, 0xc00036c100})
    /home/runner/work/grpc-go/grpc-go/xds/server.go:201 +0x457
google.golang.org/grpc/xds.s.TestServeAndCloseDoNotRace.func1()
    /home/runner/work/grpc-go/grpc-go/xds/server_test.go:707 +0x50
created by google.golang.org/grpc/xds.s.TestServeAndCloseDoNotRace in goroutine 119
    /home/runner/work/grpc-go/grpc-go/xds/server_test.go:706 +0x47f

goroutine 181 [chan receive, 6 minutes]:
google.golang.org/grpc/xds/internal/xdsclient.(*authority).watchResource(0xc0005d42a0, {0x19617a0, 0xc000827b60}, {0xc000059e00, 0x3a}, {0x195fa00, 0xc0008626c0})
    /home/runner/work/grpc-go/grpc-go/xds/internal/xdsclient/authority.go:483 +0x2ab
google.golang.org/grpc/xds/internal/xdsclient.(*clientImpl).WatchResource(0xc000536680, {0x19617a0, 0xc000827b60}, {0xc000059e00, 0x3a}, {0x195fa00, 0xc0008626c0})
    /home/runner/work/grpc-go/grpc-go/xds/internal/xdsclient/clientimpl_watchers.go:66 +0xae5
google.golang.org/grpc/xds/internal/xdsclient/xdsresource.WatchListener(...)
    /home/runner/work/grpc-go/grpc-go/xds/internal/xdsclient/xdsresource/listener_resource_type.go:185
google.golang.org/grpc/xds/internal/server.NewListenerWrapper({{0x195f6b8, 0xc00036c100}, {0xc000059e00, 0x3a}, {0x7fe8b4dacfa0, 0xc00047dc70}, 0xc0008626a0})
    /home/runner/work/grpc-go/grpc-go/xds/internal/server/listener_wrapper.go:102 +0xc3d
google.golang.org/grpc/xds.(*GRPCServer).Serve(0xc000802780, {0x195f6b8, 0xc00036c100})
    /home/runner/work/grpc-go/grpc-go/xds/server.go:201 +0x457
google.golang.org/grpc/xds.s.TestServeAndCloseDoNotRace.func1()
    /home/runner/work/grpc-go/grpc-go/xds/server_test.go:707 +0x50
created by google.golang.org/grpc/xds.s.TestServeAndCloseDoNotRace in goroutine 119
    /home/runner/work/grpc-go/grpc-go/xds/server_test.go:706 +0x47f

goroutine 169 [chan receive, 6 minutes]:
google.golang.org/grpc/xds/internal/xdsclient.(*authority).watchResource(0xc0005d42a0, {0x19617a0, 0xc000827740}, {0xc000059b80, 0x3a}, {0x195fa00, 0xc0008622a0})
    /home/runner/work/grpc-go/grpc-go/xds/internal/xdsclient/authority.go:483 +0x2ab
google.golang.org/grpc/xds/internal/xdsclient.(*clientImpl).WatchResource(0xc000536680, {0x19617a0, 0xc000827740}, {0xc000059b80, 0x3a}, {0x195fa00, 0xc0008622a0})
    /home/runner/work/grpc-go/grpc-go/xds/internal/xdsclient/clientimpl_watchers.go:66 +0xae5
google.golang.org/grpc/xds/internal/xdsclient/xdsresource.WatchListener(...)
    /home/runner/work/grpc-go/grpc-go/xds/internal/xdsclient/xdsresource/listener_resource_type.go:185
google.golang.org/grpc/xds/internal/server.NewListenerWrapper({{0x195f6b8, 0xc00036c100}, {0xc000059b80, 0x3a}, {0x7fe8b4dacfa0, 0xc00047dc70}, 0xc00086[228](https://github.com/grpc/grpc-go/actions/runs/11689787872/job/32553307641#step:8:229)0})
    /home/runner/work/grpc-go/grpc-go/xds/internal/server/listener_wrapper.go:102 +0xc3d
google.golang.org/grpc/xds.(*GRPCServer).Serve(0xc000595200, {0x195f6b8, 0xc00036c100})
    /home/runner/work/grpc-go/grpc-go/xds/server.go:201 +0x457
google.golang.org/grpc/xds.s.TestServeAndCloseDoNotRace.func1()
    /home/runner/work/grpc-go/grpc-go/xds/server_test.go:707 +0x50
created by google.golang.org/grpc/xds.s.TestServeAndCloseDoNotRace in goroutine 119
    /home/runner/work/grpc-go/grpc-go/xds/server_test.go:706 +0x47f
arjan-bal commented 1 day ago

A bunch of goroutines are stuck on authority.go:483 which reads from a channel: https://github.com/grpc/grpc-go/blob/0ec8fd84fdfb54f1b7f9c2d3d22aa20cd7a8cf09/xds/internal/xdsclient/authority.go#L483-L484

The change was introduced in https://github.com/grpc/grpc-go/pull/7773.

arjan-bal commented 1 day ago

@easwars maybe we need to close the channel if the callback can't be scheduled because the serializer is closed? https://github.com/grpc/grpc-go/blob/0ec8fd84fdfb54f1b7f9c2d3d22aa20cd7a8cf09/xds/internal/xdsclient/authority.go#L430-L436

easwars commented 1 day ago

I think you are right. Let me send a PR. I also see another place which looks very similar to this.

zasweq commented 1 day ago

Oh yeah makes sense good find Arjan.

easwars commented 1 day ago

Closing the done channel when unable to schedule the serializer callback fixes the failures where it takes 7m (the configured test timeout) to fail.

I'm now seeing another flake though:

grpc/xds/server_test.go:704: Failed to create an xDS enabled gRPC server: xDS client creation failed: xds: failed to get xDS bootstrap config: bootstrap environment variables ("GRPC_XDS_BOOTSTRAP" or "GRPC_XDS_BOOTSTRAP_CONFIG") not defined, and no fallback config set

This is weird and I'm trying to get to the bottom of this now.

easwars commented 1 day ago

Ok, I see what is going on:

When the server is created with a server option for bootstrap config, the server calls xdsclient.NewForTesting() to create an xDS client instead of calling xds.New(), which is what it would do in production code.

The xdsclient.NewForTesting(), calls bootstrap.SetFallbackBootstrapConfig to set the fallback bootstrap config to the one passed by the test. And it returns a cancel func which calls bootstrap.UnsetFallbackBootstrapConfigForTesting.

The above two operations basically race against each other, and a server creation could see that the boostrap config is unset because an previous iteration erased it.

I'm still exploring options for the fix.