grpc / grpc-go

The Go language implementation of gRPC. HTTP/2 based RPC
https://grpc.io
Apache License 2.0
21.11k stars 4.39k forks source link

Flaky Test: Balancer_TwoAddresses_WeightExpiration #7507

Open easwars opened 3 months ago

easwars commented 3 months ago

https://github.com/grpc/grpc-go/actions/runs/10377601227/job/28732125169?pr=7506

--- FAIL: Test (6.19s)
    --- FAIL: Test/Balancer_TwoAddresses_WeightExpiration (1.19s)
        tlogger.go:116: INFO server.go:685 [core] [Server #328]Server created  (t=+1.018725ms)
        tlogger.go:116: INFO server.go:685 [core] [Server #330]Server created  (t=+1.759362ms)
        balancer_test.go:625: Marshaled service config: {"loadBalancingConfig": [ {"weighted_round_robin":{"enableOobLoadReport":true,"oobReportingPeriod":"60s","blackoutPeriod":"0s","weightExpirationPeriod":"60s","weightUpdatePeriod":".050s","errorUtilizationPenalty":0}} ] }
        tlogger.go:116: INFO clientconn.go:1678 [core] original dial target is: "whatever:///127.0.0.1:42173"  (t=+2.267587ms)
        tlogger.go:116: INFO server.go:881 [core] [Server #328 ListenSocket #329]ListenSocket created  (t=+2.984119ms)
        tlogger.go:116: INFO server.go:881 [core] [Server #330 ListenSocket #331]ListenSocket created  (t=+3.153685ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #332]Channel created  (t=+3.606116ms)
        tlogger.go:116: INFO clientconn.go:193 [core] [Channel #332]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"whatever", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/127.0.0.1:42173", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}  (t=+3.86649ms)
        tlogger.go:116: INFO clientconn.go:194 [core] [Channel #332]Channel authority set to "127.0.0.1:42173"  (t=+4.005739ms)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #332]Channel exiting idle mode  (t=+4.413988ms)
        tlogger.go:116: INFO resolver_wrapper.go:200 [core] [Channel #332]Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "127.0.0.1:42173",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Metadata": null
                }
              ],
              "Endpoints": [
                {
                  "Addresses": [
                    {
                      "Addr": "127.0.0.1:42173",
                      "ServerName": "",
                      "Attributes": null,
                      "BalancerAttributes": null,
                      "Metadata": null
                    }
                  ],
                  "Attributes": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+4.704539ms)
        tlogger.go:116: INFO balancer_wrapper.go:107 [core] [Channel #332]Channel switches to new LB policy "weighted_round_robin"  (t=+4.983848ms)
        tlogger.go:116: INFO balancer.go:105 [weighted-round-robin] [0x40000b8460] Created  (t=+5.168582ms)
        tlogger.go:116: INFO balancer.go:166 [weighted-round-robin] [0x40000b8460] UpdateCCS: {{[{Addr: "127.0.0.1:42173", ServerName: "", }] [{[{Addr: "127.0.0.1:42173", ServerName: "", }] <nil>}] <nil> <nil>} 0x400043e400}  (t=+5.486323ms)
        tlogger.go:116: INFO clientconn.go:842 [core] [Channel #332 SubChannel #333]Subchannel created  (t=+5.655888ms)
        tlogger.go:116: INFO clientconn.go:1202 [core] [Channel #332 SubChannel #333]Subchannel Connectivity change to CONNECTING  (t=+6.037838ms)
        tlogger.go:116: INFO clientconn.go:1320 [core] [Channel #332 SubChannel #333]Subchannel picks a new address "127.0.0.1:42173" to connect  (t=+6.201913ms)
        tlogger.go:116: INFO clientconn.go:1202 [core] [Channel #332 SubChannel #333]Subchannel Connectivity change to READY  (t=+7.214086ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #332]Channel Connectivity change to CONNECTING  (t=+7.529052ms)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #332]Channel Connectivity change to READY  (t=+8.971524ms)
        tlogger.go:116: INFO resolver_wrapper.go:200 [core] [Channel #332]Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "127.0.0.1:42173",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Metadata": null
                },
                {
                  "Addr": "127.0.0.1:41821",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Metadata": null
                }
              ],
              "Endpoints": [
                {
                  "Addresses": [
                    {
                      "Addr": "127.0.0.1:42173",
                      "ServerName": "",
                      "Attributes": null,
                      "BalancerAttributes": null,
                      "Metadata": null
                    }
                  ],
                  "Attributes": null
                },
                {
                  "Addresses": [
                    {
                      "Addr": "127.0.0.1:41821",
                      "ServerName": "",
                      "Attributes": null,
                      "BalancerAttributes": null,
                      "Metadata": null
                    }
                  ],
                  "Attributes": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } ()  (t=+9.309473ms)
        tlogger.go:116: INFO balancer.go:166 [weighted-round-robin] [0x40000b8460] UpdateCCS: {{[{Addr: "127.0.0.1:42173", ServerName: "", } {Addr: "127.0.0.1:41821", ServerName: "", }] [{[{Addr: "127.0.0.1:42173", ServerName: "", }] <nil>} {[{Addr: "127.0.0.1:41821", ServerName: "", }] <nil>}] <nil> <nil>} 0x400043e400}  (t=+9.711981ms)
        tlogger.go:116: INFO clientconn.go:842 [core] [Channel #332 SubChannel #336]Subchannel created  (t=+9.874784ms)
        tlogger.go:116: INFO clientconn.go:1202 [core] [Channel #332 SubChannel #336]Subchannel Connectivity change to CONNECTING  (t=+34.939867ms)
        tlogger.go:116: INFO clientconn.go:1320 [core] [Channel #332 SubChannel #336]Subchannel picks a new address "127.0.0.1:41821" to connect  (t=+35.241417ms)
        tlogger.go:116: INFO clientconn.go:1202 [core] [Channel #332 SubChannel #336]Subchannel Connectivity change to READY  (t=+38.[113](https://github.com/grpc/grpc-go/actions/runs/10377601227/job/28732125169?pr=7506#step:8:114)778ms)
        balancer_test.go:637: Failed iteration 0; counts: map[127.0.0.1:41821:50 127.0.0.1:42173:50]; want [{srv:0x4000598780 w:9} {srv:0x4000598d50 w:90}]
        balancer_test.go:637: Failed iteration 1; counts: map[127.0.0.1:41821:50 127.0.0.1:42173:50]; want [{srv:0x4000598780 w:9} {srv:0x4000598d50 w:90}]
        balancer_test.go:637: Failed iteration 2; counts: map[127.0.0.1:41821:50 127.0.0.1:42173:50]; want [{srv:0x4000598780 w:9} {srv:0x4000598d50 w:90}]
        balancer_test.go:637: Failed iteration 3; counts: map[127.0.0.1:41821:50 127.0.0.1:42173:50]; want [{srv:0x4000598780 w:9} {srv:0x4000598d50 w:90}]
        balancer_test.go:637: Failed iteration 4; counts: map[127.0.0.1:41821:50 127.0.0.1:42173:50]; want [{srv:0x4000598780 w:9} {srv:0x4000598d50 w:90}]
        balancer_test.go:637: Failed iteration 5; counts: map[127.0.0.1:41821:50 127.0.0.1:42173:50]; want [{srv:0x4000598780 w:9} {srv:0x4000598d50 w:90}]
        balancer_test.go:637: Failed iteration 6; counts: map[127.0.0.1:41821:50 127.0.0.1:42173:50]; want [{srv:0x4000598780 w:9} {srv:0x4000598d50 w:90}]
        balancer_test.go:637: Failed iteration 7; counts: map[127.0.0.1:41821:50 127.0.0.1:42173:50]; want [{srv:0x4000598780 w:9} {srv:0x4000598d50 w:90}]
        balancer_test.go:637: Failed iteration 8; counts: map[127.0.0.1:41821:50 127.0.0.1:42173:50]; want [{srv:0x4000598780 w:9} {srv:0x4000598d50 w:90}]
        balancer_test.go:637: Failed iteration 9; counts: map[127.0.0.1:41821:50 127.0.0.1:42173:50]; want [{srv:0x4000598780 w:9} {srv:0x4000598d50 w:90}]
        balancer_test.go:637: Failed to route RPCs with proper ratio
        tlogger.go:[116](https://github.com/grpc/grpc-go/actions/runs/10377601227/job/28732125169?pr=7506#step:8:117): INFO server.go:817 [core] [Server #330 ListenSocket #331]ListenSocket deleted  (t=+1.177759206s)
        tlogger.go:116: INFO clientconn.go:1202 [core] [Channel #332 SubChannel #336]Subchannel Connectivity change to IDLE  (t=+1.178468345s)
        tlogger.go:116: INFO clientconn.go:544 [core] [Channel #332]Channel Connectivity change to SHUTDOWN  (t=+1.178716236s)
        tlogger.go:116: INFO resolver_wrapper.go:100 [core] [Channel #332]Closing the name resolver  (t=+1.178940063s)
        tlogger.go:116: INFO clientconn.go:1202 [core] [Channel #332 SubChannel #336]Subchannel Connectivity change to CONNECTING  (t=+1.179136017s)
        tlogger.go:116: INFO balancer_wrapper.go:145 [core] [Channel #332]ccBalancerWrapper: closing  (t=+1.179310612s)
        tlogger.go:116: INFO clientconn.go:1320 [core] [Channel #332 SubChannel #336]Subchannel picks a new address "127.0.0.1:41821" to connect  (t=+1.179727447s)
        tlogger.go:116: INFO clientconn.go:[120](https://github.com/grpc/grpc-go/actions/runs/10377601227/job/28732125169?pr=7506#step:8:121)2 [core] [Channel #332 SubChannel #333]Subchannel Connectivity change to SHUTDOWN  (t=+1.180289782s)
        tlogger.go:116: INFO clientconn.go:1551 [core] [Channel #332 SubChannel #333]Subchannel deleted  (t=+1.180489033s)
        tlogger.go:116: INFO clientconn.go:1202 [core] [Channel #332 SubChannel #336]Subchannel Connectivity change to SHUTDOWN  (t=+1.180963806s)
        tlogger.go:116: INFO clientconn.go:1551 [core] [Channel #332 SubChannel #336]Subchannel deleted  (t=+1.181167394s)
        tlogger.go:116: INFO clientconn.go:314 [core] [Channel #332]Channel deleted  (t=+1.181333333s)
        tlogger.go:116: INFO server.go:817 [core] [Server #328 ListenSocket #329]ListenSocket deleted  (t=+1.181606[121](https://github.com/grpc/grpc-go/actions/runs/10377601227/job/28732125169?pr=7506#step:8:122)s)
        tlogger.go:116: WARNING clientconn.go:1382 [core] [Channel #332 SubChannel #336]grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:41821", ServerName: "127.0.0.1:42173", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [127](https://github.com/grpc/grpc-go/actions/runs/10377601227/job/28732125169?pr=7506#step:8:128).0.0.1:41821: operation was canceled"  (t=+1.184049915s)
FAIL
FAIL    google.golang.org/grpc/balancer/weightedroundrobin  11.550s
easwars commented 3 months ago

https://github.com/grpc/grpc-go/actions/runs/10394308167/job/28783810539?pr=7498

purnesh42H commented 3 months ago

https://github.com/grpc/grpc-go/actions/runs/10465483190/job/28980791860?pr=7462

easwars commented 2 months ago

https://github.com/grpc/grpc-go/actions/runs/10685201154/job/29617436939?pr=7579

dfawley commented 2 months ago

I believe this was supposed to be fixed by #7523. Maybe that wasn't effective enough?

zasweq commented 2 months ago

https://github.com/grpc/grpc-go/actions/runs/10855329479/job/30127706645?pr=7614 argh this shows up on a branch with the fix merged. So it still persists.

purnesh42H commented 2 months ago

@eshitachandwani to forge test and confirm the flakiness percentage

purnesh42H commented 2 months ago

https://github.com/grpc/grpc-go/actions/runs/10858951486/job/30137631363?pr=7609

eshitachandwani commented 2 months ago

Failed 0/100k (in all 7 iterations) test runs on forge

zasweq commented 3 weeks ago

https://github.com/grpc/grpc-go/actions/runs/11524690877/job/32085409887