Open easwars opened 2 years ago
The reason why this test is flaky is as follows:
IDLE
and it is only now that the monitoring goroutine gets to run, and it has already missed the first transition to READY.
In a production environment, this scenario is extremely unlikely to occur. Even if it does occur, the only problem is that the RLS LB policy will fail to reset backoff state the first time the control channel transitions back to READY. Subsequent transitions will be handled properly.
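Our current state change API (`WaitForStateChange` followed by `GetState`) is lossy: any state changes that happen between `WaitForStateChange` returning and the caller invoking `GetState` are lost, because `GetState` reports only the current state.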
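To make the lossiness concrete, here is a minimal, stdlib-only sketch. It does not use the real grpc-go `ClientConn`. `fakeChannel`, `Subscribe`, `readySeenByPolling`, and `readySeenBySubscriber` are hypothetical names invented for illustration, and the callback-style API is only an assumption about one possible fix, not necessarily what the linked issue proposes. The sketch models a monitor that gets to run only after the channel has already gone through READY and back to IDLE, and compares it with a monitor that is notified of every transition.

```go
package main

import "fmt"

// State is a simplified connectivity state (hypothetical, for illustration).
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
)

// fakeChannel is a hypothetical stand-in for a client channel. It is not the
// grpc-go ClientConn; it only models "current state" plus optional callbacks.
type fakeChannel struct {
	state State
	subs  []func(State)
}

// GetState returns only the current state; earlier states are not retained.
func (c *fakeChannel) GetState() State { return c.state }

// Subscribe registers a callback invoked on every transition (assumed API).
func (c *fakeChannel) Subscribe(f func(State)) { c.subs = append(c.subs, f) }

// setState records a transition and notifies every subscriber in order.
func (c *fakeChannel) setState(s State) {
	c.state = s
	for _, f := range c.subs {
		f(s)
	}
}

// readySeenByPolling models a monitor that is scheduled only after all the
// given transitions have happened, so it can observe just the final state.
func readySeenByPolling(transitions []State) int {
	ch := &fakeChannel{}
	for _, s := range transitions {
		ch.setState(s)
	}
	// The monitor runs "only now" and samples the state once.
	if ch.GetState() == Ready {
		return 1
	}
	return 0
}

// readySeenBySubscriber models a monitor that is told about each transition.
func readySeenBySubscriber(transitions []State) int {
	ch := &fakeChannel{}
	ready := 0
	ch.Subscribe(func(s State) {
		if s == Ready {
			ready++
		}
	})
	for _, s := range transitions {
		ch.setState(s)
	}
	return ready
}

func main() {
	// The channel reaches READY and falls back to IDLE before the monitor runs.
	seq := []State{Connecting, Ready, Idle}
	fmt.Println("polling saw READY:", readySeenByPolling(seq))
	fmt.Println("subscriber saw READY:", readySeenBySubscriber(seq))
}
```

In this model the polling monitor never observes READY, which corresponds to the missed backoff reset described above, while the subscriber observes it once. The real race depends on goroutine scheduling; the sketch removes that nondeterminism by running the transitions before the monitor samples the state.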
Blocked on https://github.com/grpc/grpc-go/issues/5818.
FAILED in 1 out of 10000 in 43.9s
Test error log:
```
--- FAIL: Test (9.95s)
    --- FAIL: Test/ControlChannelConnectivityStateMonitoring (5.02s)
        tlogger.go:116: INFO server.go:598 [core] [Server #525] Server created (t=+533.807µs)
        fake_rls_server.go:53: Started fake RLS server at "127.0.0.1:35475"
        balancer_test.go:763: Registered child policy with name "test-child-policyTest/ControlChannelConnectivityStateMonitoring"
        tlogger.go:116: INFO server.go:598 [core] [Server #526] Server created (t=+1.244216ms)
        balancer_test.go:772: Started TestService backend at: "127.0.0.1:38063"
        tlogger.go:116: INFO config.go:144 [rls] Received JSON service config: { "routeLookupConfig": { "grpc_keybuilders": [ { "names": [ { "service": "grpc.testing.TestService" } ], "headers": [ { "key": "k1", "names": [ "n1" ] }, { "key": "k2", "names": [ "n2" ] } ] } ], "lookup_service": "127.0.0.1:35475", "lookup_service_timeout": "5s", "max_age": "0.100s", "cache_size_bytes": "1024" }, "routeLookupChannelServiceConfig": { "loadBalancingConfig": [ { "pick_first": {} } ] }, "childPolicy": [ { "test-child-policyTest/ControlChannelConnectivityStateMonitoring": {} } ], "childPolicyConfigTargetFieldName": "Backend" } (t=+2.370632ms)
        tlogger.go:116: INFO server.go:786 [core] [Server #525 ListenSocket #527] ListenSocket created (t=+3.815951ms)
        tlogger.go:116: INFO server.go:786 [core] [Server #526 ListenSocket #528] ListenSocket created (t=+7.234996ms)
        tlogger.go:116: INFO clientconn.go:105 [core] [Channel #529] Channel created (t=+8.666116ms)
        tlogger.go:116: INFO clientconn.go:1579 [core] [Channel #529] original dial target is: "rls-e2e:///" (t=+8.798917ms)
        tlogger.go:116: INFO clientconn.go:1586 [core] [Channel #529] parsed dial target is: {Scheme:rls-e2e Authority: Endpoint: URL:{Scheme:rls-e2e Opaque: User: Host: Path:/ RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}} (t=+8.961119ms)
        tlogger.go:116: INFO clientconn.go:263 [core] [Channel #529] Channel authority set to "" (t=+9.104421ms)
        tlogger.go:116: INFO resolver_conn_wrapper.go:175 [core] [Channel #529] Resolver state updated: { "Addresses": null, "ServiceConfig": { "Config": { "Config": null, "LB": null, "Methods": {} }, "Err": null }, "Attributes": null } (service config updated) (t=+9.363025ms)
        tlogger.go:116: INFO balancer_conn_wrappers.go:271 [core] [Channel #529] Channel switches to new LB policy "rls_experimental" (t=+9.625628ms)
        tlogger.go:116: INFO balancer.go:260 [rls] [rls-experimental-lb 0x4000336600] Creating control channel to RLS server at: 127.0.0.1:35475 (t=+10.007133ms)
        tlogger.go:116: INFO control_channel.go:127 [rls] [rls-control-channel 0x4000238000] Disabling service config from the name resolver and instead using: {"loadBalancingConfig": [{"pick_first": {}}]} (t=+15.280104ms)
        tlogger.go:116: INFO clientconn.go:105 [core] [Channel #530] Channel created (t=+15.543207ms)
        tlogger.go:116: INFO clientconn.go:1579 [core] [Channel #530] original dial target is: "127.0.0.1:35475" (t=+16.076014ms)
        tlogger.go:116: INFO clientconn.go:1584 [core] [Channel #530] dial target "127.0.0.1:35475" parse failed: parse "127.0.0.1:35475": first path segment in URL cannot contain colon (t=+16.279217ms)
        tlogger.go:116: INFO clientconn.go:1599 [core] [Channel #530] fallback to scheme "passthrough" (t=+16.411719ms)
        tlogger.go:116: INFO clientconn.go:1607 [core] [Channel #530] parsed dial target is: {Scheme:passthrough Authority: Endpoint:127.0.0.1:35475 URL:{Scheme:passthrough Opaque: User: Host: Path:/127.0.0.1:35475 RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}} (t=+16.616622ms)
        tlogger.go:116: INFO clientconn.go:263 [core] [Channel #530] Channel authority set to "127.0.0.1:35475" (t=+16.813024ms)
        tlogger.go:116: INFO resolver_conn_wrapper.go:175 [core] [Channel #530] Resolver state updated: { "Addresses": [ { "Addr": "127.0.0.1:35475", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Type": 0, "Metadata": null } ], "ServiceConfig": null, "Attributes": null } (resolver returned new addresses) (t=+17.124028ms)
        tlogger.go:116: INFO clientconn.go:631 [core] [Channel #530] ignoring service config from resolver (https://github.com/grpc/grpc-go/runs/7033783730?check_suite_focus=true