line / armeria

Your go-to microservice framework for any situation, from the creator of Netty et al. You can build any type of microservice leveraging your favorite technologies, including gRPC, Thrift, Kotlin, Retrofit, Reactive Streams, Spring Boot and Dropwizard.
https://armeria.dev
Apache License 2.0
4.83k stars 922 forks source link

`WeightRampingUpStrategy` uses locks to avoid occasional selection failure for `XdsEndpointGroup` #5799

Closed jrhee17 closed 4 months ago

jrhee17 commented 4 months ago

Motivation:

https://github.com/line/armeria/issues/5749

While re-investigating this issue, I found that the probably cause is the changes introduced by https://github.com/line/armeria/pull/5693

Previously, when updateEndpoints is called the endpointSelector was initialized from the same thread

https://github.com/line/armeria/blob/da7f76a7eebf9b03a51ff68a9245269eee4a5a6d/core/src/main/java/com/linecorp/armeria/client/endpoint/WeightRampingUpStrategy.java#L147-L150

However after the PR above, updateEndpoints doesn't update the endpointSelector immediately and instead schedules an update task

https://github.com/line/armeria/blob/191fb7db981d1d47de8ce50be0b0f6d5d9dfa8ee/core/src/main/java/com/linecorp/armeria/client/endpoint/WeightRampingUpStrategy.java#L150

For this reason, even if updateEndpoints is called the endpointSelector will remain as the EMPTY_SELECTOR.

https://github.com/line/armeria/blob/191fb7db981d1d47de8ce50be0b0f6d5d9dfa8ee/core/src/main/java/com/linecorp/armeria/client/endpoint/WeightRampingUpStrategy.java#L128

Normally, this isn't an issue since refreshEndpoints will be called after the call to updateEndpoints which will in turn complete the select call. However, XdsEndpointGroup is composed of multiple internal EndpointGroups and only selectNow is called. Because the initialization event is never propagated to the XdsEndpointSelector occasional failures can occur.

https://github.com/line/armeria/blob/191fb7db981d1d47de8ce50be0b0f6d5d9dfa8ee/xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/DefaultLoadBalancer.java#L61-L84

Modifications:

Result:

github-actions[bot] commented 4 months ago

🔍 Build Scan® (commit: d8c97f1ca91777f81b2c184a39408c56b60c4642)

Job name Status Build Scan®
build-windows-latest-jdk-21 https://ge.armeria.dev/s/adryklnhqbiye
build-self-hosted-unsafe-jdk-8 https://ge.armeria.dev/s/knm34vforyqgy
build-self-hosted-unsafe-jdk-21-snapshot-blockhound https://ge.armeria.dev/s/xbc3hgevnbgwy
build-self-hosted-unsafe-jdk-17-min-java-17-coverage https://ge.armeria.dev/s/2ukke44ngsgla
build-self-hosted-unsafe-jdk-17-min-java-11 https://ge.armeria.dev/s/d5y4zdclnk6ui
build-self-hosted-unsafe-jdk-17-leak https://ge.armeria.dev/s/serbxppuls5q2
build-self-hosted-unsafe-jdk-11 https://ge.armeria.dev/s/bdj7q2yajubxw
build-macos-12-jdk-21 https://ge.armeria.dev/s/fgijwtf22xlwg