line / armeria

Your go-to microservice framework for any situation, from the creator of Netty et al. You can build any type of microservice leveraging your favorite technologies, including gRPC, Thrift, Kotlin, Retrofit, Reactive Streams, Spring Boot and Dropwizard.
https://armeria.dev
Apache License 2.0

Improve xDS health check #5785

Closed jrhee17 closed 1 month ago

jrhee17 commented 3 months ago

Motivation:

It is recommended to review https://github.com/line/armeria/pull/5802 prior to this PR

This changeset attempts to solve several problems:

Custom filter logic

xDS considers all endpoints when computing whether a PrioritySet is in panic state. For instance, if the percentage of unhealthy endpoints exceeds a preconfigured panic threshold, endpoint selection includes all endpoints regardless of their health status. While Armeria supports a HealthCheckedEndpointGroup out of the box, it filters out unhealthy endpoints automatically, so the endpoints needed for the panic computation are no longer visible.

To resolve this, I propose adding an AbstractHealthCheckedEndpointGroupBuilder#healthCheckedEndpointPredicate API.
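
To illustrate why the filter needs to be customizable, here is a minimal, self-contained sketch. It does not use Armeria's classes: `CheckedEndpoint`, `PanicSketch`, `expose`, and `candidates` are hypothetical names, and the panic rule simply follows the description above (panic when the unhealthy percentage exceeds the threshold).

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for an endpoint plus its latest health check result.
final class CheckedEndpoint {
    final String host;
    final boolean healthy;

    CheckedEndpoint(String host, boolean healthy) {
        this.host = host;
        this.healthy = healthy;
    }
}

final class PanicSketch {
    private PanicSketch() {}

    // Models the endpoint group: only endpoints accepted by the predicate are exposed.
    static List<CheckedEndpoint> expose(List<CheckedEndpoint> all,
                                        Predicate<CheckedEndpoint> predicate) {
        return all.stream().filter(predicate).collect(Collectors.toList());
    }

    // Models xDS-side selection: in panic state every exposed endpoint is a
    // candidate; otherwise only the healthy ones are.
    static List<CheckedEndpoint> candidates(List<CheckedEndpoint> exposed,
                                            int panicThresholdPercent) {
        long unhealthy = exposed.stream().filter(e -> !e.healthy).count();
        boolean panic = unhealthy * 100 > (long) panicThresholdPercent * exposed.size();
        if (panic) {
            return exposed;
        }
        return expose(exposed, e -> e.healthy);
    }
}
```

With one healthy and three unhealthy endpoints and a 50% threshold, exposing only healthy endpoints (`e -> e.healthy`) leaves a single candidate and the panic state is never detected. Exposing everything (`e -> true`) lets the panic check see all four endpoints and return all of them.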

Per-cluster health check configuration

Configuring health checks per cluster member is difficult with the current API, since a single parameter set is statically defined for an entire health-checked endpoint group.

We already have an abstraction, AbstractHealthCheckedEndpointGroupBuilder#newCheckerFactory. I propose that this API be used for xDS. To support the parameters that xDS allows configuring, I propose that parameters be passed to HttpHealthChecker via the constructor instead of via the HealthCheckerContext. To support this change, HttpHealthChecker has been moved to an internal package so that the xds module can also access it.
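
The constructor-based approach can be sketched roughly as follows. This is an illustration only, not the code in this PR: `CheckContextSketch`, `HttpCheckerSketch`, `CheckerFactories`, and the `path`/`useGet` parameters are hypothetical stand-ins, chosen to show each cluster closing over its own settings while the context stays free of check parameters.

```java
import java.util.function.Function;

// Hypothetical stand-in for the per-endpoint context; it carries no check parameters.
final class CheckContextSketch {
    final String endpoint;

    CheckContextSketch(String endpoint) {
        this.endpoint = endpoint;
    }
}

// Hypothetical checker that receives its parameters through the constructor.
final class HttpCheckerSketch {
    private final String endpoint;
    private final String path;
    private final boolean useGet;

    HttpCheckerSketch(CheckContextSketch ctx, String path, boolean useGet) {
        this.endpoint = ctx.endpoint;
        this.path = path;
        this.useGet = useGet;
    }

    // Describes the request this checker would send.
    String describe() {
        return (useGet ? "GET " : "HEAD ") + "http://" + endpoint + path;
    }
}

final class CheckerFactories {
    private CheckerFactories() {}

    // Each cluster builds its own factory, capturing its own parameters.
    static Function<CheckContextSketch, HttpCheckerSketch> forCluster(String path,
                                                                      boolean useGet) {
        return ctx -> new HttpCheckerSketch(ctx, path, useGet);
    }
}
```

For example, `CheckerFactories.forCluster("/healthz", true)` and `CheckerFactories.forCluster("/ping", false)` produce checkers with different settings, with no shared, statically defined parameter set.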

Modifications:

Result:

github-actions[bot] commented 3 months ago

🔍 Build Scan® (commit: 0ae3789f34d1ba9a3a1d2b2088a003ea7b39eff2)

Job name Status Build Scan®
build-self-hosted-unsafe-jdk-8 https://ge.armeria.dev/s/vtgtvqk2ryxde
build-self-hosted-unsafe-jdk-21-snapshot-blockhound https://ge.armeria.dev/s/lsxzwaxry5gek
build-self-hosted-unsafe-jdk-17-min-java-17-coverage https://ge.armeria.dev/s/6y6j2wenyqrum
build-self-hosted-unsafe-jdk-17-min-java-11 https://ge.armeria.dev/s/eecrpuk5naj4c
build-self-hosted-unsafe-jdk-17-leak https://ge.armeria.dev/s/hye3e2lgci4qq
build-self-hosted-unsafe-jdk-11 https://ge.armeria.dev/s/dns2eeungr3xy
build-macos-12-jdk-21 https://ge.armeria.dev/s/qlrls3iw3xhyu

jrhee17 commented 1 month ago

This PR is now back to reliably passing the CI