spring-projects / spring-boot

Spring Boot
https://spring.io/projects/spring-boot
Apache License 2.0
74.81k stars 40.6k forks source link

Composite HealthIndicator that runs them in parallel #2652

Open snicoll opened 9 years ago

snicoll commented 9 years ago

Right now, invoking the health endpoint lead to a synchronous invocation of all underlying HealthIndicator instance(s) that have been configured. It would be nice to offer an option to run these in parallel instead.

philwebb commented 9 years ago

Definitely in the "nice to have" bucket. I imaging that the health endpoint isn't hit all that often so performance probably isn't a big deal.

snicoll commented 9 years ago

@philwebb I had the idea when I saw #2630

fatroom commented 8 years ago

I would also like to have support for this. My project got a bunch of health checks and some of them can run up to few seconds. So in total performance of /health endpoint is terrible with sequential approach.

fjcano commented 8 years ago

:+1: Some of my projects have 20+ health checks. For now we use our homebrew solution to do them in parallel.

Would love to see support for parallel health checks in Spring Boot out of the box. Or at least a way to inject our own CompositeHealthIndicator.

prismec commented 7 years ago

And it would be nice to include a standard detail information how long a certain health indicator took to execute.

erikgreif-acc commented 6 years ago

Real-world example here:

Some services have requirements to respond within a given time limit. This time limit may be set by an external organization, not always under the control of the devs. This kind of timeout gets interpreted as a DOWN state, wakes devs up in the middle of the night, reboots service instances, and all the while provides no context as to which component was experiencing high latency (because there was no HTTP response).

I have successfully added timeouts per-actuator, but with 20 checks that require timeouts of around 5 seconds (any less would cause alerts for minor network blips), we are asking our caller to wait up to 1:40 for a response. At my org, this is absurd, and nobody wants to permit this behavior.

rmie commented 4 years ago

I second everything @erikgreif-acc said.

Sometimes I run into a situation in which a time out occurs but it is unclear which of the components caused it. So I'd like to propose a timeout for the parallel execution after which every unresponsive component is considered/reported as down. To keep the existing behavior the default for this timeout should be infinite.

And while working on it, exporting metrics (number of total, successful and failed executions, sum of execution time for successful executions for each component) would make alerting (and inhibiting) on health related issues easier and would help with dissecting problems faster.

jim-mclean commented 3 years ago

+1 for this feature, or for some sort of async checking...

I am surprised that this has not yet been implemented. We're running Spring Boot Apps on Pivotal (Tanzu?) Cloud Foundry and the out of the box settings in that environment at to consider a HTTP Healthcheck failed after 1s. For all but the simplest of apps that we have deployed to PCF this is causing unnecessary restarts.

wilkinsona commented 3 years ago

https://github.com/spring-projects/spring-boot/issues/25459 is somewhat related to this. I've been wondering about something that runs the health indicators periodically in the background and the health endpoint would then just respond immediately with the current aggregated status.

jim-mclean commented 3 years ago

@wilkinsona I think providing framework support for running health indicators in the background and providing the latest aggregate state on HTTP calls would solve this problem pretty holistically.

Initially blocking until there is a response from all indicators seams reasonable. As does considering an indicator as being DOWN if its invocation has taken longer than some threshold and/or older than some max age.

Also, it may be best to default to the current behavior, but be able to flag HealthIndicator implementations as beaning async with their timeout.. i.e. @ BackgroundCheck(maxTime=2000, maxAge= 3000) or some other existing more suitable annotation.

antoinemeyer commented 2 years ago

I ended up creating an implementation of async health indicators in this separate repo: https://github.com/antoinemeyer/spring-boot-async-health-indicator It relies on a ScheduledThreadPoolExecutor with an annotation @AsyncHealth.

I considered having it only property-driven (management.health.async.indicator.$INDICATOR-BEAN-NAME.refresh-rate) or using a different interface (AsyncHealthIndicator) but believed an annotation would be more adequate.

I would be very happy to have your feedback on the approach taken. If you think this can be integrated directly into spring-boot-actuator, I'd gladly work on a PR.

philwebb commented 2 years ago

Thanks very much for the pull-request offer @antoinemeyer. It's really great to see that you've been able to add support in an additional library. We're currently not sure if we want to offer async support in the way that you've implemented it, or as suggested in #25459. Unfortunately, we're currently investing most of our time in Spring Boot 3.0 work so it's unlikely that we'll be able to make the decision anytime soon. I think it's best we wait until Spring Boot 3.0 is out before looking at this. That probably means that a pull-request would be a bit premature, and it would be best to consider things again when we're planning Spring Boot 3.1.

antoinemeyer commented 2 years ago

Thanks @philwebb for the feedback (once again). I am looking forward to see what solution you will end up adopting for this problem. In the meantime I'll rely on (and improve) that library and hope it can benefit others!

unnisathya92 commented 2 years ago

hi @antoinemeyer, Do you have a working example for the aysnc-health-proj? It would really be helpful

antoinemeyer commented 2 years ago

hi @antoinemeyer, Do you have a working example for the aysnc-health-proj? It would really be helpful

hi @unnisathya92 ! Sure, here is a starter with asynchronous health indicators: spring-async-health-indicator-demo.zip

kcshettar commented 2 years ago

+1 for the feature. My application has three health check components.

Component 1: <100ms response time Component 2: <400ms response time Component 3: <400ms response time

Would love to see the /actuator/health endpoint to respond within 400ms (worst case).

I am rendering the health details on react application (Home Page) which takes 100ms+400ms+400ms = 900ms right now. 900ms page load time is acceptable right now. It would be nice if the Home Page can load within 400ms.

We could implement the parallel processing logic as a temporary solution. But, I am sure this feature will come out of the box some day under spring-boot-starter-actuator.

ahoehma commented 1 year ago

Any news here? In my app I have multiple db's, so multiple db-pools and so multiple health checks.

Now I see something like this:

Health contributor org.springframework.boot.actuate.jdbc.DataSourceHealthIndicator (db/creationCacheDbDataSource) took 269391ms to respond

This is really bad :)

Is there something I can do to pimp my (db) health checks to be little bit more smart regarding resources/runtime etc?

snicoll commented 1 year ago

@ahoehma this issue represents the latest state.

I am not sure you're looking in the right direction. Running them in parallel should improve things a bit, but certainly not to the extend of what you're experiencing now. In other words, please review your indicator and backend configuration.

ahoehma commented 1 year ago

@snicoll thanks for the info. I was just thinking that parallel running (slow) health-check can improve the situation a little bit but at the end you are right. Regarding the configuration of my db-indicators is there maybe somewhere some documentation? Until yet my understanding was that springboot just wrap each running Datasource into a health-indicator. Would be maybe better to use just another datasource to the same db just to keep all configurated pooled connections ready for business or should I just increase the max number of pooled connections to be ready. Also what happens if multiple callers ask the health endpoint ... there is always an on-the-fly health-check as fas as I know and this means if aws elb + other dashboards checking health ... my connections to the db just waisted :) May my whole topic is var to much for a issue ... is there a good place for such a discussion.

Many thanks!

Andreas

snicoll commented 1 year ago

That, and we don't use the issue tracker for discussions. You may want to chat with the community on Gitter. You may also be interested in endpoint caching.

tomcruise81 commented 11 months ago

Hopefully this is helpful to someone else here or in #25459. Taking a few queues from https://github.com/antoinemeyer/spring-boot-async-health-indicator (thanks @antoinemeyer), I came up with something for Spring Boot 3 that is not annotation based, but rather continues on in a manner similar to the existing HealthIndicator approach:

AsyncHealthIndicator

import java.time.Duration;
import org.springframework.boot.actuate.health.HealthIndicator;

/**
 * This is an extension to {@link HealthIndicator}, except with the intention of providing similar
 * functionality to <a
 * href="https://github.com/antoinemeyer/spring-boot-async-health-indicator">antoinemeyer/spring-boot-async-health-indicator</a>.
 * This comes as a result of the following Spring Boot issues still being open:
 *
 * <ul>
 *   <li>https://github.com/spring-projects/spring-boot/issues/2652
 *   <li>https://github.com/spring-projects/spring-boot/issues/25459
 * </ul>
 *
 * <p>This aims to offer a means of getting asynchronously provided and potentially cached {@link
 * org.springframework.boot.actuate.health.Health Health} so as to allow the health actuator
 * endpoint to return almost immediately.
 *
 * @see: https://github.com/antoinemeyer/spring-boot-async-health-indicator
 */
public interface AsyncHealthIndicator extends HealthIndicator {
  public default Duration getHealthCheckFrequency() {
    return Duration.ofSeconds(10);
  }
}

AsyncHealthIndicatorAutoConfiguration

import java.time.Duration;
import java.time.ZonedDateTime;
import javax.annotation.Nullable;
import lombok.Builder;
import lombok.Getter;
import lombok.Setter;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.autoconfigure.health.HealthEndpointAutoConfiguration;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthContributorRegistry;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.boot.actuate.health.NamedContributor;
import org.springframework.boot.autoconfigure.AutoConfigureAfter;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.TaskScheduler;

/**
 * Configuration to allow replacing {@link HealthIndicator} instances in the {@link
 * HealthContributorRegistry} with {@link WrappedHealthIndicator} instances. The {@link
 * WrappedHealthIndicator} will be scheduled to refresh their internal state, and during {@link
 * HealthIndicator#health()} calls, will return their currently cached internal state immediately.
 *
 * <p>This relies on the existence of a {@link TaskScheduler}, which can be enabled by an {@link
 * org.springframework.scheduling.annotation.EnableScheduling @EnableScheduling} annotation
 * somewhere within your application.
 *
 * <p>Lombok can easily be replaced if it's not desired.
 *
 * <p>This is aims to provide similarfunctionality to <a
 * href="https://github.com/antoinemeyer/spring-boot-async-health-indicator">antoinemeyer/spring-boot-async-health-indicator</a>.
 *
 * @see: https://github.com/antoinemeyer/spring-boot-async-health-indicator
 */
@Configuration
@AutoConfigureAfter(HealthEndpointAutoConfiguration.class)
class AsyncHealthIndicatorAutoConfiguration implements InitializingBean {
  private static final Log log = LogFactory.getLog(AsyncHealthIndicatorAutoConfiguration.class);

  @Autowired private HealthContributorRegistry healthContributorRegistry;

  @Autowired private TaskScheduler taskScheduler;

  @Override
  public void afterPropertiesSet() throws Exception {
    for (NamedContributor<?> namedContributor : healthContributorRegistry) {
      final String name = namedContributor.getName();
      final Object contributor = namedContributor.getContributor();
      if (contributor instanceof AsyncHealthIndicator asyncHealthIndicator) {
        healthContributorRegistry.unregisterContributor(name);
        log.debug(
            "Wrapping " + contributor.getClass().getSimpleName() + " for async health scheduling");
        WrappedHealthIndicator wrappedHealthIndicator =
            WrappedHealthIndicator.builder().wrappedHealthIndicator(asyncHealthIndicator).build();
        healthContributorRegistry.registerContributor(name, wrappedHealthIndicator);
        taskScheduler.scheduleWithFixedDelay(
            wrappedHealthIndicator, asyncHealthIndicator.getHealthCheckFrequency());
      }
    }
  }

  @Getter
  @Setter
  @Builder
  private static class WrappedHealthIndicator implements HealthIndicator, Runnable {
    private static final String LAST_CHECKED_KEY = "lastChecked";
    private static final String LAST_DURATION_KEY = "lastDuration";

    private HealthIndicator wrappedHealthIndicator;

    @Nullable private Health lastHealth;

    @Override
    public Health health() {
      Health lastHealth = getLastHealth();
      if (lastHealth == null) {
        setLastHealth(getAndWrapHealth());
        lastHealth = getLastHealth();
      }

      return lastHealth;
    }

    private Health getAndWrapHealth() {
      ZonedDateTime startTime = ZonedDateTime.now();
      Health baseHealth = getWrappedHealthIndicator().health();
      ZonedDateTime endTime = ZonedDateTime.now();
      Duration duration = Duration.between(startTime, endTime);
      return Health.status(baseHealth.getStatus())
          .withDetails(baseHealth.getDetails())
          .withDetail(LAST_CHECKED_KEY, startTime)
          .withDetail(LAST_DURATION_KEY, duration)
          .build();
    }

    @Override
    public void run() {
      setLastHealth(getAndWrapHealth());
    }
  }
}

Usage is simply replacing existing implements HealthIndicator instances with implements AsyncHealthIndicator

antoinemeyer commented 7 months ago

Hopefully this is helpful to someone else here or in #25459

@tomcruise81 FYI I updated spring-boot-async-health-indicator under version boot3-v1.3 for spring boot 3 support.