spring-cloud / spring-cloud-consul

Spring Cloud Consul
http://cloud.spring.io/spring-cloud-consul/
Apache License 2.0

Provide a way to listen to specific services instead of the whole catalog #566

Open fuleow opened 5 years ago

fuleow commented 5 years ago

spring-cloud-consul provides a Consul Catalog Watch that publishes heartbeat events on catalog changes. In an environment with many services the catalog can change rapidly (multiple times per second) causing heartbeat events to trigger for services which the application is not interested in.

For example, this mechanism is used in Spring Cloud Config Client if discovery is enabled.

In practice the config client is only interested in updates to the spring-cloud-config-server but this triggers each time the catalog updates.

The catalog services watch can already be disabled. It would be very useful if an alternative heartbeat event producer can be implemented which takes a list of relevant services and only publishes heartbeat events when those specific services are updated.

Currently any microservice in our organization will start generating many requests to Consul just by adding spring-cloud-starter-consul-discovery. While the watch-delay is configurable, it's less than ideal if your application is only interested in a subset of services. If the watch-delay is too high you risk not getting an immediate update when the services change, and if it's too low you get flooded with events.
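To illustrate the alternative producer described above, here is a minimal sketch of the filtering decision only. `WatchedServiceFilter` and `shouldPublish` are hypothetical names and are not part of spring-cloud-consul. The sketch assumes some per-service watch reports the latest `X-Consul-Index` it saw for a service, and it only answers whether a heartbeat should be published for that observation.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of spring-cloud-consul.
final class WatchedServiceFilter {

    // Names of the services this application cares about.
    private final Set<String> watched;

    // Last Consul index observed for each watched service.
    private final Map<String, Long> lastIndex = new HashMap<>();

    WatchedServiceFilter(Collection<String> watchedServices) {
        this.watched = new HashSet<>(watchedServices);
    }

    // Returns true when a heartbeat should be published: the service is
    // watched and its index differs from the one seen last time. The first
    // observation of a watched service also counts as a change.
    boolean shouldPublish(String service, long consulIndex) {
        if (!watched.contains(service)) {
            return false; // changes to unrelated services are ignored
        }
        Long previous = lastIndex.put(service, consulIndex);
        return previous == null || previous != consulIndex;
    }
}
```

In an application, a `true` result would be the point at which a `HeartbeatEvent` gets published; the sketch is not thread-safe and would need synchronization if several watch threads shared one filter.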

spencergibb commented 5 years ago

The consul api we use does not provide filtering. https://www.consul.io/api/catalog.html#list-services

fuleow commented 5 years ago

> The consul api we use does not provide filtering. https://www.consul.io/api/catalog.html#list-services

Yes it doesn't provide filtering on the entire catalog. However, long polling watches can be created for each service that the application is interested in. The mechanism can be very similar to how it's done now with a background thread for each service.

https://www.consul.io/api/catalog.html#list-nodes-for-service
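As a rough sketch of such a per-service long poll, the following uses only the JDK HTTP client (Java 11+) against the list-nodes-for-service endpoint linked above. It relies on Consul's documented blocking-query parameters `index` and `wait` and on the `X-Consul-Index` response header. The class name `ServiceBlockingQuery`, the method names, and the extra 10 seconds of client timeout are assumptions made for illustration; this is not how spring-cloud-consul implements its watch.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of spring-cloud-consul.
final class ServiceBlockingQuery {

    // Builds the blocking-query URI for one service. The service name is
    // assumed to be URL-safe and is not encoded here.
    static URI uri(String consulBase, String service, long index, int waitSeconds) {
        return URI.create(consulBase + "/v1/catalog/service/" + service
                + "?index=" + index + "&wait=" + waitSeconds + "s");
    }

    // Consul's blocking-query guidance: if the returned index goes backwards
    // or is not positive, reset to 0 so the next request does not block on a
    // stale index.
    static long nextIndex(long previous, long returned) {
        if (returned <= 0 || returned < previous) {
            return 0L;
        }
        return returned;
    }

    // Performs one long poll and returns the index to use for the next call.
    // The request returns when the service changes or the wait time elapses.
    static long poll(HttpClient client, String consulBase, String service,
                     long index, int waitSeconds) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri(consulBase, service, index, waitSeconds))
                .timeout(Duration.ofSeconds(waitSeconds + 10L)) // headroom beyond the server-side wait
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        long returned = response.headers().firstValueAsLong("X-Consul-Index").orElse(0L);
        return nextIndex(index, returned);
    }
}
```

A background thread per watched service could call `poll` in a loop, starting from index 0, and treat a returned index that differs from the previous one as the signal that this specific service changed.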

spencergibb commented 5 years ago

Having to explicitly list every service you want to watch doesn't seem scalable to me.

fuleow commented 5 years ago

I think that applications which have many upstream services can continue using the existing catalog watch mechanism. However, it would be nice if microservices which only have a couple of interfaces were able to watch just the services they are interested in instead of the entire catalog.

Here is some data on our own consul index and how quickly it changes

while true; do curl -s -v http://consul:8500/v1/catalog/services 2>&1> /dev/null | grep Index; sleep 5; done
< X-Consul-Index: xxxx59768
< X-Consul-Index: xxxx59896
< X-Consul-Index: xxxx59961
< X-Consul-Index: xxxx60067
< X-Consul-Index: xxxx60243
< X-Consul-Index: xxxx60415
< X-Consul-Index: xxxx60548
< X-Consul-Index: xxxx60743
< X-Consul-Index: xxxx60774

spencergibb commented 5 years ago

I think that is a rarity. As you mentioned, you can disable it and roll your own. We'll wait to see if there are more folks who want this.

alex-dubrouski commented 5 years ago

Good afternoon. I'm not sure what you mean by rarity. It is very common for huge organizations to have hundreds of microservices and to use continuous deployment strategies. Maybe you are operating at a different scale, but we have multiple datacenters and our Consul catalog contains thousands of tags. To be precise, we started investigating high CPU/memory/network usage of some idle microservices running in the cloud. I profiled a couple of them and found that they are all affected by the same problem. Spring schedules a task which continuously polls Consul and fetches updates (due to the high rate of changes this happens every couple of seconds), which results in heavy memory consumption by the underlying "com.ecwid.consul" library. This in turn causes frequent garbage collections and high CPU usage. Here is the JProfiler hot spot allocation report. In reality an absolutely idle service continuously uses around 400MB of memory. I am sorry, but I think this implementation does not scale. We have currently suspended usage of it and are looking for other ways to implement service discovery.

spencergibb commented 5 years ago

@alex-dubrouski please open a separate issue as that does not seem related to ConsulCatalogWatch. We'd be happy to entertain ways to make things more efficient as no one has reported anything similar.

What doesn't seem to be scalable is having to manually add each service that needs to be watched for the OP's use case.

alex-dubrouski commented 5 years ago

Spencer, please review the attached screenshot again. ConsulCatalogWatch schedules a task which fetches the catalog here: https://github.com/spring-cloud/spring-cloud-consul/blob/master/spring-cloud-consul-discovery/src/main/java/org/springframework/cloud/consul/discovery/ConsulCatalogWatch.java#L129 When the rate of changes of the Consul catalog is high and the catalog itself is huge, this results in high memory allocation and CPU usage. The memory allocated for CatalogConsulClient.getCatalogServices() in the attached picture is 289MB.

spencergibb commented 5 years ago

Yes, but the image shows the trace coming thru the /health actuator endpoint. While they both call the same method, I think it's still a different situation.

Might be worth setting management.health.consul.enabled=false.

alex-dubrouski commented 5 years ago

Only a small percentage of the calls come from the health endpoint; most of them are a result of discovery.

spencergibb commented 5 years ago

If that's the case, you can use tags to limit what is returned.

alex-dubrouski commented 5 years ago

Yes, this is what we actually asked to implement. Currently ConsulCatalogWatch uses the /catalog/services endpoint, which does not provide filtering and does not accept tags (you said it yourself here: https://github.com/spring-cloud/spring-cloud-consul/issues/566#issuecomment-501821619). Fu asked to implement support for watching specific services as a second possible path.

spencergibb commented 5 years ago

Do you both work together? For ConsulCatalogWatch there hasn't been a request for this except for you. It may be worth implementing something on your own, seeing how it works and submitting a PR. #475 may be interesting to you as well.

varnson commented 5 years ago

At my company I added waitTime and index to getServers() in ConsulServerList.java. This changed getHealthServices() into a blocking query. I also want to add the cached parameter to it. I think the catalog watch is useless, and you should use getHealthServices() to watch the server list. @fuleow

alex-dubrouski commented 5 years ago

I am sorry for the delay. Yes, we work together on one team. We will try to incorporate this research into our plans. BTW, Fu already has an open PR for this repo.

fuleow commented 5 years ago

> Do you both work together? For ConsulCatalogWatch there hasn't been a request for this except for you. It may be worth implementing something on your own, seeing how it works and submitting a PR. #475 may be interesting to you as well.

Thanks @spencergibb, I'll try to explore implementing something custom when I have some time. As for the PR @alex-dubrouski mentioned, I have that running on our Spring Boot Admin server, and with catalog-services-watch-delay set to 10000 it's been working OK so far.

rutuls commented 3 years ago

I need to implement a scenario in service1 where I check the health of service2 and, if it is up and running, take action accordingly. I want to check the health of the service on a separate thread, or basically in a non-blocking way. How do I implement this using the catalog watch? Is there any other way to implement it? Are there any examples available?