segmentio / emissary

Multi-backend Envoy xDS service.
MIT License

Reduce calls to consul leader #13

Open achille-roussel opened 5 years ago

achille-roussel commented 5 years ago

I took a look at what was hitting the consul servers on stage, and emissary seems like it could be optimized. Here are a couple of tracks to explore:

We are querying /status/leader on every health check. Consul doesn't seem to support caching on this endpoint, so it may have to hit the leader to fetch the information (see https://www.consul.io/api/status.html#get-raft-leader). We should verify whether this endpoint requires querying the consul servers, and if it does, we may want to replace it with a call to /agent/self, for example (my understanding is that we are trying to test consul connectivity in the health check).
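A minimal Go sketch of that replacement, assuming /v1/agent/self is answered by the local agent without a round trip to the servers (this still needs to be verified). The names agentSelfURL and checkConsulAgent, the agent address, and the timeout are hypothetical and not taken from emissary's code:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"strings"
	"time"
)

// agentSelfURL builds the URL of the agent's self endpoint from the
// agent address (hypothetical helper, e.g. "http://localhost:8500").
func agentSelfURL(agentAddr string) string {
	return strings.TrimRight(agentAddr, "/") + "/v1/agent/self"
}

// checkConsulAgent returns nil if the agent answers 200 OK within the
// timeout. It only tests connectivity to the agent, not cluster health.
func checkConsulAgent(ctx context.Context, client *http.Client, agentAddr string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, agentSelfURL(agentAddr), nil)
	if err != nil {
		return err
	}

	res, err := client.Do(req)
	if err != nil {
		return err
	}
	defer res.Body.Close()

	if res.StatusCode != http.StatusOK {
		return fmt.Errorf("consul agent returned %s", res.Status)
	}
	return nil
}
```

A health check handler could call checkConsulAgent(ctx, http.DefaultClient, "http://localhost:8500", time.Second) and report unhealthy on a non-nil error.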

We are not enabling cached results when querying consul, nor do we allow stale results. This means that every request that emissary sends to consul is forwarded to the leader. Here is an example of a request that emissary was making:

GET /v1/health/service/api-cogs HTTP/1.1

We need to add stale and cached to the query parameters to allow cached results to be returned.

We may also want to consider returning only services that are passing consul health checks, by adding the passing query parameter.
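As a rough Go sketch, the request path with these parameters could be built as shown here. The function healthServicePath is hypothetical (not emissary's code), and how cached interacts with stale on this endpoint depends on the Consul version, so treat the combination as an assumption to verify:

```go
package main

import (
	"net/url"
	"strings"
)

// healthServicePath returns the path and query string for a health
// lookup of the given service. Each flag is emitted as a bare query
// parameter (e.g. "?stale"), the form used in the Consul API docs.
func healthServicePath(service string, stale, cached, passing bool) string {
	var flags []string
	if stale {
		flags = append(flags, "stale") // allow any server to answer
	}
	if cached {
		flags = append(flags, "cached") // allow the agent cache to answer
	}
	if passing {
		flags = append(flags, "passing") // only instances passing checks
	}

	path := "/v1/health/service/" + url.PathEscape(service)
	if len(flags) == 0 {
		return path
	}
	return path + "?" + strings.Join(flags, "&")
}
```

With all three flags set, the example request above would become GET /v1/health/service/api-cogs?stale&cached&passing.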

rjenkins commented 5 years ago

Oh @achille-roussel, this is for the health check? How often is the NLB checking health?

achille-roussel commented 5 years ago

Every couple of seconds, it seems. I'm mostly concerned that this will scale linearly with the number of emissary tasks running.

rjenkins commented 5 years ago

We can improve it, but I don't imagine too many emissary tasks running. We don't necessarily need to health check this, or we could cache the result and only check every few mins. It's more a case of me being paranoid and figuring I should check consul health, seeing as emissary relies on it.
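Caching the result could look roughly like the following Go sketch. The cachedCheck type and its TTL are hypothetical and only illustrate the idea: the wrapped check runs at most once per TTL, and callers in between get the last result.

```go
package main

import (
	"sync"
	"time"
)

// cachedCheck wraps a health check function and reuses its last
// result until the TTL has elapsed.
type cachedCheck struct {
	mu      sync.Mutex
	ttl     time.Duration
	check   func() error
	expires time.Time // zero value forces a check on first use
	lastErr error
}

func newCachedCheck(ttl time.Duration, check func() error) *cachedCheck {
	return &cachedCheck{ttl: ttl, check: check}
}

// Check runs the wrapped check if the cached result has expired,
// otherwise it returns the cached result. Concurrent callers are
// serialized by the mutex, so the check never runs in parallel.
func (c *cachedCheck) Check() error {
	c.mu.Lock()
	defer c.mu.Unlock()

	if now := time.Now(); !now.Before(c.expires) {
		c.lastErr = c.check()
		c.expires = now.Add(c.ttl)
	}
	return c.lastErr
}
```

One trade-off of this design is that a failure is also cached for the full TTL, so a shorter TTL (or a separate, shorter TTL for errors) may be preferable if fast recovery matters.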

achille-roussel commented 5 years ago

Yep, checking connectivity to consul seems fine. I just think we should use an API call that the consul agent can serve instead of one that has to reach the leader.