habitat-sh / habitat

Modern applications with built-in automation
https://www.habitat.sh
Apache License 2.0

Investigate cache service usage in builder SaaS #7672

Open markan opened 4 years ago

markan commented 4 years ago

Our bldr service uses an external caching service to help lower latency for our customers. It is generally believed to be helpful, but we don't have good insight into what it is doing for us.

A large fraction of our API responses are marked private, to prevent caching. On average about 11% of our request volume (by call count) is actually cached. However that cached content still amounts to a sizable volume of data.
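To make the gap between "share by call count" and "share by data volume" concrete, here is a minimal sketch of how both could be computed from a sample of responses. The `Response` record, the function names, and the rule for what counts as cacheable are assumptions for illustration only; they are not taken from builder or from the caching service's own reporting.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical log record: one API response."""
    cache_control: str  # raw Cache-Control header value, "" if absent
    body_bytes: int     # size of the response body in bytes


def is_shared_cacheable(cache_control: str) -> bool:
    """Return False when a directive forbids storage in a shared cache.

    Simplification: only `private` and `no-store` are treated as blocking.
    A missing header is counted as cacheable, which a real cache may or
    may not do depending on its heuristics.
    """
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
    }
    return not ({"private", "no-store"} & directives)


def cacheable_share(responses):
    """Return (share by call count, share by bytes) of cacheable responses."""
    total_count = len(responses)
    total_bytes = sum(r.body_bytes for r in responses)
    cacheable = [r for r in responses if is_shared_cacheable(r.cache_control)]
    count_share = len(cacheable) / total_count if total_count else 0.0
    byte_share = (
        sum(r.body_bytes for r in cacheable) / total_bytes
        if total_bytes else 0.0
    )
    return count_share, byte_share
```

A small number of large cacheable responses (package downloads, for example) can dominate the byte share even when they are a small share of calls, which is why both numbers are worth tracking.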

The objectives of this are twofold:

  1. Understand exactly what we do cache, and understand the impact of the request volume avoided. Not caching would involve fewer moving parts and make troubleshooting easier. Could we handle the volume without caching? Improvements to reduce the need to cache might also improve scaling for on-prem builder instances.

  2. Understand what we do not cache, and the impact of that going through the caching service. Multiple layers of miss resolution seem to occur before actually going to builder. Is it worthwhile to disable caching entirely in the service? What benefits does streamlining this path bring?

Tasks likely needed to be performed include:

  1. Devise a measure to understand our external responsiveness. We have good visibility into our internal API. Getting various packages via https://bldr.habitat.sh/v1/depot/channels/core/stable/pkgs/$pkg/latest?target=x86_64-linux exercises our non-cached path, but we need to make sure we cover both cached and non-cached resources. This should be done from someplace not in us-west2.

  2. Experiment with filter rules to turn off caching for various portions of our API. This gives us an incremental knob to explore the impact of caching, both on external responsiveness and on our internal load. Experiments should include:

Aha! Link: https://chef.aha.io/features/APPDL-37
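For the external responsiveness measure in task 1, a probe along these lines might be a starting point. It is only a sketch: the function names, the timeout, and the summary fields are assumptions, and it records just the `Cache-Control` header because the header that signals a cache hit depends on the caching service, which is not named here. The URL pattern is the one quoted above.

```python
import statistics
import time
import urllib.parse
import urllib.request

# URL pattern from the issue text; the package name fills in $pkg.
BASE = "https://bldr.habitat.sh/v1/depot/channels/core/stable/pkgs"


def latest_url(pkg, target="x86_64-linux"):
    """Build the 'latest package' URL for a given package name."""
    query = urllib.parse.urlencode({"target": target})
    return f"{BASE}/{urllib.parse.quote(pkg, safe='')}/latest?{query}"


def probe(url, opener=urllib.request.urlopen, clock=time.perf_counter,
          timeout=10):
    """Fetch a URL once and record wall-clock time and Cache-Control.

    `opener` and `clock` are injectable so the probe can be exercised
    without network access. Note that urlopen raises on 4xx/5xx
    responses; a real probe would likely want to catch and record those.
    """
    start = clock()
    with opener(url, timeout=timeout) as resp:
        resp.read()  # include body transfer in the measured time
        status = resp.status
        cache_control = resp.headers.get("Cache-Control", "")
    return {
        "url": url,
        "status": status,
        "seconds": clock() - start,
        "cache_control": cache_control,
    }


def summarize(samples):
    """Reduce a list of probe results to a few latency statistics."""
    times = [s["seconds"] for s in samples]
    return {
        "count": len(times),
        "median": statistics.median(times),
        "max": max(times),
    }
```

Running the same probe against a mix of cached and non-cached resources, from a host outside the region where builder runs, would give comparable numbers before and after each filter-rule experiment.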

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We value your input and contribution. Please leave a comment if this issue still affects you.
