jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

Add generic caching layer for services and operations #1743

Open sergiimk opened 5 years ago

sergiimk commented 5 years ago

Requirement - what kind of business use case are you trying to solve?

We are implementing a custom gRPC-based storage plugin as per this doc.

Problem - what in Jaeger blocks you from solving the requirement?

The gRPC storage plugin is currently called on every single UI interaction. For example, refreshing the main page calls GetServices and GetOperations. In the majority of cases these operations involve costly external calls, and performing them for every user of the Jaeger UI will quickly become a massive bottleneck. This means that implementing a usable plugin currently requires adding a lot of complex caching logic directly into the plugin.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Jaeger already includes several implementations of caching, but they are specific to different storage backends. It would be great if generic caching logic existed between Jaeger and a storage plugin, so that when implementing a plugin you didn't have to worry about caching the results and could focus on data access.
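
A minimal sketch of what such a generic layer could look like, written against a simplified reader interface rather than the real spanstore.Reader (the `servicesReader` / `cachingReader` names and the TTL behaviour are illustrative, not existing Jaeger code):

```go
package servicecache

import (
	"context"
	"sync"
	"time"
)

// servicesReader is a simplified stand-in for the slice of the storage
// reader interface that this sketch caches; the real spanstore.Reader
// has more methods and different signatures.
type servicesReader interface {
	GetServices(ctx context.Context) ([]string, error)
}

// cachingReader decorates any servicesReader and serves GetServices
// from memory until the cached entry is older than ttl.
type cachingReader struct {
	backend servicesReader
	ttl     time.Duration

	mu        sync.RWMutex
	services  []string
	fetchedAt time.Time
}

func newCachingReader(backend servicesReader, ttl time.Duration) *cachingReader {
	return &cachingReader{backend: backend, ttl: ttl}
}

func (c *cachingReader) GetServices(ctx context.Context) ([]string, error) {
	c.mu.RLock()
	if !c.fetchedAt.IsZero() && time.Since(c.fetchedAt) < c.ttl {
		defer c.mu.RUnlock()
		return c.services, nil
	}
	c.mu.RUnlock()

	// Cache miss or stale entry: fall through to the real backend
	// (e.g. the gRPC storage plugin).
	services, err := c.backend.GetServices(ctx)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.services, c.fetchedAt = services, time.Now()
	c.mu.Unlock()
	return services, nil
}
```

GetOperations could be cached the same way, keyed by service name, and the wrapper would sit between the query service and the plugin so plugin authors get caching without writing any of it themselves.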

Any open questions to address

yurishkuro commented 5 years ago

Agreed. In our internal build we're wrapping the storage in an autoRefreshCache that returns service and operation names from the cache and periodically refreshes them from the real storage in the background. It's been working pretty well. I can commit that code to GitHub if someone is willing to work on adding a CLI flag for it (the refresh interval) and wiring the cache in the query-service main().
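
A minimal sketch of that shape, again against a simplified reader interface (this is an illustrative reconstruction, not the actual internal code; the refresh interval would come from the proposed CLI flag):

```go
package servicecache

import (
	"context"
	"sync"
	"time"
)

// servicesReader is a simplified stand-in for the storage reader;
// the real spanstore.Reader has more methods.
type servicesReader interface {
	GetServices(ctx context.Context) ([]string, error)
}

// autoRefreshCache serves GetServices from memory and refreshes the
// cached value from the real backend on a fixed interval in the background.
type autoRefreshCache struct {
	backend  servicesReader
	mu       sync.RWMutex
	services []string
	stop     chan struct{}
}

func newAutoRefreshCache(backend servicesReader, interval time.Duration) *autoRefreshCache {
	c := &autoRefreshCache{backend: backend, stop: make(chan struct{})}
	c.refresh() // warm the cache once at startup
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				c.refresh()
			case <-c.stop:
				return
			}
		}
	}()
	return c
}

func (c *autoRefreshCache) refresh() {
	services, err := c.backend.GetServices(context.Background())
	if err != nil {
		return // keep serving the previous value if a refresh fails
	}
	c.mu.Lock()
	c.services = services
	c.mu.Unlock()
}

// GetServices never touches the backend on the request path.
func (c *autoRefreshCache) GetServices(ctx context.Context) ([]string, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.services, nil
}

// Close stops the background refresh loop.
func (c *autoRefreshCache) Close() { close(c.stop) }
```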

burmanm commented 5 years ago

Badger storage loads the GetOperations & GetServices data only during startup and then keeps it updated in memory the rest of the time (with TTL purging happening during reads).

A generic caching approach would probably return stale replies in this case and perform a bit worse. I would assume something like Kafka / Badger could use subscription-based caching (DB writes automatically refresh the cache's data), but that's obviously not possible for all backend types. So supporting different caching mechanisms probably has its place.
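
As a rough illustration of the subscription-based idea (simplified, not actual Badger or Kafka ingester code), a backend whose writer updates an in-memory service index on every write never needs a TTL on the read path:

```go
package servicecache

import "sync"

// writeThroughIndex keeps the set of known service names current as spans
// are written, so reads never query the backend and never see data older
// than the last write. A real backend would hook this into its span writer
// and add TTL purging, persistence, operation names, etc.
type writeThroughIndex struct {
	mu       sync.RWMutex
	services map[string]struct{}
}

func newWriteThroughIndex() *writeThroughIndex {
	return &writeThroughIndex{services: map[string]struct{}{}}
}

// OnSpanWritten is called by the storage writer for every accepted span.
func (i *writeThroughIndex) OnSpanWritten(serviceName string) {
	i.mu.Lock()
	i.services[serviceName] = struct{}{}
	i.mu.Unlock()
}

// Services returns the currently known service names.
func (i *writeThroughIndex) Services() []string {
	i.mu.RLock()
	defer i.mu.RUnlock()
	out := make([]string, 0, len(i.services))
	for s := range i.services {
		out = append(out, s)
	}
	return out
}
```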

yurishkuro commented 5 years ago

The caching would be optional. And it only makes sense in installations that have a lot of services and operations. In our case we have thousands of services and 10s of thousands of operations. However, that data is pretty static.

An up-to-date remote cache shared between collectors and query services feels like overkill.

matthiaslee commented 3 years ago

@yurishkuro I know this response is about a year+ late, but I'd be happy to take a stab at getting this cache pushed over the finish line. Would you mind sharing what you have so far?

yurishkuro commented 3 years ago

At Uber there was a decorator for storage.Reader that cached the services/operations responses and returned the cached versions to the UI, while a timer loop in the background refreshed the cache every minute or so.