honeycombio / refinery

Refinery is a trace-aware tail-based sampling proxy. It examines whole traces and intelligently applies sampling decisions (whether to keep or discard) to each trace.

Feature request: Time-limited sampling changes #203

Open dcarley opened 3 years ago

dcarley commented 3 years ago

Something that we've struggled with since implementing samproxy/refinery is being able to surface infrequently occurring and non-erroneous traces, such as canary deployments or services that just don't receive much traffic.

We've experimented with adding fields to the sampling key, such as the service name or a "boost" attribute that the service can control, but this has mostly resulted in unpredictable cardinality and throughput across all spans.

Some colleagues (@emauton and @conormcd) had an idea, inspired by Fred Hebert's Recon, to provide a mechanism for forcing the capture of all interesting spans for a limited time period. For example, with a hypothetical CLI and API, you'd be able to:

$ capture-traces 30 service:circle-www-api-canary1
or
$ capture-traces 15 service:circle-www-api-v1 name:circle.permissions/user-can-view-builds
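To make the idea concrete, here is a minimal sketch of what such a time-limited override could look like internally. Everything in it is an assumption for illustration: the names `CaptureOverride` and `parse_capture` are hypothetical, the first CLI argument is assumed to be a duration in minutes, and each `field:value` selector is assumed to be an exact-match condition on a span field. It is not how Refinery works today.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaptureOverride:
    """A temporary 'keep everything that matches' rule (hypothetical)."""
    matchers: dict      # span field name -> required value
    expires_at: float   # absolute expiry, in seconds since the epoch

    def matches(self, span: dict, now: float) -> bool:
        # An expired override never matches, so normal sampling resumes.
        if now >= self.expires_at:
            return False
        return all(span.get(k) == v for k, v in self.matchers.items())


def parse_capture(minutes: float, *selectors: str,
                  now: Optional[float] = None) -> CaptureOverride:
    """Build an override from CLI-style arguments (minutes is an assumed unit)."""
    now = time.time() if now is None else now
    matchers = {}
    for selector in selectors:
        # Split on the first ':' only, so values may contain ':' or '/'.
        field, sep, value = selector.partition(":")
        if not sep or not field or not value:
            raise ValueError(f"expected field:value, got {selector!r}")
        matchers[field] = value
    return CaptureOverride(matchers, now + minutes * 60)
```

With this shape, `capture-traces 30 service:circle-www-api-canary1` would map to `parse_capture(30, "service:circle-www-api-canary1")`, and a sampler would keep any span for which `matches` returns true before falling back to its usual rules.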

There's some similarity with the LaunchDarkly proposal that was discussed in Slack.

adamopenweb commented 3 years ago

Thanks for the feedback, Dan. I've filed a report for this internally and will follow up with the team.

cartermp commented 1 year ago

We think this would be best served by a separate configuration service, which is made possible by Refinery's ability to read rules from a URL. Since Refinery now supports loading rules from a URL, that service can handle the updates, and the next time Refinery pulls the rules, its sampling behavior will change accordingly. Keeping this open for now since we don't have an ideal place to put it, but we'd prefer that this not be implemented in Refinery itself.
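As a rough sketch of that approach, an external service could hold the time-limited overrides and render the rules document it serves at the URL, dropping overrides once they expire. The function name `render_rules` and the dictionary layout (`rules`, `name`, `sample_rate`, `conditions`) are placeholders chosen for illustration and are not Refinery's actual rules schema; a real service would need to emit whatever format Refinery expects.

```python
import copy


def render_rules(base_rules: dict, overrides: list, now: float) -> dict:
    """Return a copy of base_rules with unexpired overrides prepended.

    Each override is a dict with 'matchers' (field -> value) and
    'expires_at' (seconds since the epoch). The output layout is a
    hypothetical placeholder, not Refinery's real rules format.
    """
    rules = copy.deepcopy(base_rules)  # never mutate the stored baseline
    active = [o for o in overrides if o["expires_at"] > now]
    forced = [
        {
            "name": f"temporary capture {i}",
            "sample_rate": 1,  # keep every matching trace
            "conditions": dict(o["matchers"]),
        }
        for i, o in enumerate(active)
    ]
    # Temporary rules go first so they take precedence over the baseline.
    rules["rules"] = forced + rules.get("rules", [])
    return rules
```

Because expiry is evaluated each time the document is rendered, the override disappears on the first poll after its deadline without anyone having to remember to remove it.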