Open dcarley opened 3 years ago
Thanks for the feedback, Dan. I've filed a report for this internally and will follow up with the team.
We think this would be best served by a separate configuration service that exposes rules at a URL. Since Refinery now supports loading rules from a URL, that service can handle the updates, and the next time Refinery pulls the rules its sampling behavior will change accordingly. Keeping this open for now since we don't have an ideal place to put it, but we'd prefer that it not be implemented in Refinery itself.
Something that we've struggled with since implementing samproxy/refinery is surfacing infrequently occurring, non-erroneous traces, such as those from canary deployments or from services that just don't receive much traffic.
We've experimented with adding fields to the sampling key, such as the service name or a "boost" attribute that can be controlled by the service, but this has mostly resulted in unpredictable cardinality and unpredictable overall span throughput.
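To make the cardinality problem concrete, here is a rough sketch (not Refinery's implementation) of how each field added to a key-based sampler multiplies the number of distinct sampling keys. The field names (`status`, `service`, `boost`), the values, and the `sampling_key` helper are all hypothetical and chosen only for illustration.

```python
from itertools import product


def sampling_key(span, fields):
    """Join the chosen field values into one key, roughly as a key-based sampler might."""
    return "|".join(str(span.get(f, "")) for f in fields)


def distinct_keys(spans, fields):
    """Count how many distinct sampling keys the given fields produce."""
    return len({sampling_key(s, fields) for s in spans})


# Hypothetical traffic: 3 status codes x 4 services x 2 boost values.
spans = [
    {"status": status, "service": service, "boost": boost}
    for status, service, boost in product(
        [200, 404, 500], ["api", "web", "auth", "billing"], [0, 1]
    )
]

print(distinct_keys(spans, ["status"]))                      # 3
print(distinct_keys(spans, ["status", "service"]))           # 12
print(distinct_keys(spans, ["status", "service", "boost"]))  # 24
```

Each distinct key gets its own sampling decision, so when services control a field like `boost` themselves, both the number of keys and the resulting throughput become hard to predict.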
Some colleagues (@emauton and @conormcd) had an idea, inspired by Fred Hebert's Recon, to provide a mechanism for forcing the capture of all interesting spans for a limited time period. For example, with a hypothetical CLI and API, you'd be able to:
There's some similarity with the LaunchDarkly proposal that was discussed in Slack.
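
As a minimal sketch of the time-limited capture idea, assuming a simple field/value match and an in-memory list of overrides: the `OverrideSampler` class and its `force_capture` method below are hypothetical names, not part of Refinery or of any proposed CLI/API. A sample rate of 1 means every matching trace is kept.

```python
import time
from dataclasses import dataclass


@dataclass
class CaptureOverride:
    field: str         # span field to match (hypothetical)
    value: str         # value that marks a span as "interesting"
    expires_at: float  # clock reading after which the override lapses


class OverrideSampler:
    """Returns sample rate 1 for spans matching an unexpired override."""

    def __init__(self, default_rate, clock=time.monotonic):
        self.default_rate = default_rate
        self._clock = clock
        self._overrides = []

    def force_capture(self, field, value, duration_s):
        """Keep every span where span[field] == value for duration_s seconds."""
        expires_at = self._clock() + duration_s
        self._overrides.append(CaptureOverride(field, value, expires_at))

    def sample_rate(self, span):
        now = self._clock()
        # Drop expired overrides so capture stops automatically.
        self._overrides = [o for o in self._overrides if o.expires_at > now]
        for o in self._overrides:
            if span.get(o.field) == o.value:
                return 1
        return self.default_rate


# Usage with a fake clock so the expiry is easy to see.
now = [0.0]
sampler = OverrideSampler(default_rate=100, clock=lambda: now[0])
sampler.force_capture("service", "canary", duration_s=60)
print(sampler.sample_rate({"service": "canary"}))  # 1
now[0] = 61.0
print(sampler.sample_rate({"service": "canary"}))  # 100
```

Because the override carries its own expiry, a forgotten capture request cannot inflate throughput indefinitely, which is the property the Recon-style limit is meant to provide.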