honeycombio / refinery

Refinery is a trace-aware tail-based sampling proxy. It examines whole traces and intelligently applies sampling decisions (whether to keep or discard) to each trace.

Derived Column support in Refinery #1154

Open tdarwin opened 3 months ago

tdarwin commented 3 months ago

Is your feature request related to a problem? Please describe.

Some customers have created great derived columns in Honeycomb to pull useful information out of their data, and they would sometimes like to use those columns in their sampling rules and field lists. It would be super cool if we could put derived columns into Refinery's config so that rules can reference them, or maybe use an API key to pull the derived column logic from Honeycomb and use it in Refinery.

Describe the solution you'd like

Describe alternatives you've considered

Doing these transformations in the collector or in the instrumentation is doable, and would not require any work in Refinery.

Additional context

kentquirk commented 3 months ago

We have thought about giving Refinery a rules configuration model that allows for greater freedom.

It wouldn't be THAT big a deal to support expression-based conditions with AND and OR, plus math. That could be done with an alternate field in the existing YAML structure for conditions, maybe something like Expression: "duration_ms > 1000 OR span_count > 5000"
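To make the idea concrete, here is a minimal sketch in Go of how such an expression string might be evaluated against a trace's numeric fields. Nothing here is Refinery code: `evalExpression`, `splitOn`, and `evalComparison` are hypothetical names, the sketch assumes whitespace-separated tokens, and it leaves out the math and parentheses a real implementation would need.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// evalExpression evaluates a whitespace-separated expression such as
// "duration_ms > 1000 OR span_count > 5000" against numeric trace fields.
// AND binds tighter than OR; parentheses and arithmetic are not handled.
func evalExpression(expr string, fields map[string]float64) (bool, error) {
	tokens := strings.Fields(expr)
	if len(tokens) == 0 {
		return false, fmt.Errorf("empty expression")
	}
	result := false
	// Split on OR first, then on AND within each OR group.
	for _, orGroup := range splitOn(tokens, "OR") {
		groupResult := true
		for _, cmp := range splitOn(orGroup, "AND") {
			ok, err := evalComparison(cmp, fields)
			if err != nil {
				return false, err
			}
			groupResult = groupResult && ok
		}
		result = result || groupResult
	}
	return result, nil
}

// splitOn splits a token slice at each occurrence of the keyword sep.
func splitOn(tokens []string, sep string) [][]string {
	groups := [][]string{{}}
	for _, tok := range tokens {
		if tok == sep {
			groups = append(groups, []string{})
			continue
		}
		last := len(groups) - 1
		groups[last] = append(groups[last], tok)
	}
	return groups
}

// evalComparison evaluates a single "field op number" comparison.
// A field that is absent from the trace makes the comparison false.
func evalComparison(tokens []string, fields map[string]float64) (bool, error) {
	if len(tokens) != 3 {
		return false, fmt.Errorf("expected field, operator, number; got %q", strings.Join(tokens, " "))
	}
	limit, err := strconv.ParseFloat(tokens[2], 64)
	if err != nil {
		return false, fmt.Errorf("bad number %q: %w", tokens[2], err)
	}
	value, ok := fields[tokens[0]]
	if !ok {
		return false, nil
	}
	switch tokens[1] {
	case ">":
		return value > limit, nil
	case ">=":
		return value >= limit, nil
	case "<":
		return value < limit, nil
	case "<=":
		return value <= limit, nil
	case "==":
		return value == limit, nil
	case "!=":
		return value != limit, nil
	default:
		return false, fmt.Errorf("unknown operator %q", tokens[1])
	}
}

func main() {
	// Hypothetical trace summary: a fast trace with many spans.
	trace := map[string]float64{"duration_ms": 250, "span_count": 7200}
	keep, err := evalExpression("duration_ms > 1000 OR span_count > 5000", trace)
	fmt.Println(keep, err)
}
```

Treating a missing field as a false comparison is one possible choice; a real rule engine might prefer an explicit existence check instead.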

The problem with that is that it isn't the derived column (DC) format. We could implement the DC language, but I've resisted that because it's very user-unfriendly without infix operators -- IMO, it's harder to read than the existing YAML.

That said, it's a really interesting idea to think about being able to read DCs from the API and make them available in Refinery. Do you think it would be enough to allow someone to reference a DC by name, without also giving them the ability to define one in Refinery? (Not having to deal with parsing errors would save implementation time.)

We'd probably check them at configuration load time and cache them locally, and most likely only invalidate them once an hour or so.