vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Support fields/VRL for setting sample rate #19332

Open NeilJed opened 11 months ago

NeilJed commented 11 months ago

Use Cases

Currently, the sample transform only accepts an integer for the sample rate. It would be really useful if the rate could be set via a field or a VRL snippet.

My use case is that my event messages contain a field denoting the service that the log belongs to. I'm using this as the key field for hashing. However, because event volume varies a lot, a fixed sample rate doesn't fit my use case.

What I would like is to set the sample rate based on a condition, in this case the value of the service field, or even an event metadata value (if the sample rate is pre-calculated and added in a previous transform).

For example:

[transforms.my_sampler]
type = "sample"
inputs = [ "upstream_source"]
rate = '''
int(get!({"serviceA": 1, "serviceB": 1, "serviceC": 1000}, [.service_name])) ?? 3
'''
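
To make the intended behaviour of the example concrete, here is a minimal Python sketch of per-event rate selection. It is not Vector's implementation: `SERVICE_RATES`, `resolve_rate` and `PerServiceSampler` are hypothetical names, and the counter-based "keep 1 in N" logic is an assumption chosen for illustration, not the hashing the sample transform applies to a key field.

```python
from collections import defaultdict
from typing import Any

# Hypothetical per-service rates, copied from the VRL example above.
SERVICE_RATES = {"serviceA": 1, "serviceB": 1, "serviceC": 1000}
DEFAULT_RATE = 3  # fallback, mirrors the `?? 3` in the example


def resolve_rate(event: dict[str, Any]) -> int:
    """Return the 1-in-N rate for an event, or the default if none applies."""
    name = event.get("service_name")
    if not isinstance(name, str):
        return DEFAULT_RATE
    return SERVICE_RATES.get(name, DEFAULT_RATE)


class PerServiceSampler:
    """Keeps one event out of every N, counted separately per service name."""

    def __init__(self) -> None:
        self._seen: dict[Any, int] = defaultdict(int)

    def keep(self, event: dict[str, Any]) -> bool:
        name = event.get("service_name")
        key = name if isinstance(name, str) else None
        count = self._seen[key]
        self._seen[key] = count + 1
        # The first event of each group of `rate` events is kept.
        return count % resolve_rate(event) == 0
```

With these rates, every serviceA event is kept, serviceC keeps one event in a thousand, and any service not in the table falls back to one in three.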

Attempted Solutions

Currently I implement this with a VRL transform that decides the sample rate based on the service name and adds it as an event field. The events then pass to a route transform with one route per value of that field. Each route sends to a separate sample transform with a fixed sample rate, and the outputs of all the sample transforms are then collected into the sink.

Obviously this is overly complex and requires me to create a route + sampler for every sample ratio I want.

Proposal

It seems there is already existing functionality that supports static/field/VRL input where the result must be a boolean. Could this approach be applied to add an input type where the result must be an integer? Maybe Option<u64>?
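
As a rough illustration of what such an input type could look like, here is a Python sketch that resolves a rate from either a static value or an event field and returns `None` when no usable integer is found, loosely mirroring the suggested `Option<u64>`. `StaticRate`, `FieldRate` and `resolve` are hypothetical names, not part of Vector, and the VRL variant is left out for brevity.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union

U64_MAX = 2**64 - 1  # upper bound of the suggested u64


@dataclass(frozen=True)
class StaticRate:
    """A fixed rate taken from the configuration (hypothetical)."""
    value: int


@dataclass(frozen=True)
class FieldRate:
    """A rate read from the named event field (hypothetical)."""
    field: str


RateSource = Union[StaticRate, FieldRate]


def resolve(source: RateSource, event: dict[str, Any]) -> Optional[int]:
    """Return the rate as an integer in [1, U64_MAX], or None if unusable."""
    if isinstance(source, StaticRate):
        raw = source.value
    else:
        raw = event.get(source.field)
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        return None
    return raw if 1 <= raw <= U64_MAX else None
```

Returning `None` rather than raising leaves the caller free to decide what a missing or invalid rate means, for example falling back to a default rate or passing the event through unsampled.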

References

No response

Version

vector 0.34.1 (x86_64-apple-darwin 86f1c22 2023-11-16 14:59:10.486846964)

AvihaiSam commented 7 months ago

In the current implementation, rate is just a uint, which makes it impossible to set a simple sample rate of 2/3 (or any other fraction with a numerator other than 1), which might come in handy in some cases. I suggest making the rate an actual rate (a float between 0.0 and 1.0 that describes a frequency), with the option to write it as a fraction (1/1000 and 0.001 should both work the same).
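
A small Python sketch of this suggestion, assuming the rate arrives as a string: the standard library's `fractions.Fraction` parses both notations into the same exact value. `parse_rate` and `keep_nth` are hypothetical helpers for illustration, and rejecting a rate of 0 is an assumption made here, not something stated above.

```python
from fractions import Fraction


def parse_rate(value: str) -> Fraction:
    """Parse "2/3", "1/1000" or "0.001" into an exact fraction in (0, 1]."""
    try:
        rate = Fraction(value.strip())
    except ZeroDivisionError:
        raise ValueError(f"invalid rate: {value!r}") from None
    if not 0 < rate <= 1:
        raise ValueError(f"rate must be in (0, 1], got {value!r}")
    return rate


def keep_nth(rate: Fraction, index: int) -> bool:
    """Keep `numerator` out of every `denominator` events, by event index."""
    return index % rate.denominator < rate.numerator
```

For example, `parse_rate("1/1000") == parse_rate("0.001")` holds, and with a rate of 2/3 `keep_nth` keeps the first two events out of every three.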