vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Reroute discarded events from throttle transform #13549

Open fpytloun opened 2 years ago

fpytloun commented 2 years ago

Use Cases

I would like to create a metric with extra labels taken from fields of the throttled events (e.g. tag or service), so that I can see what was throttled.

There might be more use cases, e.g. sending throttled events to another backend for further processing.

Attempted Solutions

The vector_events_discarded_total metric cannot give any detailed information:

[image]

There is a key label which, according to the documentation, seems to be intended for this:

Any event passed when the rate limiter is at capacity will be discarded and tracked by an events_discarded_total metric tagged by the bucket’s key.

But it has a static value (I think the value should be extracted from the event; possibly a bug?):

key="_throttle_key"

Related config:

    [transforms.remap_fluent_throttle_key]
      type = "remap"
      inputs = ["in_fluent"]
      source = '''
      if match(.tag, r'^alert\..*$') ?? false {
        ._throttle_key, err = join([.labels.cluster_name, .labels.alertname], separator: "_")
        if err != null {
          log("Unable to construct throttle key for alert cluster_name=" + to_string!(.labels.cluster_name) + ", alertname="+ to_string!(.labels.alertname) +". Dropping invalid event. " + err, level: "error", rate_limit_secs: 10)
          abort
        }
      } else {
        ._throttle_key, err = join([.cluster_name, .tag], separator: "_")
        if err != null {
          log("Unable to construct throttle key for cluster_name=" + to_string!(.cluster_name) + ", tag="+ to_string!(.tag) +". Dropping invalid event. " + err, level: "error", rate_limit_secs: 10)
          abort
        }
      }
      '''

    [transforms.throttle_fluent]
      type = "throttle"
      inputs = ["remap_fluent_throttle_key"]
      key_field = "_throttle_key"

      # Don't limit apiaccess, useractivity and secevent logs
      exclude.type = "vrl"
      exclude.source = '''!includes(["svcfw.obelixd.apiaccess", "svcfw.obelixd.useractivity", "svcfw.obelixd.secevent"], .tag)'''

      # 150 msg/s per cluster/tag combination
      window_secs = 60
      threshold = 9000

Proposal

This might be done by routing throttled events away and attaching an additional log_to_metric transform to those events. Basically the same feature as the dropped output of the remap transform:

events that result in runtime errors or aborts will be dropped from the default output stream and sent to the dropped output instead. For a transform component named foo, this output can be accessed by specifying foo.dropped as the input to another component.
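
A minimal sketch of how the proposal could look in configuration. Everything about the `dropped` output here is hypothetical: the throttle transform does not expose such an output at the time of this request, and the name `throttle_fluent.dropped` simply mirrors the remap convention quoted above. The metric name is also made up for illustration.

```toml
# HYPOTHETICAL: `throttle_fluent.dropped` is the output proposed in this
# issue, not an existing feature of the throttle transform.
[transforms.throttled_to_metric]
  type = "log_to_metric"
  inputs = ["throttle_fluent.dropped"]

  [[transforms.throttled_to_metric.metrics]]
    type = "counter"
    field = "tag"                      # count each throttled event that has a tag
    name = "throttled_events_total"    # hypothetical metric name
    tags.tag = "{{ tag }}"             # label taken from the throttled event
```

With something like this, the resulting counter would carry labels from the throttled events themselves, which is what vector_events_discarded_total cannot provide.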

spencergilbert commented 2 years ago

Responded in Discord as well - but we may be passing just the name of the key, rather than the contained value from that key (which is what should be bucketed).

jszwedko commented 2 years ago

> Responded in Discord as well - but we may be passing just the name of the key, rather than the contained value from that key (which is what should be bucketed).

Yeah, that looks like the case. I'll open a separate bug for that, since I think this feature request (being able to route dropped events) is also a good one.

jszwedko commented 2 years ago

Actually @fpytloun, key_field is a template configuration option in Vector. I think you need to use the template syntax, so: key_field = "{{_throttle_key}}". This is a bit confusing because of the name of the configuration option. I'm struggling to come up with a good name for it at the moment, though. throttle_key is a bit better, but still not great.
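
A minimal sketch of that change applied to the throttle transform from the config above. Only the key_field line differs; the exclude condition is left out here for brevity and would stay as it was.

```toml
[transforms.throttle_fluent]
  type = "throttle"
  inputs = ["remap_fluent_throttle_key"]

  # Template syntax: rendered per event from the _throttle_key field, so each
  # distinct value gets its own rate-limit bucket. The bare string
  # "_throttle_key" would instead be one constant key shared by all events.
  key_field = "{{ _throttle_key }}"

  # 150 msg/s per cluster/tag combination
  window_secs = 60
  threshold = 9000
```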

fpytloun commented 2 years ago

@jszwedko checking related code: https://github.com/vectordotdev/vector/blob/master/src/transforms/throttle.rs#L153

and it indeed seems that key_field must be a template to work as expected; otherwise the key would be a static value and basically useless (it would just rate-limit everything) 🤯 So it's definitely wrongly named and wrongly documented (key or throttle_key would make at least some sense, but key_field gives the impression that it should be just a field name).

neuronull commented 2 years ago

This PR was opened to address the documentation: https://github.com/vectordotdev/vector/pull/13561

jszwedko commented 2 years ago

👍 I think I like throttle_key as an improvement at least.

jszwedko commented 2 years ago

@fpytloun just a note that a workaround came up today: use log_to_metric both in front of and behind the throttle transform to generate two metrics, which can then be subtracted to get metrics about the throttled events.
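
A rough sketch of that workaround, reusing the component names from the config earlier in this issue. The transform names and metric names below are made up for illustration, and the counters are tagged by the throttle key on the assumption that its cardinality is acceptable for the metrics backend.

```toml
# Count events entering the throttle, tagged by throttle key.
[transforms.count_before_throttle]
  type = "log_to_metric"
  inputs = ["remap_fluent_throttle_key"]

  [[transforms.count_before_throttle.metrics]]
    type = "counter"
    field = "_throttle_key"
    name = "throttle_events_in_total"     # hypothetical metric name
    tags.throttle_key = "{{ _throttle_key }}"

# Count events that made it through the throttle.
[transforms.count_after_throttle]
  type = "log_to_metric"
  inputs = ["throttle_fluent"]

  [[transforms.count_after_throttle.metrics]]
    type = "counter"
    field = "_throttle_key"
    name = "throttle_events_out_total"    # hypothetical metric name
    tags.throttle_key = "{{ _throttle_key }}"
```

Both transforms would still need to feed a metrics sink. The number of throttled events per key is then the first counter minus the second, grouped by the throttle_key tag.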