vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Can I use the reduce transform (with merge strategy "sum") to sum events periodically every 15/20 seconds? #16695

Open saurabhkohli-ril opened 1 year ago

saurabhkohli-ril commented 1 year ago


Problem

I am using the reduce transform to sum events grouped by an id, and I want the summed result emitted every 15 seconds. However, the logs are continuous, and reduce holds the events until the stream ends, which defeats the purpose.

Metrics are required every 15 seconds. I have tried both of the pipelines below, but the flow gets stuck at the reduce step.

Pipelines tried: Reduce -> LogToMetric, and Reduce -> LogToMetric -> Aggregate

Here is the relevant piece of the configuration:

inputs = [ "parse_upf1"] type = "reduce" merge_strategies.sessions = "sum" group_by = ["id"]

[transforms.logs2metrics-id] type = "log_to_metric" inputs = [ "remap_upf" ]

[[transforms.logs2metrics-id.metrics]] type = "gauge" field = "pools" name = "PoolsPerId" namespace = "upf"

[transforms.logs2metrics-id.metrics.tags]
id="{{id}}"

[[transforms.logs2metrics-id.metrics]] type = "gauge" field = "sessions" name = "SessionsPerId" namespace = "upf"

[transforms.logs2metrics-id.metrics.tags]
id="{{id}}"

[transforms.parse_upf11] type = "aggregate" inputs = ["logs2metrics-id"] interval_ms = 15000

Please help.

Configuration

[transforms.remap_upf]
inputs = [ "parse_upf1"]
type = "reduce"
merge_strategies.sessions = "sum"
group_by = ["id"]

[transforms.logs2metrics-id]
type = "log_to_metric"
inputs = [ "remap_upf" ]

  [[transforms.logs2metrics-id.metrics]]
  type = "gauge"
  field = "pools"
  name = "PoolsPerId"
  namespace = "upf"

    [transforms.logs2metrics-id.metrics.tags]
    id="{{id}}"

  [[transforms.logs2metrics-id.metrics]]
  type = "gauge"
  field = "sessions"
  name = "SessionsPerId"
  namespace = "upf"

    [transforms.logs2metrics-id.metrics.tags]
    id="{{id}}"

[transforms.parse_upf11]
type = "aggregate"
inputs = ["logs2metrics-id"]
interval_ms = 15000
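
For comparison, here is a minimal sketch of the reduce step using the expire_after_ms and flush_period_ms options that the reduce transform already supports. Note that these flush a group after a period of inactivity rather than on a fixed schedule, so they only approximate a periodic flush (the input name parse_upf1 is taken from the configuration above):

[transforms.remap_upf]
type = "reduce"
inputs = ["parse_upf1"]
group_by = ["id"]
merge_strategies.sessions = "sum"
# Consider a group complete if no new event for its id arrives within 15 s.
expire_after_ms = 15000
# Check for expired groups once per second.
flush_period_ms = 1000

With continuous logs for every id, expire_after_ms alone will not produce a strict 15-second cadence, which is why a time-based flush option is requested in the comments below.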

Version

vector 0.27.0 (x86_64-unknown-linux-gnu 5623d1e 2023-01-18)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

fuchsnj commented 1 year ago

I believe you are hitting the bug that was fixed by https://github.com/vectordotdev/vector/pull/16146. That fix was recently merged and should go out with Vector 0.29. Feel free to try it out now with a nightly version of Vector.

I'm going to close this issue since I'm fairly confident this is a duplicate. However, feel free to re-open if that's not the case.

jszwedko commented 1 year ago

Reopening since we had to revert the change that closed this in https://github.com/vectordotdev/vector/pull/17084

spacepatcher commented 1 year ago

Hi there! Could you share your plans for implementing this improvement? We could consider applying https://github.com/vectordotdev/vector/pull/16146 to our agents if the fix is not yet planned.

Cheers!

bbeaudreault commented 6 months ago

Any update here? This seems like a commonly requested feature. Currently we can define max_events, but it would be nice to have a max_reduce_time_ms or something.
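
For illustration only, a sketch of what that could look like on the reduce transform; max_events is an existing option, while max_reduce_time_ms is the hypothetical option proposed in this comment and is not implemented:

[transforms.remap_upf]
type = "reduce"
inputs = ["parse_upf1"]
group_by = ["id"]
merge_strategies.sessions = "sum"
# Existing option: flush a group once it has accumulated this many events.
max_events = 1000
# Hypothetical option suggested above (not implemented): flush a group
# after it has been open for this long, even if events keep arriving.
# max_reduce_time_ms = 15000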

jszwedko commented 6 months ago

No update yet. We'd be happy to see someone pick this up though, if they are motivated. You could start with the implementation we had to revert, here: https://github.com/vectordotdev/vector/pull/16146

jszwedko commented 3 months ago

Closed by #20440
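
For readers arriving later, a sketch of the periodic flush this enables, assuming the option introduced by that PR is named end_every_period_ms (check the PR and the current reduce transform docs to confirm the exact name and semantics):

[transforms.remap_upf]
type = "reduce"
inputs = ["parse_upf1"]
group_by = ["id"]
merge_strategies.sessions = "sum"
# Assumed option from #20440: flush each group every 15 seconds,
# regardless of whether events are still arriving for it.
end_every_period_ms = 15000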