vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

DDSketch transformation #12976

Open yandooo opened 2 years ago

yandooo commented 2 years ago


Use Cases

When converting DDSketch metrics to logs in order to push them to a ClickHouse cluster, the sketches arrive as an encoded set of fields:

{"host":"123","kind":"incremental","name":"sent_received","sketch":{"sketch":{"AgentDDSketch":{"avg":44.23529411764705,"bins":{"k":[1562,1573,1574,1576,1590,1609],"n":[4,1,5,1,3,3]},"count":17,"max":67.0,"min":32.0,"sum":752.0}}},"tags":{"systemId":"342856647","version":"v1.0"},"timestamp":"2022-06-03T20:49:40Z"}

Is there any way to transform them into a form that can be queried with ClickHouse's standard SQL syntax? Or to transform them with VRL/Lua into distributions, histograms, or summaries? We already do this type of transformation by default in Prometheus remote write to get percentiles.
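For reference, here is a minimal Python sketch of decoding the event above into approximate values and percentiles. It assumes Vector's AgentDDSketch uses the Datadog Agent's default mapping (relative accuracy 1/128, smallest indexable value 1e-9, so gamma = 1 + 2/128 and a key offset of 1338). Under that assumption, keys 1562 and 1609 map back to values just above 32 and just below 67, which is consistent with the `min` and `max` fields in the event. The names `value_to_key`, `key_to_value`, `sketch_rows`, and `quantile` are hypothetical helpers, not a Vector or Datadog API. The quantile uses nearest-rank selection and returns the bin's representative value instead of interpolating inside the bin.

```python
import math

# ASSUMPTION: these mirror the Datadog Agent's default DDSketch configuration
# (relative accuracy eps = 1/128, smallest indexable value 1e-9). Check them
# against the Vector version you run before relying on the results.
EPS = 1.0 / 128.0
MIN_VALUE = 1e-9
GAMMA = 1.0 + 2.0 * EPS
GAMMA_LN = math.log1p(2.0 * EPS)
NORM_BIAS = -math.floor(math.log(MIN_VALUE) / GAMMA_LN) + 1  # 1338


def value_to_key(v: float) -> int:
    """Bin key for a positive value (range clamping omitted for brevity)."""
    # Python's round() rounds halves to even.
    return round(math.log(v) / GAMMA_LN) + NORM_BIAS


def key_to_value(k: int) -> float:
    """Representative value of bin key k (inverse of value_to_key)."""
    if k == 0:
        return 0.0
    if k < 0:
        return -key_to_value(-k)
    return GAMMA ** (k - NORM_BIAS)


def sketch_rows(event: dict) -> list:
    """Flatten the JSON-encoded AgentDDSketch into (approx_value, count) rows."""
    s = event["sketch"]["sketch"]["AgentDDSketch"]
    return [(key_to_value(k), n) for k, n in zip(s["bins"]["k"], s["bins"]["n"])]


def quantile(event: dict, q: float) -> float:
    """Nearest-rank quantile estimate, clamped to the sketch's min/max."""
    s = event["sketch"]["sketch"]["AgentDDSketch"]
    if q <= 0.0:
        return s["min"]
    if q >= 1.0:
        return s["max"]
    rank = q * (s["count"] - 1)
    seen = 0
    # Assumes bins are sorted by key, as they are in the event above.
    for value, n in sketch_rows(event):
        seen += n
        if seen > rank:
            return min(max(value, s["min"]), s["max"])
    return s["max"]
```

After `json.loads` on the event, `quantile(event, 0.5)` gives a median estimate. Rows in the `(value, count)` shape could also be loaded into a ClickHouse table and aggregated with a weighted quantile function such as `quantileExactWeighted`. The accuracy then depends on the same bin approximation.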

Attempted Solutions

Issue https://github.com/vectordotdev/vector/issues/9181 might be related, as it discusses exposing high-level functions to VRL for type transformations, including DDSketches. It also looks possible to write a Lua function that transforms DDSketch/UDDSketch into quantiles, including merge logic, but no Lua library for this exists yet. Alternatively, the Agent DDSketch implementation could be ported to ClickHouse directly.
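The merge logic is small once the sketches are decoded. Below is a minimal Python sketch. It assumes both inputs are the inner `AgentDDSketch` objects from events like the one above and share the same key mapping. Under that assumption, merging means summing counts per key and combining `count`, `sum`, `min`, and `max`. `merge_agent_sketches` is a hypothetical helper. The real implementations also enforce a bin limit and collapse bins when it is exceeded, and that step is omitted here.

```python
from collections import Counter


def merge_agent_sketches(a: dict, b: dict) -> dict:
    """Merge two decoded AgentDDSketch objects ({"avg", "bins", "count", ...}).

    ASSUMPTION: both sketches use the same key mapping, so bins with the same
    key can be combined by adding their counts. Bin-limit collapsing is omitted.
    """
    counts = Counter()
    for s in (a, b):
        for k, n in zip(s["bins"]["k"], s["bins"]["n"]):
            counts[k] += n
    keys = sorted(counts)  # keep bins ordered by key
    count = a["count"] + b["count"]
    total = a["sum"] + b["sum"]
    return {
        "avg": total / count if count else 0.0,
        "bins": {"k": keys, "n": [counts[k] for k in keys]},
        "count": count,
        "max": max(a["max"], b["max"]),
        "min": min(a["min"], b["min"]),
        "sum": total,
    }
```

The same per-key summation would carry over to Lua, or to a SQL `GROUP BY` on the bin key, if the bins are stored as rows.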

Proposal

Perhaps resolving https://github.com/vectordotdev/vector/issues/9181 would do the trick. Meanwhile, are there any workarounds available today for dealing with DDSketches? Is there an example Lua/VRL script?

References

No response

Version

No response

yandooo commented 2 years ago

I managed to sort it out by implementing custom DDSketch processing logic inside the OLAP engine and consuming Vector's JSON-serialized DDSketches directly.