tenzir / public-roadmap

The public roadmap of Tenzir
https://docs.tenzir.com/roadmap

Deduplication Operator #108

Closed by mavam 3 months ago

mavam commented 8 months ago

Deduplicating events by a given set of keys is a common way to reduce data volume downstream. A dedupe operator would provide this functionality.

### Definition of Done
- [x] Review the equivalent functionality in [Splunk](https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/dedup), [Cribl](https://docs.cribl.io/search/dedup/) ([blog](https://cribl.io/blog/streaming-data-deduplication-with-cribl/)), and other tools
- [x] Design the deduplication mechanism (e.g., compound record hash, hash table expiry, etc.)
- [x] Agree on the operator UX
- [x] Implement and test the operator
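
To make the design item above more concrete, here is a rough sketch of one way a compound record hash and hash table expiry could fit together. It is not the operator's actual implementation: the names `compound_key`, `Deduplicator`, `keys`, and `timeout` are hypothetical and chosen only for illustration, and the sketch assumes that the caller passes non-decreasing timestamps.

```python
import hashlib
import json
from collections import OrderedDict


def compound_key(event: dict, keys: list[str]) -> str:
    """Hash the selected fields of an event into one fixed-size digest."""
    # Missing fields become None so they still take part in the key.
    selected = [event.get(k) for k in keys]
    payload = json.dumps(selected, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Deduplicator:
    """Hypothetical helper: drops events whose key was seen within `timeout`."""

    def __init__(self, keys: list[str], timeout: float):
        self.keys = keys
        self.timeout = timeout
        # digest -> time first seen; insertion order equals age order
        # because timestamps are assumed to be non-decreasing.
        self.seen: OrderedDict[str, float] = OrderedDict()

    def _expire(self, now: float) -> None:
        # Evict entries from the oldest end until one is still fresh.
        while self.seen:
            _, first_seen = next(iter(self.seen.items()))
            if now - first_seen < self.timeout:
                break
            self.seen.popitem(last=False)

    def accept(self, event: dict, now: float) -> bool:
        """Return True if the event should be forwarded downstream."""
        self._expire(now)
        digest = compound_key(event, self.keys)
        if digest in self.seen:
            return False
        self.seen[digest] = now
        return True


# Usage: deduplicate on (src, dst) and ignore all other fields.
events = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 100},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 250},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "bytes": 40},
]
dedup = Deduplicator(keys=["src", "dst"], timeout=60.0)
kept = [e for t, e in enumerate(events) if dedup.accept(e, now=float(t))]
```

Storing only digests keeps memory per entry constant regardless of event size, at the cost of a negligible collision risk. Expiry bounds the table size and lets a repeated event through again once its key has aged out.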

mavam commented 8 months ago

A Discord user asked for this feature.

mavam commented 6 months ago

This came up indirectly again in a strategic partner discussion.

dominiklohmann commented 3 months ago

This came up again in a call with a prospect. I've moved this back up to discussion.