open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[pkg/ottl] Add murmur3 #34077

Open: kaisecheng opened this issue 3 months ago

kaisecheng commented 3 months ago

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

OTTL doesn't have a MurmurHash3 (murmur3) hash function. MurmurHash3 is widely used for non-cryptographic purposes and has a low collision rate.

Describe the solution you'd like

Use Sum128 from spaolacci/murmur3 to hash the input string and return the fingerprint as a hex string.
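A minimal sketch of what the proposed function could compute, assuming the MurmurHash3 x64_128 variant (the algorithm behind spaolacci/murmur3's Sum128) with seed 0, and assuming the fingerprint is h1 followed by h2, each written as 16 lowercase hex digits. The issue only says "hex string fingerprint", so that output layout is an assumption. The hash is written out inline so the example needs only the standard library; `murmur3Sum128` and `fingerprint` are hypothetical names, not OTTL or library APIs.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// Mixing constants from the reference MurmurHash3_x64_128.
const (
	c1 = 0x87c37b91114253d5
	c2 = 0x4cf5ad432745937f
)

// fmix64 is the 64-bit finalization mix that avalanches the bits.
func fmix64(k uint64) uint64 {
	k ^= k >> 33
	k *= 0xff51afd7ed558ccd
	k ^= k >> 33
	k *= 0xc4ceb9fe1a85ec53
	k ^= k >> 33
	return k
}

// murmur3Sum128 (hypothetical helper) computes MurmurHash3 x64_128.
func murmur3Sum128(data []byte, seed uint64) (h1, h2 uint64) {
	h1, h2 = seed, seed
	nblocks := len(data) / 16

	// Body: consume 16-byte blocks as two little-endian 64-bit lanes.
	for i := 0; i < nblocks; i++ {
		k1 := binary.LittleEndian.Uint64(data[i*16:])
		k2 := binary.LittleEndian.Uint64(data[i*16+8:])

		k1 *= c1
		k1 = bits.RotateLeft64(k1, 31)
		k1 *= c2
		h1 ^= k1
		h1 = bits.RotateLeft64(h1, 27)
		h1 += h2
		h1 = h1*5 + 0x52dce729

		k2 *= c2
		k2 = bits.RotateLeft64(k2, 33)
		k2 *= c1
		h2 ^= k2
		h2 = bits.RotateLeft64(h2, 31)
		h2 += h1
		h2 = h2*5 + 0x38495ab5
	}

	// Tail: remainder bytes 0-7 go into k1, bytes 8-14 into k2.
	tail := data[nblocks*16:]
	var k1, k2 uint64
	for i, b := range tail {
		if i >= 8 {
			k2 |= uint64(b) << (8 * uint(i-8))
		} else {
			k1 |= uint64(b) << (8 * uint(i))
		}
	}
	if len(tail) > 8 {
		k2 *= c2
		k2 = bits.RotateLeft64(k2, 33)
		k2 *= c1
		h2 ^= k2
	}
	if len(tail) > 0 {
		k1 *= c1
		k1 = bits.RotateLeft64(k1, 31)
		k1 *= c2
		h1 ^= k1
	}

	// Finalization: fold in the length, then mix both halves together.
	length := uint64(len(data))
	h1 ^= length
	h2 ^= length
	h1 += h2
	h2 += h1
	h1 = fmix64(h1)
	h2 = fmix64(h2)
	h1 += h2
	h2 += h1
	return h1, h2
}

// fingerprint hex-encodes h1 then h2 (this layout is an assumption).
func fingerprint(s string) string {
	h1, h2 := murmur3Sum128([]byte(s), 0)
	return fmt.Sprintf("%016x%016x", h1, h2)
}

func main() {
	fmt.Println(fingerprint("hello"))
}
```

With seed 0 and an empty input both halves stay zero, so `fingerprint("")` is 32 zeros; every input yields a 32-character hex string. Libraries can differ in how they lay out the 128 bits as hex (for example, printing the digest bytes in little-endian order), so the chosen layout would need to be documented for the fingerprints to match an existing pipeline.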

Describe alternatives you've considered

No response

Additional context

No response

github-actions[bot] commented 3 months ago

Pinging code owners:

evan-bradley commented 3 months ago

@kaisecheng Thanks for opening this and being willing to take on the implementation. I'm not familiar with MurmurHash, could you offer more details about what use cases you have in mind? I see that a handful of applications use it for non-cryptographic hashing, but I'd like to have a clear use-case in mind for including it in OTTL. We currently support 64-bit FNV-1a hashes through the FNV function, so I think we at least already support a non-cryptographic hash.
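For comparison with the FNV function mentioned above, 64-bit FNV-1a fits in a few lines: XOR each byte into the state, then multiply by the FNV prime. The sketch below writes the algorithm out by hand and checks it against Go's standard `hash/fnv` package. It illustrates the algorithm only and is not OTTL's FNV implementation; `fnv1a64` and `stdlibFNV1a64` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// FNV-1a 64-bit parameters: offset basis and prime.
const (
	offset64 = 14695981039346656037 // 0xcbf29ce484222325
	prime64  = 1099511628211        // 0x100000001b3
)

// fnv1a64 XORs each byte into the state, then multiplies by the prime.
func fnv1a64(data []byte) uint64 {
	h := uint64(offset64)
	for _, b := range data {
		h ^= uint64(b)
		h *= prime64
	}
	return h
}

// stdlibFNV1a64 computes the same hash with Go's standard library.
func stdlibFNV1a64(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	in := []byte("hello")
	fmt.Printf("%016x %016x\n", fnv1a64(in), stdlibFNV1a64(in))
}
```

FNV-1a processes one byte per step, whereas MurmurHash3 consumes 4- or 16-byte blocks, which is the usual basis for the claim that MurmurHash3 is faster on longer inputs.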

kaisecheng commented 3 months ago

@evan-bradley Thanks for looking into this issue. While FNV is useful, MurmurHash3 offers distinct advantages that make it essential for many use cases. It's generally faster than FNV, especially for longer inputs, and provides a more uniform distribution. MurmurHash3 is widely used for data deduplication, consistent hashing, and as a hash function in data pipelines. I believe supporting MurmurHash3 in OTTL would align with industry practices and significantly ease migration for users with existing systems that rely on it.

TylerHelmuth commented 3 months ago

@kaisecheng can you provide some examples of other industry tools or data pipelines that use this hash function?

kaisecheng commented 3 months ago

@TylerHelmuth Major data processing frameworks like Spark, Flink, and Apache Beam use MurmurHash3 for fingerprinting in deduplication operations, stream processing, and consistent hashing. MurmurHash3 in OTTL could be used to generate fingerprints of telemetry data for routing, deduplication, and determining which shard a piece of data belongs to. This is particularly useful for customers who manage database sharding themselves, and it would make integration with existing data pipelines that rely on MurmurHash3 seamless.
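One way to read the sharding use case: hash a stable key taken from the telemetry record and reduce the hash modulo the shard count. The sketch below swaps in MurmurHash3_x86_32, the 32-bit member of the family, only to keep it short (the issue proposes the 128-bit Sum128). `murmur3Sum32`, `shardFor`, the key string, and the shard count are illustrative assumptions, not OTTL functions.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// murmur3Sum32 (hypothetical helper) computes MurmurHash3_x86_32.
func murmur3Sum32(data []byte, seed uint32) uint32 {
	const c1, c2 = 0xcc9e2d51, 0x1b873593
	h := seed
	nblocks := len(data) / 4

	// Body: mix each little-endian 32-bit block into the state.
	for i := 0; i < nblocks; i++ {
		k := binary.LittleEndian.Uint32(data[i*4:])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
		h = bits.RotateLeft32(h, 13)
		h = h*5 + 0xe6546b64
	}

	// Tail: up to 3 leftover bytes, assembled little-endian.
	tail := data[nblocks*4:]
	var k uint32
	for i, b := range tail {
		k |= uint32(b) << (8 * uint(i))
	}
	if len(tail) > 0 {
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
	}

	// Finalization: fold in the length, then apply fmix32.
	h ^= uint32(len(data))
	h ^= h >> 16
	h *= 0x85ebca6b
	h ^= h >> 13
	h *= 0xc2b2ae35
	h ^= h >> 16
	return h
}

// shardFor maps a routing key to a shard index in [0, numShards).
func shardFor(key string, numShards uint32) uint32 {
	return murmur3Sum32([]byte(key), 0) % numShards
}

func main() {
	fmt.Println(shardFor("service.name=checkout", 8))
}
```

Plain modulo is not consistent hashing: changing the shard count remaps most keys. Deployments that resize often typically place a hash ring or jump consistent hash on top of the hash value instead.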

kaisecheng commented 2 months ago

@TylerHelmuth ☝️ Any thoughts on adding MurmurHash3? A faster hash function with a more uniform distribution would be a good enhancement.

github-actions[bot] commented 3 weeks ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.