Open kaisecheng opened 3 months ago
Pinging code owners:
pkg/ottl: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley
See Adding Labels via Comments if you do not have permissions to add labels yourself.
@kaisecheng Thanks for opening this and being willing to take on the implementation. I'm not familiar with MurmurHash, could you offer more details about what use cases you have in mind? I see that a handful of applications use it for non-cryptographic hashing, but I'd like to have a clear use-case in mind for including it in OTTL. We currently support 64-bit FNV-1a hashes through the FNV function, so I think we at least already support a non-cryptographic hash.
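For reference, a minimal Go sketch of a 64-bit FNV-1a hash built on the standard library's hash/fnv package. The helper name fnv1a64 and the sample input are made up for illustration; this is not the OTTL converter's actual code, only an example of the kind of hash it refers to.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnv1a64 is a hypothetical helper: it returns the 64-bit FNV-1a hash of s
// using the standard library implementation.
func fnv1a64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	// Print the hash of an arbitrary example string as 16 hex digits.
	fmt.Printf("%016x\n", fnv1a64("example-attribute-value"))
}
```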
@evan-bradley Thanks for looking into this issue.
While FNV is useful, MurmurHash3 offers distinct advantages that make it essential for many use cases. It's generally faster than FNV, especially for longer inputs, and provides a more uniform distribution. MurmurHash3 is widely used for data deduplication, consistent hashing, and as a hash function in data pipelines.
I believe supporting MurmurHash3 in OTTL would align with industry practices and significantly ease migration for users with existing systems that rely on it.
@kaisecheng can you provide some examples of other industry tools or data pipelines that utilize this hash function?
@TylerHelmuth Major data processing frameworks like Spark, Flink, and Apache Beam use MurmurHash3 for fingerprinting in deduplication operations, stream processing, and consistent hashing. MurmurHash3 in OTTL could be used to generate fingerprints of telemetry data for routing, deduplication, and determining which shard a piece of data belongs to. This is particularly useful for customers who manage database sharding themselves, and would facilitate seamless integration with existing data pipelines that rely on MurmurHash3.
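To make the routing and deduplication use cases concrete, here is a rough Go sketch of fingerprint-based shard selection and deduplication. It is an assumption-laden illustration: fingerprint, shardFor, and dedupe are hypothetical names, and the standard library's FNV-1a is used as a stand-in hash so the example has no external dependencies. A MurmurHash3 implementation could be swapped into fingerprint without changing the rest.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fingerprint is a stand-in hash (FNV-1a, NOT MurmurHash3), used here only
// so the sketch runs with the standard library alone.
func fingerprint(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

// shardFor maps a key to one of n shards (n must be > 0) by plain modulo.
// This is simple hash partitioning, not consistent hashing: changing n
// remaps most keys.
func shardFor(key string, n int) int {
	return int(fingerprint(key) % uint64(n))
}

// dedupe keeps the first record seen for each fingerprint. Two distinct
// records with colliding fingerprints would be treated as duplicates.
func dedupe(records []string) []string {
	seen := make(map[uint64]struct{}, len(records))
	var out []string
	for _, r := range records {
		fp := fingerprint(r)
		if _, ok := seen[fp]; ok {
			continue
		}
		seen[fp] = struct{}{}
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(shardFor("service-a/span-123", 8))
	fmt.Println(dedupe([]string{"a", "b", "a"}))
}
```

The same key always lands on the same shard as long as the shard count and the hash function stay fixed, which is why matching the hash used by an existing pipeline matters for this use case.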
@TylerHelmuth ☝️ Any thoughts on adding MurmurHash3? Adding a faster hash function with a more uniform distribution would be a good enhancement.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Component(s)
pkg/ottl
Is your feature request related to a problem? Please describe.
OTTL doesn't have a murmur3 hash function, which is widely used for non-cryptographic purposes and has a low collision rate.
Describe the solution you'd like
Use spaolacci/murmur3 Sum128 to hash the input string and return a hex string fingerprint.
Describe alternatives you've considered
No response
Additional context
No response