open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.96k stars 2.3k forks source link

Align usage of hashing functions #6136

Open jpkrohling opened 2 years ago

jpkrohling commented 2 years ago

Some parts of the code base are using custom-made hashing functions, others are using hash/fnv, while others are using other things. I believe we should unify that as much as possible, hopefully with hash/fnv as it's very performant and good enough for non-cryptographic usage.

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/6e2b67b56fe8e9f9af4b9cc16c1fa0e38fcc4fad/processor/probabilisticsamplerprocessor/probabilisticsampler.go#L156

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/bff12af90c6d3d1192076a8243aff0620d6d651c/processor/tailsamplingprocessor/internal/sampling/probabilistic.go#L18

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a811dbbb4e85e84b67ba905def2dba551ea6ed83/exporter/loadbalancingexporter/consistent_hashing.go#L18

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a811dbbb4e85e84b67ba905def2dba551ea6ed83/processor/groupbytraceprocessor/event.go#L21

VineethReddy02 commented 2 years ago

Looking into this! :)

jpkrohling commented 2 years ago

There might be cases where it's OK to deviate from the standard, like for #5981. As @ydrozhdzhal mentioned:

I checked the fnv and it doesn't feet my needs, because it works with byte arrays and hashstructure can hash entire structure(you can find this in MR),

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

jpkrohling commented 1 year ago

@VineethReddy02, are you still interested in this?

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

jpkrohling commented 1 year ago

This is again up for grabs.

oertl commented 1 year ago

The project SMHasher compares various non-cryptographic hash functions in terms of speed and hash quality. The fastest hash functions without quality problems are listed here. FNV is not one of them.

jmichalek132 commented 1 year ago

Will try to find time to submit PR for this over the weekend.

jpkrohling commented 1 year ago

@oertl, right now, we have some usage of fnv and I'd align on that first, given there should be only a few deviating usages. I am open to discussing changing the hashing algorithm to something else. Would you please open an issue with your arguments?

oertl commented 1 year ago

@jpkrohling I am fine with them being aligned first. I just wanted to point out that there are better hash functions available in the meantime. The alignment may be the ideal moment to switch to a better hash function.

By the way, Go uses a variant of wyhash as its default hash function (see https://go.dev/src/runtime/hash64.go), which SMHasher lists among the recommended hash functions.

jpkrohling commented 1 year ago

@oertl, I'd happily review a PR aligning the code base with a new hash function, especially if it's well-documented that it's a better fit for our use-cases (which seems to be the case!).

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

fatsheep9146 commented 1 year ago

@jmichalek132 are you still working on this?

jmichalek132 commented 1 year ago

Actually didn't get to it at all, unfortunately.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.