robclu / leapfrog

Lock-free concurrent and single-threaded hash map implementations using Leapfrog probing. Currently the highest performance concurrent HashMap in Rust for certain use cases.
Apache License 2.0
206 stars 9 forks source link

Documentation and presentation are misleading and dangerous #2

Closed Voultapher closed 2 years ago

Voultapher commented 2 years ago

GitHub

Lock-free concurrent and single-threaded hash map implementations using Leapfrog probing. Currently the highest performance concurrent HashMap in Rust.

README

The leapfrog crate contains two HashMap implementations, HashMap, which is a single-threaded HashMap, which can be used as an alternative to rusts built in HashMap (without somewhat limited functionality but better performance),

Emphasis added by me. I assume the without is meant to be with.

In bold:

If the value type for the map supports atomic operations then this map will not lock, while if the value type does not support atomic perations then accessing the value uses an efficient spinlock implementation.

500 words before:

For a more detailed description of the limitations and behaviour of the maps, see the crate documentation:

Docs.rs

500 words before:

You should use a hasher which produces unique hashes for each key. As only hashes are stored, if two keys hash to the same hash then the value for the keys which hash to the same value my be overwritten by each other if they are inserted concurrently.

What about the single threaded 'HashMap'? If I get a key hash collision, it won't overwrite the previous value?


Ignoring the factual errors, and assumed typos. The crate presents itself as this amazing super fast HashMap and spends most of its time talking about performance. When in reality it avoids the hardest problem of HashMaps, key collision, by just ignoring it. Effectively all hash functions will produce collisions at some point, in some scenario. Testing with some data and saying it works is dangerous. For example here is the fastest http download function in Rust:

fn download(_url: &str) -> String {
    "404".into()
}

I tried it on a buch of websites and it is thousands of times faster than hyper. To correctly use this leapfrog data structure, which I wouldn't even call HashMap, a user has to guarantee that either:

A) They are sure that under no circumstances ever, a collision will happen B) Losing data to collisions is fine

I guarantee you, most existing users of the stdlib HashMap, can't and don't want to guarantee either.


Given all the above, I worry that users will use this project without realizing the implications, only to get nasty bugs later on. Rust is all about avoiding that. I suggest you make it clear that this is not a regular hash map and comes with some serious implications and can't be used as alternative to the stdlib HashMap or other concurrent HashMaps without serious consideration. Here are some ways you could help future users avoid getting nasty surprises:

robclu commented 2 years ago

Hey,

Thank you for the feedback, it's much appreciated!

While I agree with your points, the download example you provided will never download anything from the requested URL. The maps in their current for work with the limitations described (they succeeded in the external benchmarks), and the failure rates are probabilistic, with the probability depending on the specific use case and hash function. Bloom filters for example are probabilistic data structures which are widely used, and a probabilistic failure rate might be acceptable for some use cases for increased performance. So I don't think the example you provided is equivalent, however, I agree with your point that Rust is a safe language and not making the limitations clear before speaking about performance is wrong. I have changed the documentation to make the limitations clear up front.

I am working on adding another map without the limitations, so that should address these problems when it's done.

Thank you again for the feedback.

Voultapher commented 2 years ago

Thanks for updating the documentation, I agree the download example is a poor fit. I'm looking forward to upcoming versions.