Open mvkara opened 4 years ago
Hey, this repo was mainly intended for benchmarking different implementations. I completed the Okasaki-style implementation and moved it to FSharp.Data.Adaptive.
The current implementation has some minor bugs with netcore3 bit intrinsics, but as long as you don't compile it with USE_INTRINSICS
it should be just fine (performance gain was actually quite disappointing anyways)
You could simply use the FSharp.Data.Adaptive package or copy the file atm. We could also create a separate package only containing the HashCollections if you want to use them.
Thanks for pointing me to the package and making me aware of the library. In case your interested I tried using the package and bench marking it but FSharp.Data.Adaptive is still much slower than my implementation especially as the collection size grows. Not sure if there is any plans to move any of these collections to FSharp.Core; I would prefer it there but it should still be fast otherwise what's the point of using it over the standard F# Map.
EDIT: The full benchmarks with other collections I've tried is at: https://github.com/mvkara/fsharp-hashcollections/pull/5
| Method | CollectionSize | Mean | Error | StdDev |
|------------------------- |--------------- |----------:|---------:|---------:|
| GetHashMap | 10 | 11.07 ns | 0.023 ns | 0.022 ns |
| GetFSharpDataAdaptiveMap | 10 | 21.05 ns | 0.024 ns | 0.021 ns |
| GetHashMap | 100 | 13.17 ns | 0.011 ns | 0.010 ns |
| GetFSharpDataAdaptiveMap | 100 | 43.17 ns | 0.091 ns | 0.085 ns |
| GetHashMap | 1000 | 11.16 ns | 0.013 ns | 0.013 ns |
| GetFSharpDataAdaptiveMap | 1000 | 65.02 ns | 0.043 ns | 0.038 ns |
| GetHashMap | 100000 | 23.24 ns | 0.022 ns | 0.021 ns |
| GetFSharpDataAdaptiveMap | 100000 | 140.15 ns | 2.027 ns | 1.797 ns |
| GetHashMap | 500000 | 69.15 ns | 1.056 ns | 0.936 ns |
| GetFSharpDataAdaptiveMap | 500000 | 302.25 ns | 1.888 ns | 1.766 ns |
| GetHashMap | 750000 | 98.61 ns | 0.074 ns | 0.062 ns |
| GetFSharpDataAdaptiveMap | 750000 | 352.38 ns | 0.350 ns | 0.311 ns |
| GetHashMap | 1000000 | 109.27 ns | 0.044 ns | 0.041 ns |
| GetFSharpDataAdaptiveMap | 1000000 | 399.60 ns | 7.830 ns | 6.941 ns |
| GetHashMap | 5000000 | 133.62 ns | 1.209 ns | 1.131 ns |
| GetFSharpDataAdaptiveMap | 5000000 | 610.57 ns | 3.078 ns | 2.880 ns |
| GetHashMap | 10000000 | 151.67 ns | 0.146 ns | 0.129 ns |
| GetFSharpDataAdaptiveMap | 10000000 | 705.60 ns | 1.254 ns | 1.173 ns |
Hey, cool
The only thing alienating me here is that is in my benchmarks the fsharpx implementation was significantly slower than ours.
The test seems to be quite identical, except that I explicity used containsKey
instead of tryFind
but that shouldn't play much of a role. The other difference here is that i use random entries for the table...
Maybe I'll just add your implementation to our benchmarks too to understand this better.
Nonetheless this looks promising and I'd totally love my implementation to be replaced with something faster 👍
One comcern here is that the okasaki variant also supports efficient binary operations (symmetricdiff, etc) which may be tricky for other implementations...
That's why I created this issue in the first place; I'm skeptical of benchmarks in general including my own and wanted to see if it was possible to incorporate yours into the mix. If I put something out there for others to use I want to make sure I haven't missed something.
The use case I was aiming for in my current project was a drop in replacement for F#'s Map which most people typically use as a dictionary - i.e. tryFind, add and remove rather than for the operations you describe. I focused more on the tryFind case as it is the case I needed to optimise for my own reasons and is the one most suspectible to small changes (e.g. equality dispatch, struct vs classes, intrinsic use, etc). The add and remove case where copying of data swamps most of these concerns at least in my benchmarking wasn't as critical for me so I didn't optimise those operations heavily.
If symmetricdiff and such are important maybe this algorithm in general isn't the best fit? Would need to dig into this more. In any case there is room for other data structures that optimise these operations over others I would think.
Also be aware there has been ongoing work in the fsharp repo about optimising maps and sets with various benchmarking going on, e.g. https://github.com/dotnet/fsharp/pull/5365
Just found this repo from the mention on https://github.com/fsprojects/FSharp.Data.Adaptive/issues/17. I recently had some performance issues with F#'s built in map and so wrote my own implementation (https://github.com/mvkara/fsharp-hashcollections) to use on a project I've been working on. So far performance looks good compared to the collection types I've found although admittedly I've only tested lookup's. Didn't realise there was other implementations out there. I've put my benchmarks on the README.md file in the repo.
I was thinking how easy would it be, or do you have a net core 3 version of this? Wanted to try out my library on your benchmarks to compare to other libraries out there to see if its worth finishing my library and my impl requires netcoreapp3.1. Don't want to fracture the ecosystem with many different versions of the same collection if its already been implemented.