Closed Boarders closed 4 years ago
None of the above - I haven't benchmarked it in isolation. I use nubOrd in lots of larger programs, and have profiled them, and never seen nubOrd as a hot spot. If you can find an optimisation I'd be happy to apply it.
My quick read suggests making the Color field strict and UNPACK'd is probably going to be a win. Making the Map strict will probably be a big loss in certain cases where currently it wouldn't actually insert just queue up a thunk to insert if anything collides. Tail recursion is an odd beast in Haskell, and the function is currently lazily productive, which is usually also the fastest. But benchmarks would persuade me I'm wrong!
Very interesting, thanks for the answer! Later this week I will try some experiments to see if it can be made any faster. I don't have it as a bottleneck in any code though, I am trying to add the same function to vector
.
Since vector isn't an inductive data structure you might gain benefit from tail recursion. At the very least, you won't be able to return the vector lazily.
Taking a closer look, rather than unpacking the color, its easy to specialise the tree to encode the color into the node alternatives. Not sure what the performance impact will be, but given how it was used, it must be an improvement.
In my vector experiment I got a good speedup from taking the relevant parts from Data.Set
. Would that be of interest or is the relative simplicity preferred?
I tried this out and just copying the relevant functions from Data.Set
leads to worse performance on my micro benchmarks. I'll test if there are any other easy gains to be had.
There's a simplicity/speed trade off - for 2x performance, I'd probably accept 3x code, and perhaps a couple of nasty GHC-specific primops. For 1.1x performance I'd accept a few more lines and a pragma or two. But curious to see what the trade-off space looks like if you have it. I know Vector is often using unpacked data types, which might be a bit difference to nubOrd - look forward to seeing what you come up with.
In the interest of reporting more negative findings, strictness annotations didn't seem to help whatsoever and the tail recursive version had mixed results. I can add my benchmarks in a pull request, they are not very well-informed but I figure they are something that can be built on.
This is probably more of a question than an issue but is there any reason that
nubOrd
is not tail recursive, did you benchmark it to be faster as written? Also the red black tree implementation uses no strictness annotations, was it tested that they make no difference?