haskell / containers

Assorted concrete container types
https://hackage.haskell.org/package/containers
315 stars 178 forks source link

Full IntSets? #999

Open meooow25 opened 6 months ago

meooow25 commented 6 months ago

I wonder if it will be useful to add another constructor to IntSet: Full Prefix. This indicates that this is a full tree with a shared prefix.

I expect this to show good improvements for use cases where contiguous runs of Ints are likely. It will also reduce space usage in such cases. For instance, to store [0..n] we will need only $O(\log n)$ memory instead of $O(n)$.

But this would add small overheads in various places, so it will likely not be a clear win in all use cases.
Also, to be clear, this would be small added costs and not a change in the worst case bounds.

For sure we would want to test something like this out on some real world use cases (like GHC, maybe others?) to see how it affects things. Just documenting this idea for now.

alexfmpe commented 6 months ago

Think this could technically be represented with existing constructors

https://github.com/haskell/containers/blob/c651094ec026b90c4e9d4ed81bc15eb337d3fc2e/containers/src/Data/IntSet/Internal.hs#L262-L263

If the lowest bit in the Mask is set, that means there's two elements in the full sub-tree sharing all but the last bit. If a higher bit in the Mask is set, the lower ones are ignored so maybe one could have

isFullTree = mask .&. 1 == 1
fullTree prefix mask = Bin prefix (mask .|. 1) Tip Tip

This saves a constructor entry at the cost of 2 Tip in the full sub-tree case (which is still better than an actually full tree) but might require more bit fiddling to account for the possibly-set lower bit in the existing operations.

998 would affect the bit fiddling costs of this I guess.

meooow25 commented 6 months ago

Ah sure, it could be done by assigning the case to some currently invalid configuration. But what is the benefit of keeping the number of constructors at 3?

We can even reduce it to 2 today if we want, by replacing Nil with, say, Tip 0 0. Again, I don't know if or why that would be good.

treeowl commented 6 months ago

Replacing Nil with Tip 0 0 is an interesting idea. I wonder how that would perform.

alexfmpe commented 6 months ago

But what is the benefit of keeping the number of constructors at 3?

My thinking was it might decrease the amount of branch prediction needed in some cases, (assuming the bit fiddling doesn't eclipse benefits), but it also depends on what constructor matching gets compiled to and I don't know how ghc behaves there.

meooow25 commented 4 months ago

I found an implementation of this: https://hackage.haskell.org/package/intset

Haven't looked into how it compares performance-wise yet.