haskell / containers

Assorted concrete container types
https://hackage.haskell.org/package/containers
315 stars 178 forks source link

Improve Ord instances for IntSet and IntMap #470

Open treeowl opened 6 years ago

treeowl commented 6 years ago

The current Ord instances for IntSet and IntMap don't take advantage of the way these types are structured at all. We may therefore be leaving some performance on the table. In particular:

  1. I don't think we should need to convert the maps to lists for comparison.
  2. We can recognize certain special arrangements of keys, prefixes, and perhaps masks to avoid following a bunch of pointers for nothing. For example, if one map has a negative prefix and the other has a (strictly) positive prefix, we are done.
Zemyla commented 5 years ago

If nothing else, we should have use lists with unpacked Ints.

data KeyList
  = LEnd
  | LCons {-# UNPACK #-} !Key KeyList
  deriving (Eq, Ord)

data KeyPairList a
  = PEnd
  | PCons {-# UNPACK #-} !Key a (KeyPairList a)
  deriving (Eq, Ord)

And then you have pre-inlined functions to turn an IntSet into a KeyList and an IntMap into a KeyPairList.

Though what would probably be even better, at least for the KeyList case, would be:

data KeyList
  = LEnd
  | LCons {-# UNPACK #-} !Prefix {-# UNPACK #-} !BitMap KeyList
  deriving (Eq)

instance Ord KeyList where
  compare LEnd LEnd = EQ
  compare LEnd _ = LT
  compare (LCons prea bita la) (LCons preb bitb lb)
    | prea < preb = LT
    | prea > preb = GT
    | otherwise = case compare (revNat bita) (revNat bitb) of
        EQ -> compare la lb
        c  -> c

This way, there's only one entry per Tip node in each IntSet. This sort of thing would probably also speed up Show IntSet and Show IntMap as well.

EDIT: However, for those, the best bet may be just rewriting it so that the fold over the container constructs the desired string directly, such as:

instance Show IntSet where
  showsPrec p is = showParen (p > 10) $ \str -> showString "fromList [" $ either id id $ foldr showOne (Left (']':str)) is where
    showOne !i r = Right $ shows i $ either id ((:) ',') r

I'm pretty sure the definition for showList @Int isn't going to change from the default any time in the near future, is it? Similarly, because the liftShowWith2 instance for (,) is the default as well, we can inline the Show instance for IntMap the same way.

instance Show a => Show (IntMap a) where
  showsPrec p im = showParen (p > 10) $ \s -> showString "fromList [" $ either id id $ foldrWithKey showOne (Left (']':s)) im where
    showOne !i a r = Right $ showChar '(' $ shows i $ showChar ',' $ shows a $ showChar ')' $ either id ((:) ',') r

This means that we don't have to worry about creating and then destroying intermediate lists, boxed Ints, and tuples.

treeowl commented 5 years ago

I'm not so sure about revNat being the right approach (it's kind of heavy). I think something like this might work:

instance Ord KeyList where
  compare LEnd LEnd = EQ
  compare LEnd _ = LT
  compare (LCons prea bita la) (LCons preb bitb lb)
    | prea < preb = LT
    | prea > preb = GT
    | lbm <- lowestBitMask (bita `xor` bitb)
    = compare (bita && lbm) (bitb && lbm) <> compare la lb

Still, it would be nice to use the IntMap/IntSet structure a bit...

jwaldmann commented 5 years ago

Benchmarks? Ideally, derived from actual use cases.

Where do we need Ord on Sets? When we take a Set (or Map) of sets. As in the power-set construction, for making deterministic automata?

I have something at https://gitlab.imn.htwk-leipzig.de/autotool/all0/blob/master/fa/src/Autolib/NFA/Det.hs but it needs to be extracted before being useful.

jwaldmann commented 5 years ago

stand-alone NFA determinisation in this branch: https://github.com/jwaldmann/containers/commits/instance-Ord-IntSet

Example profile - see below. I think it shows that indeed, Data.IntSet.Internal.compare is exercised here.

        Fri Jul 19 00:41 2019 Time and Allocation Profiling Report  (Final)

           ord-intset-benchmarks +RTS -P -RTS -q det/hard/n=16

        total time  =        7.55 secs   (7550 ticks @ 1000 us, 1 processor)
        total alloc = 13,662,565,528 bytes  (excludes profiling overheads)

COST CENTRE        MODULE                            SRC                                                %time %alloc  ticks     bytes

shiftRL            Utils.Containers.Internal.BitUtil src/Utils/Containers/Internal/BitUtil.hs:100:1-22   27.2   18.1   2055 2470760640
revNat             Data.IntSet.Internal              src/Data/IntSet/Internal.hs:(1464,1)-(1469,98)      26.1   33.4   1971 4568460432
toAscList          Data.IntSet.Internal              src/Data/IntSet/Internal.hs:1016:1-24               18.1   37.1   1365 5068974496
shiftLL            Utils.Containers.Internal.BitUtil src/Utils/Containers/Internal/BitUtil.hs:101:1-22   15.4    7.7   1162 1055309648
compare            Data.IntSet.Internal              src/Data/IntSet/Internal.hs:1160:5-57                6.4    0.0    483        16
findWithDefault.go Data.IntMap.Internal              src/Data/IntMap/Internal.hs:(625,5)-(630,16)         1.4    0.0    107         0
det.go.ts.next     Main                              benchmarks/OrdIntSet.hs:(64,26)-(66,72)              0.5    1.0     34 138936336

But - it really is never a good idea to use Set IntSet (or similar). It's at least as expensive as Set [Int].

NB: I wonder how poeople do implement the powerset construction (from NFA to DFA). Perhaps the answer is "they don't" (and go from regular expression to DFA directly).

So, are there other applications where one needs an (optimized) instance Ord IntSet?

jwaldmann commented 5 years ago

I am trying this https://github.com/jwaldmann/containers/blob/instance-Ord-IntSet/containers-tests/benchmarks/OrdIntSet.hs#L76

It seems to be working for natural numbers, not for negative keys, since they break the assumption that in Bin p m l r, all the keys in l are smaller than all in r.

*Main> Bin p m l r = fromList [-1,0]
*Main> (l,r)
(fromList [0],fromList [-1])

But these cases should be easy to detect? I could use some ideas here (also for relateBM) @treeowl @int-e

jwaldmann commented 5 years ago

Yes! I get a nice speed-up (around 3), and much reduced allocation.

treeowl commented 5 years ago

Lovely! In principle, the negative weirdness should be limited to the roots. If you're having trouble, I can try to help, but you likely know at least as much about it as I do!

On Sun, Jul 21, 2019, 11:34 AM jwaldmann notifications@github.com wrote:

Yes! I get a nice speed-up (around 3), and much reduced allocation.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/470?email_source=notifications&email_token=AAOOF7IIM63Y64366FEGRG3QAR6XVA5CNFSM4EJ75JBKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2OF5YQ#issuecomment-513564386, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOOF7KDIPW5O4LYDQ3ZPCDQAR6XVANCNFSM4EJ75JBA .

jwaldmann commented 5 years ago

In my benchmarks (NFA -> DFA) I think the IntSet always fits into one Tip. I will spread out the numbers and see what happens there.

jwaldmann commented 5 years ago

I now have a version that I think is correct (by enumerative testing, it agrees with the original version, and for the NFA to DFA conversion, automata have expected size). Performance is roughly

I am sure the code can be golfed and micro-optimised (using equivalences of bitwise operations). I don't have much experience with that.

treeowl commented 5 years ago

Can you put together a pull request? Or better, two: make the first one add your benchmark.

On Mon, Jul 22, 2019, 9:56 AM jwaldmann notifications@github.com wrote:

I now have a version that I think is correct (by enumerative testing, it agrees with the original version, and for the NFA to DFA conversion, automata have expected size). Performance is roughly

  • for small trees (Tip only): 30 percent runtime
  • for large trees (each Tip is singleton): 70 percent runtime (both versions walk the tree in the same way but I don't do any allocation)

I am sure the code can be golfed and micro-optimised (using equivalences of bitwise operations). I don't have much experience with that.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/470?email_source=notifications&email_token=AAOOF7MKRFNKHYIVC35C7CDQAW4A5A5CNFSM4EJ75JBKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2P75TI#issuecomment-513801933, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOOF7NWOSLIKZ5X5FXZ5XLQAW4A5ANCNFSM4EJ75JBA .

jwaldmann commented 5 years ago

actually it's three parts

treeowl commented 5 years ago

I didn't realize there wasn't a correctness test yet! Ouch!

On Mon, Jul 22, 2019, 10:10 AM jwaldmann notifications@github.com wrote:

actually it's three parts

  • implementation
  • test for correctness (compare s t == compare (toAscList s) (toAscList t))
  • benchmark

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/470?email_source=notifications&email_token=AAOOF7PSSISZCNJSVSDBSMTQAW5T3A5CNFSM4EJ75JBKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2QBJ7Y#issuecomment-513807615, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOOF7MUSK2JERMIJEZ2SN3QAW5T3ANCNFSM4EJ75JBA .

jwaldmann commented 5 years ago

Are these properties (that Eq and Ord of IntSet go via toAscList) announced anywhere? If not, that deserves a separate discussion?

jwaldmann commented 5 years ago

Oh I see now that intset-properties.hs already contains

prop_ord :: IntSet -> IntSet -> Bool
prop_ord s1 s2 = s1 `compare` s2 == toList s1 `compare` toList s2

but it's hidden deep down the file. I'd expected it to go at the top (following the order of the declarations: data type, instances, operations).

And it really should be toAscList.