Open treeowl opened 6 years ago
If nothing else, we should have use lists with unpacked Int
s.
data KeyList
= LEnd
| LCons {-# UNPACK #-} !Key KeyList
deriving (Eq, Ord)
data KeyPairList a
= PEnd
| PCons {-# UNPACK #-} !Key a (KeyPairList a)
deriving (Eq, Ord)
And then you have pre-inlined functions to turn an IntSet
into a KeyList
and an IntMap
into a KeyPairList
.
Though what would probably be even better, at least for the KeyList
case, would be:
data KeyList
= LEnd
| LCons {-# UNPACK #-} !Prefix {-# UNPACK #-} !BitMap KeyList
deriving (Eq)
instance Ord KeyList where
compare LEnd LEnd = EQ
compare LEnd _ = LT
compare (LCons prea bita la) (LCons preb bitb lb)
| prea < preb = LT
| prea > preb = GT
| otherwise = case compare (revNat bita) (revNat bitb) of
EQ -> compare la lb
c -> c
This way, there's only one entry per Tip
node in each IntSet
. This sort of thing would probably also speed up Show IntSet
and Show IntMap
as well.
EDIT: However, for those, the best bet may be just rewriting it so that the fold over the container constructs the desired string directly, such as:
instance Show IntSet where
showsPrec p is = showParen (p > 10) $ \str -> showString "fromList [" $ either id id $ foldr showOne (Left (']':str)) is where
showOne !i r = Right $ shows i $ either id ((:) ',') r
I'm pretty sure the definition for showList @Int
isn't going to change from the default any time in the near future, is it? Similarly, because the liftShowWith2
instance for (,)
is the default as well, we can inline the Show
instance for IntMap
the same way.
instance Show a => Show (IntMap a) where
showsPrec p im = showParen (p > 10) $ \s -> showString "fromList [" $ either id id $ foldrWithKey showOne (Left (']':s)) im where
showOne !i a r = Right $ showChar '(' $ shows i $ showChar ',' $ shows a $ showChar ')' $ either id ((:) ',') r
This means that we don't have to worry about creating and then destroying intermediate lists, boxed Int
s, and tuples.
I'm not so sure about revNat
being the right approach (it's kind of heavy). I think something like this might work:
instance Ord KeyList where
compare LEnd LEnd = EQ
compare LEnd _ = LT
compare (LCons prea bita la) (LCons preb bitb lb)
| prea < preb = LT
| prea > preb = GT
| lbm <- lowestBitMask (bita `xor` bitb)
= compare (bita && lbm) (bitb && lbm) <> compare la lb
Still, it would be nice to use the IntMap
/IntSet
structure a bit...
Benchmarks? Ideally, derived from actual use cases.
Where do we need Ord on Sets? When we take a Set (or Map) of sets. As in the power-set construction, for making deterministic automata?
I have something at https://gitlab.imn.htwk-leipzig.de/autotool/all0/blob/master/fa/src/Autolib/NFA/Det.hs but it needs to be extracted before being useful.
stand-alone NFA determinisation in this branch: https://github.com/jwaldmann/containers/commits/instance-Ord-IntSet
Example profile - see below. I think it shows that indeed, Data.IntSet.Internal.compare is exercised here.
Fri Jul 19 00:41 2019 Time and Allocation Profiling Report (Final)
ord-intset-benchmarks +RTS -P -RTS -q det/hard/n=16
total time = 7.55 secs (7550 ticks @ 1000 us, 1 processor)
total alloc = 13,662,565,528 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc ticks bytes
shiftRL Utils.Containers.Internal.BitUtil src/Utils/Containers/Internal/BitUtil.hs:100:1-22 27.2 18.1 2055 2470760640
revNat Data.IntSet.Internal src/Data/IntSet/Internal.hs:(1464,1)-(1469,98) 26.1 33.4 1971 4568460432
toAscList Data.IntSet.Internal src/Data/IntSet/Internal.hs:1016:1-24 18.1 37.1 1365 5068974496
shiftLL Utils.Containers.Internal.BitUtil src/Utils/Containers/Internal/BitUtil.hs:101:1-22 15.4 7.7 1162 1055309648
compare Data.IntSet.Internal src/Data/IntSet/Internal.hs:1160:5-57 6.4 0.0 483 16
findWithDefault.go Data.IntMap.Internal src/Data/IntMap/Internal.hs:(625,5)-(630,16) 1.4 0.0 107 0
det.go.ts.next Main benchmarks/OrdIntSet.hs:(64,26)-(66,72) 0.5 1.0 34 138936336
But - it really is never a good idea to use Set IntSet
(or similar). It's at least as expensive as Set [Int]
.
NB: I wonder how poeople do implement the powerset construction (from NFA to DFA). Perhaps the answer is "they don't" (and go from regular expression to DFA directly).
So, are there other applications where one needs an (optimized) instance Ord IntSet
?
I am trying this https://github.com/jwaldmann/containers/blob/instance-Ord-IntSet/containers-tests/benchmarks/OrdIntSet.hs#L76
It seems to be working for natural numbers, not for negative keys, since they break the assumption that in Bin p m l r
, all the keys in l
are smaller than all in r
.
*Main> Bin p m l r = fromList [-1,0]
*Main> (l,r)
(fromList [0],fromList [-1])
But these cases should be easy to detect? I could use some ideas here (also for relateBM
) @treeowl @int-e
Yes! I get a nice speed-up (around 3), and much reduced allocation.
Lovely! In principle, the negative weirdness should be limited to the roots. If you're having trouble, I can try to help, but you likely know at least as much about it as I do!
On Sun, Jul 21, 2019, 11:34 AM jwaldmann notifications@github.com wrote:
Yes! I get a nice speed-up (around 3), and much reduced allocation.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/470?email_source=notifications&email_token=AAOOF7IIM63Y64366FEGRG3QAR6XVA5CNFSM4EJ75JBKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2OF5YQ#issuecomment-513564386, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOOF7KDIPW5O4LYDQ3ZPCDQAR6XVANCNFSM4EJ75JBA .
In my benchmarks (NFA -> DFA) I think the IntSet always fits into one Tip
. I will spread out the numbers and see what happens there.
I now have a version that I think is correct (by enumerative testing, it agrees with the original version, and for the NFA to DFA conversion, automata have expected size). Performance is roughly
I am sure the code can be golfed and micro-optimised (using equivalences of bitwise operations). I don't have much experience with that.
Can you put together a pull request? Or better, two: make the first one add your benchmark.
On Mon, Jul 22, 2019, 9:56 AM jwaldmann notifications@github.com wrote:
I now have a version that I think is correct (by enumerative testing, it agrees with the original version, and for the NFA to DFA conversion, automata have expected size). Performance is roughly
- for small trees (Tip only): 30 percent runtime
- for large trees (each Tip is singleton): 70 percent runtime (both versions walk the tree in the same way but I don't do any allocation)
I am sure the code can be golfed and micro-optimised (using equivalences of bitwise operations). I don't have much experience with that.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/470?email_source=notifications&email_token=AAOOF7MKRFNKHYIVC35C7CDQAW4A5A5CNFSM4EJ75JBKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2P75TI#issuecomment-513801933, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOOF7NWOSLIKZ5X5FXZ5XLQAW4A5ANCNFSM4EJ75JBA .
actually it's three parts
compare s t == compare (toAscList s) (toAscList t)
)I didn't realize there wasn't a correctness test yet! Ouch!
On Mon, Jul 22, 2019, 10:10 AM jwaldmann notifications@github.com wrote:
actually it's three parts
- implementation
- test for correctness (compare s t == compare (toAscList s) (toAscList t))
- benchmark
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/470?email_source=notifications&email_token=AAOOF7PSSISZCNJSVSDBSMTQAW5T3A5CNFSM4EJ75JBKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2QBJ7Y#issuecomment-513807615, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOOF7MUSK2JERMIJEZ2SN3QAW5T3ANCNFSM4EJ75JBA .
Are these properties (that Eq and Ord of IntSet go via toAscList) announced anywhere? If not, that deserves a separate discussion?
Oh I see now that intset-properties.hs
already contains
prop_ord :: IntSet -> IntSet -> Bool
prop_ord s1 s2 = s1 `compare` s2 == toList s1 `compare` toList s2
but it's hidden deep down the file. I'd expected it to go at the top (following the order of the declarations: data type, instances, operations).
And it really should be toAscList
.
The current
Ord
instances forIntSet
andIntMap
don't take advantage of the way these types are structured at all. We may therefore be leaving some performance on the table. In particular: