haskell / containers

Assorted concrete container types
https://hackage.haskell.org/package/containers
316 stars 178 forks source link

Lower Seq by a level #406

Open Zemyla opened 7 years ago

Zemyla commented 7 years ago

It has occurred to me that a Node (Elem a) is wasting quite a bit of space, since it stores an Int as well as the three pointers to newtypes. The newtype noise is often distracting as well. It seems to me it would be simpler to have Elem be like a first-level Node:

data Elem a
  = Elem2 a a
  | Elem3 a a a

instance Sized (Elem a) where
  size (Elem2 {}) = 2
  size (Elem3 {}) = 3

Have a similar definition for top-level Digits:

data DigitS a
  = OneS a
  | TwoS a a
  | ThreeS a a a
  | FourS a a a a

instance Sized (DigitS a) where
  size (OneS {}) = 1
  size (TwoS {}) = 2
  size (ThreeS {}) = 3
  size (FourS {}) = 4

And then, at the top level, Seq is no longer a newtype:

data Seq a
  = EmptyS
  | SingleS a
  | DeepS {-# UNPACK #-} !Int !(DigitS a) !(FingerTree (Elem a)) !(DigitS a)

You can probably use regular Digits instead of DigitSs if you remember to use a specific top-level digitSize function instead of just regular size.

This means that getting the sizes of Nodes no longer involves adding a bunch of 1s together; it means that functions operating on Seqs (for instance, (|>)) can be inlined; you don't need to worry about coerce not working on older versions of GHC; and several other benefits.

The question, then, is if it's worth the increase in code size and complexity.

treeowl commented 7 years ago

Once upon a time, I'd have said it's not worth it. However, I've ended up adding a level anyway in a lot of places for performance, so yeah, we should consider it. Want to take a stab?

On Feb 18, 2017 6:47 PM, "Zemyla" notifications@github.com wrote:

It has occurred to me that a Node (Elem a) is wasting quite a bit of space, since it stores an Int as well as the three pointers to newtypes. The newtype noise is often distracting as well. It seems to me it would be simpler to have Elem be like a first-level Node:

data Elem a = Elem2 a a | Elem3 a a a instance Sized (Elem a) where size (Elem2 {}) = 2 size (Elem3 {}) = 3

Have a similar definition for top-level Digits:

data DigitS a = OneS a | TwoS a a | ThreeS a a a | FourS a a a a instance Sized (DigitS a) where size (OneS {}) = 1 size (TwoS {}) = 2 size (ThreeS {}) = 3 size (FourS {}) = 4

And then, at the top level, Seq is no longer a newtype:

data Seq a = EmptyS | SingleS a | DeepS {-# UNPACK #-} !Int !(DigitS a) !(FingerTree (Elem a)) !(DigitS a)

You can probably use regular Digits instead of DigitSs if you remember to use a specific top-level digitSize function instead of just regular size.

This means that getting the sizes of Nodes no longer involves adding a bunch of 1s together; it means that functions operating on Seqs (for instance, (|>)) can be inlined; you don't need to worry about coerce not working on older versions of GHC; and several other benefits.

The question, then, is if it's worth the increase in code size and complexity.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/406, or mute the thread https://github.com/notifications/unsubscribe-auth/ABzi_fcBo6YFoESYweDXy3-iSA0ch-tWks5rd4MWgaJpZM4MFTKm .

treeowl commented 5 years ago

@Zemyla, I believe we also have the option of using Elem in place of DigitS. As I recall, the Hinze-Paterson paper indicates that the extra flexibility isn't necessary at the top level (for the operations they describe). I'm not sure whether we rely on it for any other things we do (we'd have to check inits and tails in particular). Efficiency will be improved in some cases but worsened in others.

treeowl commented 5 years ago

My last comment was confused. I think it may be possible to use only 1 and 2 at the top (inefficiently), or 1, 2, and 3, but definitely not just 2 and 3. Whoops. I've made a go at trying out your idea, and it's ... rather annoying. There's an awful lot of code duplication, and I don't know how to fix that. Some things (like traverse) are fine. Others, like <|, are a pain.

treeowl commented 5 years ago

Side note: if we used GADTs (which containers has never done), we could unify digits with nodes.

data NP = Dig | Nod
data Node (np :: NP) a where
  Node1 :: !Int -> a -> Node 'Dig a
  Node2 :: !Int -> a -> a -> Node np a
  Node3 :: !Int -> a -> a -> a -> Node np a
  Node4 :: !Int -> a -> a -> a -> a -> Node 'Dig a

The same would work to unify your Elem with DigitS.

I imagine this would improve performance a bit. In particular, when removing the first/last element would leave an empty digit, we could replace it with the Elem from below, avoiding a bit of allocation.

Zemyla commented 5 years ago

I had two thoughts. First, if we do this, then we should rename the top-level patterns for Seq to Empty and Single, and export them, because allowing users to create or pattern-match on them doesn't break any invariants or expose any hidden details.

Secondly, we could use SmallArrays wrapped in newtypes (or Arrays for versions of GHC which don't have small ones) for Elems and Digits. This would reduce the amount of boilerplate that goes into matching on and folding, mapping, or traversing over them. For instance, if we had (using the syntax from primitive):

newtype Elem a = Elem { unElem :: SmallArray a }
  deriving (Functor, Foldable, Traversable)

newtype Digit a = Digit { unDigit :: SmallArray a }
  deriving (Functor, Foldable, Traversable)

data Seq a
  = Empty
  | Single a
  | Deep {-# UNPACK #-} !Int {-# UNPACK #-} !(Digit a) (FingerTree (Elem a)) {-# UNPACK #-} !(Digit a)
  deriving (Functor, Foldable, Traversable)

then the cons operation looks something like this:

(<|) :: a -> Seq a -> Seq a
a <| Empty = Single a
a <| Single b = runST $ do
  mpr <- newSmallArray 1 a
  msf <- newSmallArray 1 b
  apr <- unsafeFreezeSmallArray mpr
  asf <- unsafeFreezeSmallArray msf
  pure $ Seq 2 (Digit apr) EmptyT (Digit asf)
a <| Deep n (Digit apr) m sf
  | psz >= 4 = seq m $ runST $ do
      mpr' <- newSmallArray (psz - 2) a
      copySmallArray mpr' 1 apr 0 (psz - 3)
      apr' <- unsafeFreezeSmallArray mpr'
      mn <- newSmallArray 3 a
      copySmallArray mn 0 apr (psz - 3) 3
      an <- unsafeFreezeSmallArray mn
      pure $ Deep (n + 1) (Digit apr') (consF (Elem an) m) sf
  | otherwise = runST $ do
      mpr' <- newSmallArray (psz + 1) a
      copySmallArray mpr' 1 apr 0 psz
      apr' <- unsafeFreezeSmallArray mpr'
      pure $ Deep (n + 1) (Digit apr') m sf
  where psz = sizeofSmallArray apr

As you can see, there's only one branch involved in the Deep case, rather than 3 as wih the old-style Digits. Depending on how fast newSmallArray and copySmallArray are for small arguments, this could result in a net speed gain.

Taking things a bit farther, you could even have the second level digits be ArrayArrays which hold 1-4 SmallArrays, and the second level nodes would look like

data NodeSecond a
  = NodeS2 {-# UNPACK #-} !Int {-# UNPACK #-} !(Elem a) {-# UNPACK #-} !(Elem a)
  | NodeS3 {-# UNPACK #-} !Int {-# UNPACK #-} !(Elem a) {-# UNPACK #-} !(Elem a) {-# UNPACK #-} !(Elem a)
  deriving (Functor, Foldable, Traversable)