haskell / vector

An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop optimisation framework .
Other
366 stars 139 forks source link

Possible Changes to Bundle.Chunk? #91

Open Zemyla opened 9 years ago

Zemyla commented 9 years ago

As it is, the Chunk type in Data.Vector.Fusion.Bundle.Monadic is much less useful than it could be.Here are some ways, ranked by desirability and compatibility-breaking, on how it could be better.

1) Add a Monoid instance and utility functions for Chunk. This is the simplest change, and one that breaks the fewest programs. The most general utility functions would probably be toChunk :: (Vector v a) => v a -> Chunk v a and fromChunk :: (Vector v a) => Chunk v a -> v a, and possibly singleton :: (Vector v a) => a -> Chunk v a. This would let Chunks be used like ByteString Builders, as a low-overhead, O(1) concatenation option.

The definition for Monoid would be:

import qualified Data.Vector.Generic.Mutable as M
instance Monoid (Chunk v a) where
    mempty = Chunk 0 (const $ return ())
    mappend (Chunk na wa) (Chunk nb wb) = Chunk (na + nb) (\v -> wa (M.take na v) >> wb (M.drop na v))

2) Change the type of Chunk. This would involve a Yoneda transform on Chunk's type:

data Chunk v a = Chunk Int (forall m r. (PrimMonad m, Vector v r) => Mutable v (PrimState m) r -> (a -> r) -> m ())

This would turn Chunk into a Functor, meaning the Functor instance for Bundle could be a lot simpler and involve a lot less copying. Also, this prevents badly-behaved Chunks from reading and mutating the MVector they are given.

3) Change the kind of Chunk. If you generalize, then you can make Chunks that can construct any type of Vector:

data Chunk a = Chunk Int (forall m v r. (PrimMonad m, Vector v r) => Mutable v (PrimState m) r -> (a -> r) -> m ())

In addition to making Chunks able to produce any kind of Vector, this also admits an Applicative instance for Chunk!

import qualified Data.Vector.Generic.Mutable as M
import qualified Data.Vector.Mutable as MV
instance Applicative Chunk where
    pure a = Chunk 1 (\v c -> M.write v 0 (c a))
    (Chunk nf wf) <*> (Chunk na wa) = Chunk (nf * na) $ \v c -> do
        vf <- MV.new nf
        wf vf id
        let loop n = if n >= nf then return () else do
            f <- MV.read vf n
            wa (M.slice (n * na) na v) (c . f)
            loop (n + 1)
        loop 0
    (Chunk na _) *> (Chunk nb wb) = Chunk (na * nb) $ \v c -> do
        let loop n = if n >= na then return () else wb (M.slice (n * nb) nb) c >> loop (n + 1)
        loop 0

This also makes it an Alternative, with empty and <|> defined as mempty and mappend, respectively.

If these changes are made, then it would probably be a good idea to move Chunk from its current location and give it a dedicated import, like Data.Vector.Chunk, for use in building Vectors cheaply.

So is it possible to make these changes and make Chunk more than just a utility type for Bundle?

cartazio commented 9 years ago

These are swell ideas! But the first step is to prototype them and benchmark how they respectively perform, right? At some level, all the internal data types in vector are in the service of easier performance, so such changes should also be validated with respect to performance implications right?

On Thursday, August 6, 2015, Zemyla notifications@github.com wrote:

As it is, the Chunk type in Data.Vector.Fusion.Bundle.Monadic is much less useful than it could be.Here are some ways, ranked by desirability and compatibility-breaking, on how it could be better.

1) Add a Monoid instance and utility functions for Chunk. This is the simplest change, and one that breaks the fewest programs. The most general utility functions would probably be toChunk :: (Vector v a) => v a -> Chunk v a and fromChunk :: (Vector v a) => Chunk v a -> v a, and possibly singleton :: (Vector v a) => a -> Chunk v a. This would let Chunks be used like ByteString Builders, as a low-overhead, O(1) concatenation option.

The definition for Monoid would be: import qualified Data.Vector.Generic.Mutable as M instance Monoid (Chunk v a) where mempty = Chunk 0 (const $ return ()) mappend (Chunk na wa) (Chunk nb wb) = Chunk (na + nb) (\v -> wa (M.take na v) >> wb (M.drop na v))

2) Change the type of Chunk. This would involve a Yoneda transform on Chunk's type: data Chunk v a = Chunk Int (forall m r. (PrimMonad m, Vector v r) => Mutable v (PrimState m) r -> (a -> r) -> m ()) This would turn Chunk into a Functor, meaning the Functor instance for Bundle could be a lot simpler and involve a lot less copying. Also, this prevents badly-behaved Chunks from reading and mutating the MVector they are given.

3) Change the kind of Chunk. If you generalize, then you can make Chunks that can construct any type of Vector: data Chunk a = Chunk Int (forall m v r. (PrimMonad m, Vector v r) => Mutable v (PrimState m) r -> (a -> r) -> m ()) In addition to making Chunks able to produce any kind of Vector, this also admits an Applicative instance for Chunk!

import qualified Data.Vector.Generic.Mutable as M import qualified Data.Vector.Mutable as MV

instance Applicative Chunk where pure a = Chunk 1 (\v c -> M.write v 0 (c a)) (Chunk nf wf) <> (Chunk na wa) = Chunk (nf * na) $ \v c -> do vf <- MV.new nf wf vf id let loop n = if n >= nf then return () else do f <- MV.read vf n wa (M.slice (n * na) na v) (c . f) loop (n + 1) loop 0 (Chunk na _) > (Chunk nb wb) = Chunk (na * nb) $ \v c -> do let loop n = if n >= na then return () else wb (M.slice (n * nb) nb) c >> loop (n + 1) loop 0

This also makes it an Alternative, with empty and <|> defined as mempty and mappend, respectively.

If these changes are made, then it would probably be a good idea to move Chunk from its current location and give it a dedicated import, like Data.Vector.Chunk, for use in building Vectors cheaply.

So is it possible to make these changes and make Chunk more than just a utility type for Bundle?

— Reply to this email directly or view it on GitHub https://github.com/haskell/vector/issues/91.

cartazio commented 8 years ago

have you done any experiments for this?

Zemyla commented 8 years ago

Unfortunately, I was working on it, but then my laptop died. I'm sending this by phone.

cartazio commented 8 years ago

@Zemyla well, if you can resketch this out when you have the time, please share your notes or some resurrected ideas/code, that would be appreciated :)