haskell / vector

An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop optimisation framework .
Other
366 stars 139 forks source link

Enrich mutable API for vectors #334

Open Shimuuar opened 4 years ago

Shimuuar commented 4 years ago

There were quite a few proposal to enrich API of mutable vectors: #326, #203, #64, #175. Tris issue tries to collect all proposals from them in one nice list (and adds few on top of them)

Folds & lookup

I'm not sure that foldlM_ & friends are worth adding. It's easy to ignore return result of fold anyway: void, _ <- .... It however could be useful to add pure variants:

They come with performance bonus. We can perform entire fold in the ST monad and only lift to m at last moment. This way GHC only have to use bind from ST monad and it's good at optimizing it

In place mutation

Naming could be slightly confusing here. Should in place map be called mapM or mapM_. Underscore usually means throwing away results of function and mapM builds new structure. I'm slightly lean towards mapM since it's in-place map

Creation

Comments? Objections? Proposals?

Bodigrim commented 4 years ago

Looks reasonable, but I would strongly prefer names which at least do not clash with immutable vector functions. Also it is quite unexpected to encounter Foo.map with a signature (a -> a) -> foo -> m ().

In bitvec I use names mapInPlace, zipInPlace, invertInPlace, reverseInPlace, etc.

CC @haskell-core (not sure if I can ping an organization)

Shimuuar commented 4 years ago

RE: name clashes. Only way to avoid them is to add prefixes for all mutable versions. Also there's precedent for name clashes: init, tail, drop, replicate etc clash already.

RE: InPlace I like it. It makes clear that vector is modified in place. It also allows to have functions like mapM :: (a -> m b) -> v s a -> m (v s b) which create new vector without name clashes. On other hand it's a bit too long — 7 letters.

lehins commented 4 years ago

I am :100: in favor of everything in this issue. The only thing I guess we need to reach consensus on is the naming.

I agree InPlace suffix portrays the intention well, and I also agree that it is a bit long. Of course it's ok if we can't come up with something better, but we can try. This in place mutation concept will be shared by many function so maybe it will be enough to add an acronym or a shortened suffix to such functions (a la. M for monadic) and it will be clear enough what it means. For example Mut for mutated? mapMutM, foldlMutM. Or Mod for modified mapModM ...

Shimuuar commented 4 years ago

I'd like to propose following naming conventions.

  1. Functions that don't change vector's elements (like folds) or functions that create new vector (replicate/generate) get same name as pure counterpart.

  2. Functions that modify vector in place get some uniform suffix/prefix/infix. Proposals so far"

    • InPlace suffix: mapInPlace, mapMInPlace
    • Mut mixifx: mapMut, mapMutM
    • Same but Mod

I think that Mut is clearly better that Mod since library already uses mutable in API: type family Mutable etc. I prefer Mut over InPlace as well since it's shorter.

Bodigrim commented 4 years ago

I do not see much benefit in brevity, certainly not enough to sacrifice clarity, to be honest. Mut (even if not mistaken for "mute" or "mutual") does not actually say much about semantics, besides that it somehow deals with mutable vectors or, possibly, mutates or changes something. For instance, nothing in the name mapMut hints at an in-place mutation.

lehins commented 4 years ago

I'd agree with you if InPlace suffix would be used in one or two functions, but we are talking about adding quite a few in five different modules, which means a even single letter would suffice to distinguish the concept. For example you know in that M in mapM stands for Monad, we don't name mapMonadic, right?

That being said, this is not something I care too deeply about, so whatever you guys decide on I'll be fine with. I just prefer shorter names as well thus wanted expressed my opinion on this subject. Most importantly we all agree on the concepts: folds / in place / creation :+1:

Bodigrim commented 4 years ago

I think we can absolutely go ahead with generate / generateM. They are simple, extremely useful and naming is similar to existing replicate / replicateM.

cartazio commented 4 years ago

Inplace Suffix is probably the least confusing

leftaroundabout commented 4 years ago

I feel strongly that something called map or mapM should not perform any in-place updates. As I commented in #326, I would suggest

modifyM :: MVector (PrimState m) a -> (a -> m a) -> Int -> m ()
modifyAll :: MVector (PrimState m) a -> (a -> a) -> m ()
modifyAllM :: MVector (PrimState m) a -> (a -> m a) -> m ()

but an InPlace suffix also seems reasonable.

For the suggested functions that don't perform mutation – well, I'm not completely convinced these are necessary at all if they always just boil to some pure operation over a freezed MVector. Is there a performance reason we'd want these directly for mutable vectors?

Name clashes I personally don't really mind, since I always import the immutable and mutable modules with different qualifiers, but how universal is this style? I could imagine quite a few packages would break if V.mapM becomes ambiguous.

If we're going to have clashes, then once again: please no different semantics for same-name functions!

cartazio commented 4 years ago

Good points. Or ModifyWith? It’s a tricky bike shed.

On Sun, Oct 25, 2020 at 3:41 PM Justus Sagemüller notifications@github.com wrote:

I feel strongly that something called map or mapM should not perform any in-place updates. As I commented in #326 https://github.com/haskell/vector/issues/326, I would suggest

modifyM :: MVector (PrimState m) a -> (a -> m a) -> Int -> m ()

modifyAll :: MVector (PrimState m) a -> (a -> a) -> m ()

modifyAllM :: MVector (PrimState m) a -> (a -> m a) -> m ()

but an InPlace suffix also seems reasonable.

For the suggested functions that don't perform mutation – well, I'm not completely convinced these are necessary at all if they always just boil to some pure operation over a freezed MVector. Is there a performance reason we'd want these directly for mutable vectors?

Name clashes I personally don't really mind, since I always import the immutable and mutable modules with different qualifiers, but how universal is this style? I could imagine quite a few packages would break if V.mapM becomes ambiguous.

If we're going to have clashes, then once again: please no different semantics for same-name functions!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/vector/issues/334#issuecomment-716200524, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABBQTH3AXBHH6ZSMULTFTSMR5O3ANCNFSM4SVEIFZA .

lehins commented 4 years ago

I thought about this one as well:

I feel strongly that something called map or mapM should not perform any in-place updates.

... just looked at the linked ticket and that was exactly where I thought about it :smile:

Not sure modify by itself is the right name though, but modifyAll and modifyAllM sound reasonable and correlates nicely with modify function that deals with a single element..

For the suggested functions that don't perform mutation – well, I'm not completely convinced these are necessary at all ...

If you use unsafeFreeze, then they are indeed not necessary, but the problem is we do not want users to rely on unsafe functions. Let me outline the actual problems with relying on freezing into pure vectors with examples.

This is what we want:

mutableVectorOps mvec = do
  MV.write mvec 0 "foo"
  MV.mapM_ putStrLn mvec
  MV.write mvec 1 "bar"
  MV.mapM_ putStrLn mvec

Above function is something we would like the user be able to write and not worry about issues that come along with conversion to immutable vectors:

mutableVectorOps mvec = do
  MV.write mvec 0 "foo"
  vec1 <- V.freeze mvec
  V.mapM_ putStrLn vec1
  write mvec 1 "bar"
  vec2 <- V.freeze mvec -- second copy of the same vector, ouch
  V.mapM_ putStrLn vec2
mutableVectorOps mvec = do
  MV.write mvec 0 "foo"
  vec <- V.unsafeFreeze mvec
  V.mapM_ putStrLn vec
  write mvec 1 "bar"
  V.mapM_ putStrLn vec -- immutable vector was mutated, ouch
cartazio commented 4 years ago

Yeah, hand writing the unsafe freeze or thaw operations without a lotta care results in terrible bugs in the presence of optimization. So having fp style analogous operations provided for users makes sense

On Sun, Oct 25, 2020 at 4:08 PM Alexey Kuleshevich notifications@github.com wrote:

I thought about this one as well:

I feel strongly that something called map or mapM should not perform any in-place updates.

... just looked at the linked ticket and that was exactly where I thought about it 😄

Not sure modify by itself is the right name though, but modifyAll and modifyAllM sound reasonable and correlates nicely with modify https://hackage.haskell.org/package/vector-0.12.1.2/docs/Data-Vector-Mutable.html#v:modify function that deals with a single element..

For the suggested functions that don't perform mutation – well, I'm not completely convinced these are necessary at all ...

If you use unsafeFreeze, then they are indeed not necessary, but the problem is we do not want users to rely on unsafe functions. Let me outline the actual problems with relying on freezing into pure vectors with examples.

This is what we want:

mutableVectorOps mvec = do

MV.write mvec 0 "foo"

MV.mapM_ putStrLn mvec

MV.write mvec 1 "bar"

MV.mapM_ putStrLn mvec

Above function is something we would like the user be able to write and not worry about issues that come along with conversion to immutable vectors:

  • redundant copies:

mutableVectorOps mvec = do

MV.write mvec 0 "foo"

vec1 <- V.freeze mvec

V.mapM_ putStrLn vec1

write mvec 1 "bar"

vec2 <- V.freeze mvec -- second copy of the same vector, ouch

V.mapM_ putStrLn vec2

  • unsafety of operations when no-copy approach is used

mutableVectorOps mvec = do

MV.write mvec 0 "foo"

vec <- V.unafeFreeze mvec

V.mapM_ putStrLn vec

write mvec 1 "bar"

V.mapM_ putStrLn vec -- immutable vector was mutated, ouch

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/vector/issues/334#issuecomment-716204334, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABBQUXF7X2XOLPWPLBA7LSMSAUPANCNFSM4SVEIFZA .

Shimuuar commented 4 years ago

I've added PR with what I think uncontroversial additions. Now there's question whether right folds should be included. We can't use laziness to abort computation early and have to read all elements anyway

konsumlamm commented 3 years ago

I suggest you pin this issue instead of #203, which was closed in favor of this.

lehins commented 3 years ago

@konsumlamm Thanks, done.

konsumlamm commented 3 years ago

I just had a look at the different ways to construct mutable and immutable vectors and i noticed that mutable vectors have very few. It is proposed to add generate/generateM (which i have no problem with), but there are still a lot of other construction functions one could add (iterateN/iterateNM, unfoldr/unfoldrM, unfoldrN/unfoldrNM, enumFromN, enumFromStepN, ...).

We could just port these as well, though that would mean writing a lot of (possibly unnecessary) code. It should be possible to implement all the mutable variants with unsafeThaw (I'm not sure if that would be less efficient than implementing them manually, since the immutable versions all do streaming). However that probably shouldn't be encouraged, since it is (as the name says) unsafe, if used wrongly.

Another thought I had was implementing a safe O(1) thaw that uses linear types to prevent the user from using the immutable vector afterwards. Although that may be undesirable, since it requires a new language extension.

Bodigrim commented 3 years ago

Not only it requires a new language extension, but it also makes this API available to GHC 9.0+ users only. I think it is too early to adopt linear types in vector.

konsumlamm commented 3 years ago

I noticed that there already is an in-place reverse in Data.Vector.Generic.Mutable, but it is marked as an internal operation. I don't see a reason why this shouldn't be a normal operation and be reexported in the specialized modules.

Boarders commented 3 years ago

What is the status of this work? Is there anything I can do to help it along? I am constantly frustrated when working with mutable vectors that I don't have map or traverse or similar functions and need to write basic recursions by hand. Would be great to see some movement here.

lehins commented 3 years ago

@Boarders, some of the functions have already been implemented in #338 , so there is no need to resort to manual loops. For example, generateM is already available, thus traverse f mv = generateM (length mv) (unsafeRead mv >=> f)

That being said, PRs are welcome :wink: