Open infinity0 opened 3 years ago
This would fix most of https://github.com/haskell/vector/issues/334. The discussion about naming convention is also relevant, afaik it was the reason these functions weren't added at first.
I'm in favor of InPlace
suffix.
IMO the suffix is unnecessary because the type signature already communicates mutable behaviour - it takes in a (a -> m a)
and only one data strucure and returns a m ()
, it is analogous to the existing Mutable.mapM_
To give a bit more detail: my perspective is shaped by the fact you can convert immutable and mutable operations between each other generically (e.g. using https://hackage.haskell.org/package/mutable-lens), so the immutable and mutable mapM (traverse) operations are the same thing; the fact that one version modifies in-place whereas another version returns a copy, is a property of the data structure, not the operation. Here we are defining the operation on Vector.Generic.Mutable already, there is no need to add InPlace
at the end of everything.
In case one argues "it's confusing when something modifies in-place when one expects it not to"; but it's also confusing when something does not modify when one expects it to, which is why I wrote this PR the first place - I had naively expected mapM_
to modify due to the type signature taking a -> m b
rather than a -> m ()
. It's too late to change that now, but IMO documenting it would also have been sufficient (at least I personally do read all such docs when in doubt).
Therefore IMO it is sufficient to document that Mutable.mapM
modifies, instead of polluting the function name. It is really the only possible thing that can naturally happen on a mutable data structure for an operation of that type, and the mapM
vs mapM_
difference is the same for both the immutable and mutable versions of it.
I had naively expected
mapM_
to modify due to the type signature takinga -> m b
rather thana -> m ()
. It's too late to change that now, but IMO documenting it would also have been sufficient (at least I personally do read all such docs when in doubt).
Fwiw, the reason for that is so that people don't necessarily have to add void $
for discarding the result. This is also what the standard mapM_
(from base
) does. The documentation even says "Apply the monadic action to every element of the vector, discarding the results.", which is pretty clear imo (but feel free to improve the documentation).
Therefore IMO it is sufficient to document that
Mutable.mapM
modifies, instead of polluting the function name. It is really the only possible thing that can naturally happen on a mutable data structure for an operation of that type, and themapM
vsmapM_
difference is the same for both the immutable and mutable versions of it.
Well, it could also return a new vector (which might even be preferrable sometimes, since it can have a different element type, like for the standard mapM
), but I see your point. I personally would be fine with mapM
, but also with mapInPlaceM
(or whatever the exact name is supposed to be).
There are 3 such functions we can have for mutable vectors:
Iterate over the vector, apply monadic action to each element, while keeping the original mutable vector intact, unless the action itself modifies it. (already exists and matches common expectations of mapM_
)
mapM_ :: PrimMonad m => (a -> m b) -> MVector (PrimState m) a -> m ()
Iterate over the vector, apply monadic action to each element, result is loaded into a new mutable vector (this is what normally is expected of mapM
)
mapM :: PrimMonad m => (a -> m b) -> MVector (PrimState m) a -> m (MVector (PrimState m) b)
Iterate over the vector, apply monadic action to each element and replace the original value the the result of that action
mapInPlaceM :: PrimMonad m => (a -> m a) -> MVector (PrimState m) a -> m ()
@infinity0 we need some new suffix in order to disambiguate. None of the common mapM
/mapM_
mutate the actual structure they are applied to.
I think in the context of Haskell naming conventions having mapM
mutate the structure would be really confusing. In general if an iterator is going to mutate what it is passed there should be some large red flag letting you know (in rust you would have to explicitly label that you are borrowing mutably, I don't think haskell should be in a worse off spot).
I want to re-emphasise my point above that you can convert immutable and mutable operations between each other generically (e.g. using https://hackage.haskell.org/package/mutable-lens),
In other words, mapM on an immutable data structure with type signature (a -> m b) -> t a -> m (t b)
is directly isomorphic (when specialised to b = a
) to mapM on a mutable data structure with type signature (a -> m a) -> Mutable t a -> m ()
.
Iterate over the vector, apply monadic action to each element, result is loaded into a new mutable vector (this is what normally is expected of mapM)
mapM :: PrimMonad m => (a -> m b) -> MVector (PrimState m) a -> m (MVector (PrimState m) b)
This does not make sense to be called mapM
because the monad m
by virtue of the PrimMonad m
constraint already contains an encoding of the state transformer MVector a -> mBase (*, MVector a)
. Rather, the type signature above would be isomorphic to an immutable function over a pure state monad that purposefully duplicated the state MVector a -> mBase (MVector b, MVector a)
, i.e. the equivalent immutable "mapM" would be (a -> m b) -> MVector a -> mBase (MVector b, MVector a)
, with the parts after the first ->
representing the unwrapped state monad.
You wouldn't call the latter function mapM
, so I don't think it's suitable to call the former function mapM
.
Instead, I argue this naming:
mapM_ :: PrimMonad m => (a -> m b) -> MVector (PrimState m) a -> m ()
mapM :: PrimMonad m => (a -> m b) -> MVector (PrimState m) a -> m ()
cloneMapM :: PrimMonad m => (a -> m b) -> MVector (PrimState m) a -> m (MVector (PrimState m) b)
is more consistent with the existing convention on immutable structures.
should be some large red flag letting you know (in rust you would have to explicitly label that you are borrowing mutably, I don't think haskell should be in a worse off spot).
The large red flag with a mutate-in-place mapM
is the fact that you are passing in an explicitly-mutable data structure. In Haskell today, "mutable" is essentially just syntax sugar for something similar to s -> s
; various IO functions are not all suffixed with "InPlace".
map
in rust has a linear type signature e.g. pub fn map<U, F>(self, f: F) -> Option<U>
; in rust mutation is effectively syntax sugar for this sort of type signature, where the type system prevents a second consumption of the input after the output is evaluated. (Linear Haskell does this too.)
@infinity0 I want to re-emphasize my point, function that you are suggesting (note that I fixed a -> m a
):
mapM :: PrimMonad m => (a -> m a) -> MVector (PrimState m) a -> m ()
will not get into vector because there are already too many people who agree that it is a bad idea, including all vector
maintainers. It really deserves its own name, whatever that might be, we can bikeshed it, as we did in https://github.com/haskell/vector/issues/334#issuecomment-711348893, but it can't be mapM
Also note, rust and linear haskell have nothing to do with this PR because we are not Rust and linear haskell is faaaaaar away from being mainstream.
(note that I fixed a -> m a):
I simply copied and pasted what you wrote in your comment, above mine.
will not get into vector because there are already too many people who agree that it is a bad idea
I was just trying to change your minds by noting things that haven't been previously observed during the discussions. That is what discussions are for, for changing people's minds. I brought up rust in response to somebody else that brought up rust. I am not sure how you define "mainstream" if the current version of GHC 9 isn't "mainstream". Comparatively, it's made more progress than a lot of Haskell libraries, in quite an impressive short amount of time actually, so it's much closer than you think.
Of course if the answer is a simple "no", my only response is to predict that this discussion will look quite short-sighted when (not if) linear haskell does become standard haskell.
You did not just copy and paste. This is what I wrote in https://github.com/haskell/vector/pull/417#issuecomment-920065111:
mapM :: PrimMonad m => (a -> m b) -> MVector (PrimState m) a -> m (MVector (PrimState m) b)
This is what you wrote:
mapM :: PrimMonad m => (a -> m b) -> MVector (PrimState m) a -> m ()
Two very different things.
I was just trying to change your minds by noting things that haven't been previously observed during the discussions. That is what discussions are for, for changing people's minds.
This is totally fine, we all happy to discuss possibilities. I am definitely not opposed to discussing things, but if there is pushback from everyone else, there is no need to "re-emphasise".
if the current version of GHC 9 isn't "mainstream".
We barely dropped support for ghc-7.10 a month ago in #400 so ghc-9 is far away from minimal ghc version vector supports.
so it's much closer than you think
It is much closer to being adopted? Maybe, hopefully. But core packages that everyone depends on cannot switch to using all the new bells and whistles provided by the newest version of ghc, too many people depend on vector
being stable for a wide range of compiler version.
Why do you two keep quarreling about just types? Write codes:
import Data.Vector.Generic.Mutable as M
import Data.Vector.Generic as G
import Control.Monad.Primitive
M.mapM1, M.mapM2 :: (M.MVector v a, M.MVector v b, PrimMonad m) => (a -> m b) -> v (PrimState m) a -> m (v (PrimState m) b)
-- equivalent to `G.freeze . G.mapM f . G.thaw`
M.mapM1 f veca = do
veca' <- M.clone veca
vecb <- M.new (M.length veca)
forM_ [0..M.length veca - 1] $ \ !i ->
M.write vecb i =<< (f =<< M.read veca' i)
return vecb
-- equivalent to the intention of undefined behavior code `G.unsafeFreeze . G.mapM f . G.unsafeThaw`
M.mapM2 f veca = do
vecb <- M.new (M.length veca)
forM_ [0..M.length veca - 1] $ \ !i ->
M.write vecb i =<< (f =<< M.read veca i)
return vecb
M.mapInPlaceM f :: (M.MVector v a, PrimMonad m) => (a -> m a) -> v (PrimState m) a -> m ()
M.mapInPlaceM f vec = do
forM_ [0..M.length vec - 1] $ \ !i ->
M.write vec i =<< (f =<< M.read vec i)
I'm not really interested in isomorphism of types, but they seem to be very different functions. (By the way, if we implement mapM
to Mutable module, which do we choose from mapM1
or mapM2
as implementation? Or are we going to say the behavior is undefined if f
modifies the original vector?)
And we need to make sure we document the order of execution of M.mapInPlaceM f
, since otherwise we don't know what happens if f
modifies the original vector. Or we could declare undefined behavior.
I think the suffix is not so bad and will open the way to algorithms having two variants which is nice and good. Probably you can also have fusion rules that allow not more clones than is necessary by having both variants. Overall, if linear haskell becomes the default then I am sure vector would be happy at that time to get with the program. Until then, is the name so bad?
Of course if the answer is a simple "no", my only response is to predict that this discussion will look quite short-sighted when (not if) linear haskell does become standard haskell.
True, but I just hope all of us will live long enough to witness this glorious future. Shall we put this PR on hold and return to the discussion at that time?
@infinity0, it's not very long-sighted for purposes of a constructive discussion to brand maintainers short-sighted. If you are unhappy with our position, you can escalate this question to CLC.
Why do you two keep quarreling about just types? Write codes:
@gksato Function type should be enough in this case to understand what the function is doing.
For example your mapInPlaceM
will not type check and mapM1
has an unnecessary clone
, but that is implementation detail, which we don't care about in a conceptual discussion.
@Boarders I think fusion rules are our of window for mutable mapM*
True, but I just hope all of us will live long enough to witness this glorious future.
Love it :smile:
Shall we put this PR on hold and return to the discussion at that time?
:+1:
@lehins More than fair, if you are mutating then it is easy to figure out fusion by hand. Not that Haskell is another language (as the discussion has established) but I did hear a C++ programmer recently say they wish the algorithms
library had mutating and non-mutating variants for all algorithms that make sense. I don't see the disaster of offering both.
@lehins, Ohhhh... embarrassing. Firstly, I fixed my code. Secondly, I didn't even realize I didn't get to read through his/her discussion till now. I'm terribly sorry, @infinity0. You did understand the intention of mapM/mapInPlaceM
.
(a -> m a) -> v a -> m (v a)
and (a -> m a) -> Mutable v s a -> m ()
are not isomorphic, since no function of the type of the latter can change length. From that somehow I jumped to the conclusion that you (@infinity0) did not understand the function's intention. Let me apologize to you again.
(Turning my head to @lehins again)
By the way however, mapM1
and mapM2
are different functions, since they differ when f :: a -> m b
mutates the original vector. However it's something we should argue about when we are talking about minor corrections in PR about actual implementation of um... what I call mapM
. I didn't realize we were talking about whether we should accept this or not. I thought this should work as a tone-cooler, but it's not something that I should do when I'm weeping at the tone of the discussion. I'm terribly sorry, everyone here.
I'm terribly sorry, everyone here.
@gksato No worries, your input is always welcomed :+1:
I want to point out that (a -> m a) -> t a -> m (t a)
is not isomorphic to (a -> m a) -> Mutable t a -> m ()
. First function creates copy of data structure and t a
remains accessible while second one destroys it. They they will produce different results if function a -> m a
closes over original container.
With mutable data structure we have two different strategies: "create copy" and "update in place". Both are useful, but very different, so names have to make clear which strategy is used by function. Functions for pure data structures always create copy (except when opportunistic optimizations kick in). So it makes perfect sense to name functions that create copies of mutable containers the same way as functions that works with immutable and use different naming convention for functions that update in place
Linear types may give us update in place for pure data structures too. But this is new and experimental extension which I think will change as things are ironed out and we understand how to use them in larger haskell ecosystem. And vector is foundational library and by necessity is conservative.
Yes, I was taking some artistic license with the use of that term. However under linear haskell, (a -> m a) -> Mutable t a -> m ()
will be in fact isomorphic to a specialised version of a linear generalisation of (a -> m a) -> t a -> m (t a)
, and likewise for similar type signatures.
In that situation, having a separate naming convention is really unnecessary, directly analogous to naming fmapList :: (a -> b) -> [a] -> [b]
and equally as awkward.
So the main question is do we want to have these other duplicate names for 2-5 years and encourage people to write code like that, based on a naming convention that is a evolutionary dead-end in terms of technology, and not only that, it increases the pain for writing linear code later? My argument has been no.
But sure, if the consensus is "yes" then by all means somebody is very welcome to make the minimal changes to this PR and get it merged; I myself personally can wait a bit longer.
Um,... that's totally true if we take the viewpoint focused on t=Vector
; but when we see mapM
etc we're looking at the completely different structuret=MVector
. I don't think it's appropriate to see MVector
as Mutable Vector
here. An important point, I guess, is if Vector
and MVector
has both mapM
then the two structures should be interchangeable:
do v <- generateM 10 (return . id)
v2 <- mapM (return . (fromIntegral :: Int -> Word) . (*2)) v
val <- read v 2
val2 <- read v2 2
assert (val == 2)
assert (val2 * 2 == 4)
-- with this code, we don't need to decide we're using MVector or Vector
VU.read :: Applicative m => VU.Vector a -> Int -> m a
VU.read v i = pure $ v VU.! i
I don't think MVector
is an intermediate tool between Vector
s. And mapM
is already taken for the immutable usage by other immutable structures. So if we use mapM
against MVector
, MVector
must forget its mutability. In other words, MVector
is an array with an extra mutability. If immutable function is applied, it should behave immutably.
I have prattled on my harsh opinion so far, but as I writing this, I see one more important point now: if it is mapM f
, f
should be a -> m b
. It should not be restricted to a -> m a
.
And with Linear Haskell... yes, I think vector
would export
mapSameTypeM :: (PrimLinMonad m) => (a %1-> m a) -> MVector a %1-> m (MVector a)
and we would use mapInPlaceM
for implementation. However it can't take place of mapM
, since we can't fit a %1-> m b
in.
An important point, I guess, is if Vector and MVector has both mapM then the two structures should be interchangeable
Yes my suggestion breaks this, in favour of emphasising the relationship between mutable vs immutable fmap. If you think of MVector as a "handle" (like an STRef) and the Vector as the actual data, it makes sense that mapM
should perform stuff on the actual data in both cases, and it is surprising for mapM
to duplicate the handle (a non-data operation) as well as operating on the data.
Now you could argue that mapInPlace
could be decomposed to a map
coupled with something analogous to StateT
and runStateT
that converts between the mutable / immutable forms, so I can sympathise with the extra suffix along these lines. But the existing (read-only) functions are not named along these lines either - like how with a reference in a ST or State monad one would be expected to do something like read ref >>= foldM params
rather than foldM params ref
directly as is the case in the existing MVector module.
f
should bea -> m b
. It should not be restricted toa -> m a
.
I think this is more related to the underlying representation rather than the current mutable vs immutable discussion - if a
and b
had the same representation (runtime size) then it should be possible in principle to support a type-changing mutable mapM
too. So e.g. for boxed types this should be fine.
it should be possible in principle to support a type-changing mutable mapM too. So e.g. for boxed types this should be fine.
Unfortunately no, since an applied vector might be a slice. You can't change the constituent type of the first half of a vector while the second half staying the same.
However you're making more sense to me now (I now see your view point: MVector
not as a structure but as a mere handle. I wasn't able to get the viewpoint from the term "isomorphism" because we can't translate (a -> m a) -> Vector a -> m (Vector a)
back to (a -> m a) -> Mutable Vector (PrimState m) a -> m ()
). However... um... I find it difficult to settle my eyes at your view point.
Unfortunately no, since an applied vector might be a slice. You can't change the constituent type of the first half of a vector while the second half staying the same.
With linear types this would actually work as you couldn't keep a reference to both the outer slice and the inner slice at the same time. (You could split the outer slice into multiple pieces, then change the type of each piece, that would work.)
Re references & "isomorphism" etc, yes I can appreciate how this unification vision of mutable + immutable forms appears clumsy with the current state of things.
I want to stress that linear API is not going to replace non-linear API, it's going to exists alongside. And this PR expands nonlinear API so concerns about linear API are not terribly relevant to problem at hand
I suppose one issue here is non-deterministic monads, and perhaps the reason why this wasn't added in the first place - though I'd argue with non-deterministic monads, the caller would take responsibility for the order in which the modifications are applied, and that this PR is still helpful and safely predictable for deterministic monads (the more common use-case). I could add this caveat to the documentation if you want.
In any case, I think the documentation for
mapM_
and friends could be a bit more explicit about not modifying the vector - it's especially misleading that the input argument is of type(a -> m b)
and not(a -> m ())
.