haskell / core-libraries-committee

95 stars 16 forks source link

Add `Data.Functor.unzip` #88

Closed ocharles closed 1 year ago

ocharles commented 2 years ago

Currently Data.List.NonEmpty defines:

-- | The 'unzip' function is the inverse of the 'zip' function.
unzip :: Functor f => f (a,b) -> (f a, f b)
unzip xs = (fst <$> xs, snd <$> xs)

But this is a very strange place for such a generally useful function.

I'd like to propose that Data.Functor gains this function. I don't feel strongly on whether it should be a new definition, nor what should happen with Data.List.NonEmpty.unzip, only that after this proposal I can do:

import Data.Functor (unzip)
treeowl commented 2 years ago

The version in Data.Functor can be no better than that one. Data.Traversable can do rather more, in principle, using something like Data.Biapplicative.traverseBia.

ocharles commented 2 years ago

Ok, I am fine with the version I proposed. Are you saying it is not suitable for Data.Functor?

treeowl commented 2 years ago

I'm just saying that there are more and more options the more is known about the structure. The bare minimum version would fit very well in Data.Functor, but I don't think I'd personally find it very useful.

ekmett commented 2 years ago

It is also worth noting the existence of munzip in Control.Monad.Zip, which is almost always the same as the functor-based definition. Except it uses liftM internally and inefficiently for historical reasons. (I was able to come up with a couple of monads for which it actually helped to have it in the class, but they are decidedly weird.) I'm not calling for concrete action around this fact, just raising awareness.

phadej commented 2 years ago

unzip could be a member of Functor type-class, for the same reason (<$) is, there can be more efficient implementations: [a] and NonEmpty a come to mind immediately.

And as there is default implementation, it shouldn't be too intrusive change either.

Ericson2314 commented 2 years ago

I would move the function, and only then worry about making it a class member. Baby steps.

treeowl commented 2 years ago

@phadej, I'm not sure making it a method magically solves the problems. The strictness and other properties of the most "obvious" unzip depend on the type, which makes a class method hard to reason about. Furthermore, one may want less-obvious unzips in certain cases. Ignoring the strictness of the outermost pair (because it's boring), there's just one unzip that makes sense for every Functor, which is also boring. There are two additional ones that make sense for every Traversable. Some types have sensible unzips that are none of these, such as lists (but, perhaps surprisingly, not Data.Sequence sequences). Note that while the lazy Traversable unzip has less space leak potential than the Functor one, it will actually use significantly more memory in certain call site contexts.

phadej commented 2 years ago

@treeowl it doesn't need to solve all the problems. Is there drawbacks of unzip being a class member?

No ordinary person will be able to choose from three+ different unzips for lists.

treeowl commented 2 years ago

@phadej, I'm concerned for the poor programmer who has to implement the method. From a practical perspective, I think most applications/call sites will want one of three things.

  1. Totally eager unzipping. This is natural for arrays, and might make sense for other things if they're sufficiently short.
  2. Something lazier than that, but designed to avoid leaks.

Another concern: I don't actually think non-traversable functors necessarily have sensible unzips. For example, you can use the simple fmap thing to "unzip" an IO action, but that doesn't strike me as something very reasonable to do.

phadej commented 2 years ago

@treeowl

but that doesn't strike me as something very reasonable to do.

So are you against adding general

unzip :: Functor f => f (a,b) -> (f a, f b)
unzip xs = (fst <$> xs, snd <$> xs)

as it can be used with IO?

phadej commented 2 years ago
unzipWith :: (c -> (a, b)) -> f  c -> (f a, f b)

Is slightly more general, and composes better. (Think a variant for Compose).

ocharles commented 2 years ago

Perhaps it would help for me to supply a bit more motivation for why I want this? Here is a use of Data.List.NonEmpty.unzip in my current work project (none of which are used on NonEmpty lists):

  let (onFailedToSaveImage, onSuccessfullySavedImage) =
        unzip $
          inputs.onSavedImage <&> \e ->
            split $
              e <&> \(image, res) -> do
                case res of
                  Left e -> Left e
                  Right () -> Right image

Where the thing being unziped here (inputs.onSavedImage ..) is :: Cameras (Event SomeException, Event ()). Cameras is a two-tuple as a functor, and Event is from reactive-banana.

This isn't much, but it felt wrong to import Data.List.NonEmpty for this, and I didn't want to depend on any other libraries or have to define unzip to make this code work.

treeowl commented 2 years ago

@phadej I don't think I'm against that... but I have doubts about a class method that doesn't make much sense for a pretty large category of instances. It makes me nervous, anyway. I'd be substantially less nervous about adding one for Traversable.

treeowl commented 2 years ago

But Traversable isn't quite right either, is it, because that requires an ordering. Perhaps morally it belongs in some class between? Or does unzipping IO make more sense than I'm giving it credit for?

tomjaguarpaw commented 2 years ago

Cameras is a two-tuple as a functor

Do you mean type Camera a = (Something, a) and you are using unzip :: (a, (b, c)) -> ((a, b), (a, c))? Otherwise I don't understand what you mean by "two-tuple as a functor".

I would like to see more evidence that unzip is used at general Functor types before considering this for inclusion in base (notwithstanding that the general version shouldn't have been added to Data.List.NonEmpty in the first place).

ocharles commented 2 years ago

Do you mean type Camera a = (Something, a) and you are using unzip :: (a, (b, c)) -> ((a, b), (a, c))? Otherwise I don't understand what you mean by "two-tuple as a functor".

No, sorry. I mean:

data Cameras a = Cameras{ upCamera, downCamera :: a }
tomjaguarpaw commented 2 years ago

I see, so you are using unzip at a type isomorphic to unzip :: ((a, b), (a, b)) -> ((a, a), (b, b)). This fits with @treeowl's hypothesis that it is sensible to require at least Traversable. Perhaps it should be Traversable and Applicative. Then it's a sort of "transpose" operation, I think.

ocharles commented 2 years ago

Other definitions of this function:

tomjaguarpaw commented 2 years ago

Thanks! Definitions are somewhat persuasive. Actual usages (particularly at the fully-general type) would be even more persuasive.

alexfmpe commented 2 years ago

unzipWith :: (c -> (a, b)) -> f c -> (f a, f b) Is slightly more general, and composes better. (Think a variant for Compose).

More general in that it can be specialized to unzip = unzipWith id, but actually more specific when it comes to available instances. There are types that can be unzipped but not mapped over. For instance, if there are invariants on the values or on the relationship between them.

data Monotonic a = Ord a => Monotonic (Vector a)

monotonic :: Vector a -> Maybe (Monotonic a)
monotonic = ...

unzip :: (Ord a, Ord b) => Monotonic (a,b) -> (Monotonic a, Monotonic b)
unzip (Monotonic a) = bimap Monotonic Monotonic (Vector.unzip a)

instance TypeError ('Text "Cannot perform arbitrary point-wise operations") => Functor Monotonic where
  fmap = error "Impossible"
alexfmpe commented 2 years ago

Ah hmm, that's actually a wrong example since Ord (a,b) allows for the b side to be non-monotonic. Maybe a better example would be something like data RNF a = NFData a => RNF a. The idea being that a fully evaluated pair must have fully evaluated components, and we can't map over it via Functor for the same reasons as Set.

phadej commented 2 years ago

@alexfmpe Sure. But as you mention yourself, these types are not Functor. So unzip x = (fst <$> x, snd <$> x) default implementation wouldn't work for them. And having Unzip f as a superclass of Functor won't fly, as it breaks almost everything.

treeowl commented 2 years ago

There is one potential complication with making it a Traversable method: the default lazy implementation can't be written in Report Haskell without unsafeCoerce—it seems to need either that or GADTs. The default fully eager implementation is likely only desirable in most circumstances for rigid-spined types (maps, hash maps, arrays, and such).

phadej commented 2 years ago

@treeowl which Traversable default lazy implementation. You never wrote it down in this issue (nor described how it's different (& better?) than unzip x = (fst <$> x, snd <$> x)

treeowl commented 2 years ago

@phadej ,

unzipWithLazy :: Traversable t => (a -> (b, c)) -> t a -> (t b, t c)
unzipWithLazy = Data.Biapplicative.traverseBia

The above assumes the Biapplicative (,) instance in the master branch of bifunctors. In the current Hackage version you'd need a bit of newtype trickery to get the right behavior.

The advantage over the simple fmap thing: for lazy-spined types, the two results are built "in parallel", (likely) avoiding leaking first components into second components or vice versa.

phadej commented 2 years ago

@treeowl is that the same as

-- | 'unzip' transforms a list of pairs into a list of first components
-- and a list of second components.
--
-- >>> unzip []
-- ([],[])
-- >>> unzip [(1, 'a'), (2, 'b')]
-- ([1,2],"ab")
unzip    :: [(a,b)] -> ([a],[b])
{-# INLINE unzip #-}
-- Inline so that fusion `foldr` has an opportunity to fire.
-- See Note [Inline @unzipN@ functions] in GHC/OldList.hs.
unzip    =  foldr (\(a,b) ~(as,bs) -> (a:as,b:bs)) ([],[])

when specialized t = []? It's difficult to understand what traverseBia does, and in fact bifunctors tries to rewrite it (and bunch of other specializations respectively) to

-- Rewrite rules for traversing a few important types. These avoid the overhead
-- of allocating and matching on a Mag.
{-# RULES
"traverseBia/list" forall f t. traverseBia f t = traverseBiaList f t
#-}
traverseBiaList :: Biapplicative p => (a -> p b c) -> [a] -> p [b] [c]
traverseBiaList f = foldr go (bipure [] [])
  where
    go x r = biliftA2 (:) (:) (f x) r

That looks like specialized unzip from Data.OldList, doesn't it?

I fail to grasp a benefit of having unzipWithLazy = Data.Biapplicative.traverseBia over unzipWith as a member of Functor. There aren't that many lazy-spined types, that special implementations for them cannot be written by hand, without overly-complicating things.


All this discussion just makes me think: let's just add what @ocharles proposed, and let you & me discuss options as our own proposals if we think if just having unzip in Data.Functor (as it's already in Data.List.NonEmpty) is not good enough.

The original proposal is very simple. I suggest @ocharles to trigger a vote.

ekmett commented 2 years ago

I don't actually think non-traversable functors necessarily have sensible unzips.

unzip :: (c -> (a,b)) -> (c -> a, c -> b)

seems perfectly sensible (even canonical) to me, despite (->) c not being traversable.

treeowl commented 2 years ago

@phadej , it's not quite the same, no. The traverseBia one is lazier; unzip [undefined, undefined] will produce ([undefined, undefined], [undefined, undefined]) rather than undefined. I believe that semantically we have

unzipUsingTraverse xs = xs `seq` unzipUsingFmap xs

but they're operationally different. I agree this discussion should be continued elsewhere.

ekmett commented 2 years ago

Similarly, there are lots of other representable functors that are not traversable that have sensible unzips:

Moore c (a,b) -> (Moore c a, Moore c b)
Mealy c (a, b) -> (Mealy c a, Mealy c b)

etc.

Then there are compositions of traversable things and representable things that are neither, which have sensible unzips, etc. etc. etc.

Functor captures that sort of broader space. Admittedly if you move an overloadable method into the Functor class you are making everyone pay for it and hoisting users on the horns of a dilemma of "do I increase the requirements on my Functor instance or do I accept a suboptimal 'unzip'?"

treeowl commented 2 years ago

@ekmett , I didn't think about the extra constraint issue. That's pretty nasty.

phadej commented 2 years ago

There's also Unzip (Event t) instance in reflex, with unzip = splitE: https://hackage.haskell.org/package/reflex-0.8.2.1/docs/Reflex-Class.html#v:splitE

And that's actually the only use-case why I'm not removing Unzip class from semialign (or restructuring the hierarchy there, semialign works now, and there is very little motivation to change it).

phadej commented 2 years ago

Btw. Prelude exports unzip :: [(a,b)] -> ([a], [b]). If unzip is added to Functor, it would make sense to export the Functor version instead, but if it's just defined as a unzip x = (fst <$> x, snd <$> x) then the Prelude version should stay list-specific (a pity).

ekmett commented 2 years ago

This sounds like this should be funzip or something to avoid clash on unqualified import of Data.Functor unless you move it into the class where it can be overridden to suit.

phadej commented 2 years ago

There are indeed plenty of unqualified Data.Functor imports: https://hackage-search.serokell.io/?q=import%5Cs*Data.Functor%24

Hard to say how many of them use unzip though.

phadej commented 2 years ago

If Data.OldList unzip is changed as

-let unzip =  foldr (\  (a,b) ~(as,bs) -> (a:as,b:bs)) ([],[])
+let unzip =  foldr (\ ~(a,b) ~(as,bs) -> (a:as,b:bs)) ([],[])

(which would be own issue)

Then

>>> bimap length length $ unzip [undefined, undefined]
(2,2)

it would be as lazy as traverseBia variant.

I'm not competent enough to comment what are implications of that change.

mixphix commented 2 years ago

@ocharles, if you will be triggering a vote, shall we collaborate on an MR to close #86 simultaneously? I find it hard to imagine there will be people in favour of one but not the other.

ekmett commented 2 years ago

There's also Unzip (Event t) instance in reflex, with unzip = splitE: https://hackage.haskell.org/package/reflex-0.8.2.1/docs/Reflex-Class.html#v:splitE

Another example is Lindsey Kuper's LVish, where the monad would admit an unzip that creates a "future" for the computation, and returns two different computations that each force the future and then apply fst or snd to the result.

tomjaguarpaw commented 2 years ago

@mixphix @ocharles As a point of information, in case it changes the design you come up with, I give fair warning that I will vote against any proposal that breaks existing consumers of Data.List.NonEmpty.unzip without a deprecation cycle. One such cycle (and the one I wrote about at Opaleye's API breakage policy) is

  1. GHC version n:
    • Add Data.List.NonEmpty.unzipNonEmpty, specialised to the NonEmpty type.
    • Add Data.Functor.unzip at general type
    • Change the documentation of Data.List.NonEmpty.unzip to say that it will be deprecated in GHC version n+1 and to use Data.List.NonEmpty.unzipNonEmpty or Data.Functor.unzip
  2. GHC version n+1:
    • {-# DEPRECATE #-} Data.List.NonEmpty.unzip
  3. GHC version n+2:
    • Either remove Data.List.NonEmpty.unzip
    • or make it a synonym of Data.List.NonEmpty.unzipNonEmpty (in which case document Data.List.NonEmpty.unzipNonEmpty as obsolete and that Data.List.NonEmpty.unzip should be preferred instead)

I am open to alternative forms of deprecation cycle but we need something that ensures users get a deprecation warning before their code breaks. Is this very heavyweight? Yes. Is this very annoying for those of us who want to move fast and fix things? Also yes. Is it essential if we want Haskell to be taken even minimally seriously in the backwards compatibility stakes? Even more yes.

(Procedural note: I don't believe the CLC can actually vote for 1, 2 and 3 at the same time in practice. I suspect we'd only be able to vote on 1, and then later 2, and then later still 3.)

phadej commented 2 years ago

I find that deprecation cycle worse than e.g. Data.List monomorphisation process.

The Data.List monomorphization way of "Just make patches for everything which breaks (in Stackage)" is considerably less polluting, and in this case shouldn't even need much work. I'm sure @mixphix is happy to gauge the impact.

ocharles commented 2 years ago

I haven't given you a proposal yet, but maybe it's time for me to get to that. The TL;DR of my proposal would just be to add unzip to Data.Functor though. It's a purely additive change

phadej commented 2 years ago

@ocharles Sorry for playing a devils advocate.

See comments https://github.com/haskell/core-libraries-committee/issues/88#issuecomment-1247050108 and https://github.com/haskell/core-libraries-committee/issues/88#issuecomment-1247054109

There is a chance of a name clash. Probably rare, but given that Data.Functor is likely imported unqualified, I'd say it should be assessed. (Do you hide unzip from Prelude in your work project?)

ocharles commented 2 years ago

Do you hide unzip from Prelude in your work project?

Yes. Perhaps more extremely, I am explicit with my imports (never unqualified), and when I have to hide things from Prelude I actually go as far as NoImplicitPrelude and import everything I need (that is, I don't use hiding and I don't use unqualified imports. Thank you HLS!)

There is a chance of a name clash. [..] I'd say it should be assessed.

How does one do such an assessment? Should I make the change and rebuild Hackage?

phadej commented 2 years ago

@ocharles

How does one do such an assessment? Should I make the change and rebuild Hackage?

s/Hackage/Stackage/

See https://github.com/Bodigrim/clc-stackage That is for GHC-9.2. So yes, build a GHC-9,2 with your change, and then build that project. It should work with cabal build -w /path/to/your/ghc/_build/stage1/bin/ghc.

tomjaguarpaw commented 2 years ago

I find that deprecation cycle worse than e.g. Data.List monomorphisation process.

You're welcome to find it as you like. I'm just making my position clear in advance to the proposers to avoid unwelcome surprises.

Bodigrim commented 2 years ago

@ocharles while it is your right as a proposer, I'd strongly suggest against triggering a vote after two days only. However, you are not obliged or expected to react to all suggestions in the thread. Once ready, please prepare a GHC MR with desired changes.

I usually do not require impact analysis for additive changes, but it strengthens the proposal and eliminates doubts. Building https://github.com/Bodigrim/clc-stackage takes a day or two on my laptop, but besides that is a fairly automatic process. See instructions inside the cabal file.

alexfmpe commented 2 years ago

@phadej

But as you mention yourself, these types are not Functor. So unzip x = (fst <$> x, snd <$> x) default implementation wouldn't work for them. And having Unzip f as a superclass of Functor won't fly, as it breaks almost everything.

Right. I guess what I was trying to get at is that neither fmap nor unzip are stronger than each other per-se (at least if we take invariants into account), but unzipWith :: (c -> (a, b)) -> f c -> (f a, f b) allows both

  fmap f = fst . unzipWith (dup . f)
  unzip = unzipWith id

though whether that fmap is lawful depends on the laws we add to unzipWith. I assume unzip would trivially be lawful.

Now, a default implementation of unzip in terms of fmap does allow for

unzipWith f = unzip . fmap f

so in the specific case of functor-based unzipping, it seems going with unzip or unzipWith is equivalent.

treeowl commented 2 years ago

For a Functor class method (which I'm nervous about), unzipWith seems rather better. For a Traversable method (less nervous), one option is

unzipUsing :: Applicative f => (a -> f (b, c)) -> t a -> f (t b, t c)

A less conservative thing would be to move Biapplicative to base and make traverseBia a Traversable method.

Bodigrim commented 1 year ago

@ocharles if you made you mind, please raise a draft MR, so that we know what exactly we are voting on.

ocharles commented 1 year ago

@Bodigrim Will do, thanks for the nudge.

Edit: to clarify, do you mean raising a draft MR against base with my proposed change?

ocharles commented 1 year ago

@Bodigrim Done: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9434