haskell / core-libraries-committee

95 stars 16 forks source link

Restrict the type of `Data.List.NonEmpty.unzip` #86

Closed mixphix closed 1 year ago

mixphix commented 2 years ago

Currently we have unzip :: Functor f => f (a, b) -> (f a, f b) exported from Data.List.NonEmpty. The implementation can remain the same, but I propose that the type should be restricted to unzip :: NonEmpty (a, b) -> (NonEmpty a, NonEmpty b) for analogy with Data.List as well as to avoid exporting overgeneralized functions from specific modules. Really, why should we be exporting something that works with any Functor from Data.List.NonEmpty?

treeowl commented 2 years ago

I would want to change the implementation as well, so as not to be gratuitously lazy (and memory leaky).

unzip ((a, b) :| abs) = (a :| as, b :| bs)
  where
    (as, bs) = Data.List.unzip abs
mixphix commented 2 years ago

Many of the other functions in that module are defined using lazy pattern matching ~(x :| xs). Why pick unzip to be more strict than its relatives?

treeowl commented 2 years ago

Many things in that module may be lazier than they should be. I haven't looked at it recently. The current unzip seems pretty awful, but maybe there should be a separate proposal to stricten everything up?

Bodigrim commented 2 years ago

Data.List.NonEmpty API is extremely lazy, but let's leave this for another proposal.

> Data.List.NonEmpty.map id undefined `seq` ()
()
mixphix commented 2 years ago

I'm hesitant to remove this function completely, though. It's already here and useful. I wonder if it would be better to export this function from Data.Functor? A Hackage search shows a few duplicates, indicating a level of utility.

ocharles commented 2 years ago

I opened https://github.com/haskell/core-libraries-committee/issues/88 today (completely independently of this - weird timing!). That proposal discusses having Data.Functor.unzip

mixphix commented 2 years ago

Haha, well in any case, they seem to be in tandem with each other :)

josephcsible commented 2 years ago

overgeneralized functions

I don't understand why this is a bad thing. With the exception of functions whose sole purpose is to restrict types (asTypeOf), why would you ever want a less general function when you could have an equivalent more general one instead?

mixphix commented 2 years ago

As @treeowl mentioned, strictness is one factor. Another is performance, though here it may not apply: for specific types one might be able to write it more efficiently than the general abstraction could perform, since the latter mustn't assume anything about the representation.

treeowl commented 2 years ago

Since I was being a bit vague above.... For [], say, we have

traverseBia :: Biapplicative f => (a -> f b c) -> [a] -> f [b] [c]
traverseBia _ [] = bipure [] []
traverseBia f (x : xs) = biliftA2 (:) (:) (f x) (traverseBia f xs)

In particular,

traverseBia :: (a -> (b, c)) -> [a] -> ([b], [c])

The Biapplicative instance for (,) is still a mess on Hackage (not matching the Bifunctor instance), but in the next release they'll both be lazy. One could also write a custom pair type with stricter versions. Anyway, we'll have

bipure = (,)
biliftA2 f g ~(a, b) ~(c, d) = (f a c, g b d)

Inlining,

traverseBia :: (a -> (b, c)) -> [a] -> ([b], [c])
traverseBia _ [] = ([], [])
traverseBia f (x : xs) = (fx1 : r1, fx2 : r2)
  where
    (fx1, fx2) = f x
    (r1, r2) = traverseBia f xs

Walking one of the result lists will cause the other one to be realized, with selector thunks for elements (assuming the optimizer doesn't do anything horrible). Memory leak averted, despite remaining very lazy. It'll still be quite a bit lazier than the hand-written unzip, but I think it's about the only reasonable way to write a lazy unzip for an arbitrary Traversable. Of course, if you want a totally eager unzip, you can do that too, by using a custom pair type with strict Bifunctor and Biapplicative instances.

treeowl commented 2 years ago

To sum up my position, I think Data.Functor, Data.Traversable, and Data.NonEmpty are all good places for unzip, each with a different implementation.

ocharles commented 2 years ago

FWIW, I would be fine to have all three.

Also,

why would you ever want a less general function when you could have an equivalent more general one instead?

More specific types can lead to more specific type errors.

phadej commented 2 years ago

I think this is a corollary of #22/#76/... When that is done, this could be done as well.

xplat commented 2 years ago

To sum up my position, I think Data.Functor, Data.Traversable, and Data.NonEmpty are all good places for unzip, each with a different implementation.

While I see the merits of each of these versions of unzip on their own, having them all seems like a namespacing nightmare.

tomjaguarpaw commented 2 years ago

I described the conditions under which I'm prepared to vote in favour of this breaking change on a related discussion

Bodigrim commented 2 years ago

Since this is a breaking change, an impact assessment is due. It's quite possible that someone exported Data.List.NonEmpty.unzip and used it with another Functor.

tomjaguarpaw commented 2 years ago

It's quite possible that someone exported Data.List.NonEmpty.unzip and used it with another Functor.

Here is a real world example in non-public code: https://github.com/haskell/core-libraries-committee/issues/88#issuecomment-1246483695

Bodigrim commented 1 year ago

@mixphix are you still interested in this? Could we make some progress towards impact assessment?

ocharles commented 1 year ago

I'd like to take this proposal over, if you're stuck/need help @mixphix. Is the next step still the impact assessment?

Bodigrim commented 1 year ago

Yes, impact assessment.

mixphix commented 1 year ago

If you wouldn't mind, @ocharles, I'd be grateful to hand this proposal over to you.

ocharles commented 1 year ago

You got it! I'll try and put together an impact assessment in the next few days

ocharles commented 1 year ago

If I did everything correctly, I believe I've completed the impact assessment. I used https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9757 to build a new version of GHC, then built clc-stackage without needing to patch any libraries*.

So, to reiterate: the proposed change doesn't seem to break anything in clc-stackage. I'd like to submit the proposal (precisely what's in the above MR) to the committee.

Bodigrim commented 1 year ago

Thanks, @ocharles. I'll trigger the vote once a new cast of CLC is selected in February.

ocharles commented 1 year ago

@Bodigrim Good to go with a vote now?

Bodigrim commented 1 year ago

Dear CLC members, the proposal is to restrict Data.List.NonEmpty.unzip from unzip :: Functor f => f (a, b) -> (f a, f b) to a more monomorphic unzip :: NonEmpty (a, b) -> (NonEmpty a, NonEmpty b). The impact assessment found no Stackage packages affected by this change.

Strategically the proposed change is in line with the monomorphisation of Data.List, approved earlier.

CLC has approved addition of polymorphic unzip to Data.Functor in #88, but this change happened to late to be included in base-4.18 to be shipped with GHC 9.6.

Before we trigger a vote, I would like to hear (non-binding) opinions from new members of CLC. CC @chshersh @angerman @parsonsmatt @hasufell

tomjaguarpaw commented 1 year ago

It's a gratuitous breaking change, so -1 from me, as outlined in a previous comment.

[EDIT: oh sorry, my opinion wasn't actually requested at this time, but FWIW that's my opinion, and my vote]

Bodigrim commented 1 year ago

What worries me in the current plan is that if anywhere out there exists a consumer of polymorphic unzip from Data.List.NonEmpty, they won't be able to maintain backward compatibility without CPP, because Data.Functor.unzip has not been in base for long enough (in fact, it is not in base yet ;).

https://github.com/haskell/core-libraries-committee/issues/88#issuecomment-1247093730 describes a possible approach to gradual replacement, but I think inroducing Data.List.NonEmpty.unzipNonEmpty only to deprecate it later is too grotesque for the task at hand.

If there is an appetite for a smoother transition, I'd suggest the following:

This is slightly antithetical to the purpose of the proposal, because until GHC 9.10 users would not have access to a monomorphic unzip, but this is no worse than status quo.

tomjaguarpaw commented 1 year ago

What does someone do if they want to write code using polymorphic unzip that works across three versions, 9.4 to 9.8, let alone back to 8.10 (which is still widely used in practice)? I don't see any strong reason to force monomorphization of unzip. I suggest we just keep it there with a deprecation warning. It's not doing anyone any harm.

angerman commented 1 year ago

Reading through this thread, I find the justification for this change rather weak. It also comes with not deprecation as far as I can see. This thread got me wondering how a proper deprecation warning for this might look. We really only want to inform users of the "overgeneralised" use case, not those who use the function in its proposed monomorphised form. Lacking a better (usage dependent) deprecation, I'd argue that this should first be implemented with a regular deprecation warning, and the suggestion on how to migrate, before being restricted. Again ideally we'd have some deprecation based on actual use.

JakobBruenker commented 1 year ago

With https://github.com/ghc-proposals/ghc-proposals/pull/575, you could have use-dependent deprecation:

unzip :: UnzipDeprecation f => Functor f => f (a,b) -> (f a, f b)
unzip xs = (fst <$> xs, snd <$> xs)

class UnzipDeprecation f

instance UnzipDeprecation f {-# DEPRECATED "..." #-}

instance {-# OVERLAPPING #-} UnzipDeprecation NonEmpty

Edit: However, see Oleg's comment here and Tom's followup https://github.com/ghc-proposals/ghc-proposals/pull/575#issuecomment-1435741892 i.e. this wouldn't actually be backwards compatible, though if one uses INCOHERENT instead of OVERLAPPING it might be

angerman commented 1 year ago

Ok! That is some creative idea!

treeowl commented 1 year ago

I think the main justification is that we don't really want that implementation.

hasufell commented 1 year ago

Data.List.NonEmpty API is extremely lazy, but let's leave this for another proposal.

> Data.List.NonEmpty.map id undefined `seq` ()
()

My point is that this should not be left for another proposal.

The only interesting reason to change the type signature is to have a more specialized implementation. That would be a real motivation.

Without that, it's not enough for me and will likely get a -1.

If someone comes up with an interesting implementation that we are excited about, then changing the type signature becomes feasible. Doing it prematurely is a no-go for me and is just a cosmetic change.

So I kinda agree with @treeowl here. In case we do want a different implementation, I'd possibly argue for renaming the function and never renaming it back... completely removing Data.List.NonEmpty.unzip forever. But I could possibly be persuaded otherwise wrt deprecation strategy.

Bodigrim commented 1 year ago

My point is that this should not be left for another proposal.

The extreme laziness of Data.List.NonEmpty is tracked in #107.

hasufell commented 1 year ago

My point is that this should not be left for another proposal.

The extreme laziness of Data.List.NonEmpty is tracked in #107.

Yes. My point is that my +1 on function signature change would be dependent on whether we accept a concrete implementation change. And that can be done as part of that ticket.

Ericson2314 commented 1 year ago

I would suggest Data.Functor.unzip be backported to older compilers, which could help with the depreciation/migration story.

chshersh commented 1 year ago

This is a tough one 🥴

We (CLC) need to come up with a policy for introducing breaking changes. It's not the first proposal that wants to change existing things. And it's not going to be the last. Our lives would become much easier if we provide a prepared answer to the question "What's the deprecation schedule and timeframe for any breaking change?"

We don't have such a helpful document now, so I'll provide my thoughts on this proposal alone.

This proposal as is

I'm against this proposal as is because it's a breaking change without any deprecation considerations. Sure, the CLC Stackage impact assessment didn't indicate any breakages. But Stackage is not all Haskell.

Yet, still

It's a good change. It's bizarre that Data.List.NonEmpty has a single polymorphic function. Besides, as discussed before, making the type more specific enabled writing a more optimal implementation.

In the context of already accepted and implemented #22/#76, making a fuss about making unzip monomorphic sounds unreasonable.

How to deprecate?

I would accept this proposal with a deprecation schedule. Sure, it would be cool to have fancy deprecation tools to drive the change. But we don't have anything atm. So my alternative to what @Bodigrim suggested:

This schedule ensures enough time to adjust to the change and allows supporting the three latest major GHC versions without CPP.

Alternative

We could become super conservative and just deprecate Data.List.NonEmpty.unzip without ever touching it. Honestly, I really doubt that this function is used in performance-critical code (or any code at all, tbh). So there's no real benefit in changing it, we could just make Haskell more stable and reliable by not changing this function at all. I don't think that having a single deprecated function in Data.List.NonEmpty forever is that big of a deal.

But Haskell is always going to change things, and I'm not prepared to push for extreme stability right now.

parsonsmatt commented 1 year ago

I feel like unzip :: (Functor f) => f (a, b) -> (f a, f b) is a "bad operation" in the same way that List.lookup is a "bad operation" - the default implementation is definitely inefficient, requiring two passes over the input, and it's probably not what you want.

List.unzip and NonEmpty.unzip make sense - they can be done with a single pass over the input list, and it's somewhat common to zip and unzip on these structures. But I wouldn't want to unzip a Map k (a, b), or unzip an IO (a, b), or unzip a Maybe (a, b).

So, I think I am mildly -1 on Data.Functor.unzip - if we want to zip/unzip as a polymorphic abstraction, I'd rather see a class Zip that managers that, rather than bolting the functionality on to Functor, where it will always be pessimal.

For the same reason, I'm +1 on Data.List.NonEmpty.unzip :: NonEmpty (a, b) -> (NonEmpty a, NonEmpty b).

Given that the clc-stackage didn't find any breakage, and that this is a polymorphic function exposed from a module designed to be imported qualified, I'd expect most uses of this to be NonEmpty.unzip or similar. I'd be fine with a faster migration path - possibly even a single release, 9.8 with a WARNING notice ("this is going to be monorphized in the future"), and then 9.10 monomorphizing it.

Bodigrim commented 1 year ago

@ocharles since you took over the proposal, what would you like to do? We can vote on the original proposal, but judging from the discussion above this is likely to fail. I think that changing the proposal to a warning now and monomorphisation later (say, in GHC 9.14) has much better chances to succeed.

ocharles commented 1 year ago

I would like to vote with the original proposal. Personally with my stackage evaluation I don't see any significant breakage, but the CLC are of course free to see things differently. I don't think I have the resources to do much more than that though, and took this from @mixphix to try and keep things moving forward

mixphix commented 1 year ago

I'm obviously in favour of the proposal as written; preferably with @parsonsmatt's fast-tracked warning/restriction two-release plan, but I'll gladly accept any schedule for the sake of a more sensible type.

ocharles commented 1 year ago

@Bodigrim could you remove "awaits-impact-assessment", as that has been done (https://github.com/haskell/core-libraries-committee/issues/86#issuecomment-1397145338)

Bodigrim commented 1 year ago

I'll gladly accept any schedule for the sake of a more sensible type.

I'm going to use your permission and put for a vote a plan from https://github.com/haskell/core-libraries-committee/issues/86#issuecomment-1435993821 as it's likely to gain more support.

Dear CLC members, let's vote on the following plan:

  1. GHC 9.8: deprecate Data.List.NonEmpty.unzip with {-# DEPRECATED unzip "This function will be made monomorphic in GHC 9.14, consider switching to Data.Functor.unzip" #-}. The latter is already there.
  2. GHC 9.10: no changes.
  3. GHC 9.12: no changes.
  4. GHC 9.14: change type signature of Data.List.NonEmpty.unzip to monomorphic unzip :: NonEmpty (a, b) -> (NonEmpty a, NonEmpty b) and remove the deprecation warning.

@tomjaguarpaw @chshersh @hasufell @mixphix @angerman @parsonsmatt


+1 from me. This is a very conservative migration plan for a virtually unused function, eventually bringing in more consistency.

parsonsmatt commented 1 year ago

+1 - would also +1 a more aggressive schedule

ocharles commented 1 year ago

I don't quite understand point 1. If a user of Data.List.NonEmpty.unzip uses it on a NonEmptyList they shouldnt use Data.Functor.unzip. Should they just disable the depreciation warning? If so, I think we should say that. I say this because I would expect most users are using it on NonEmptyLists, and the more general type is the exception. The impact assessment seems to corroborate this.

I'm ok with this proposal, but it doesn't sit great because legitimate uses will have to turn the warning off, but that's (at least) module wide, and while you get a warning about the deprecation, you get no such warning to stop ignoring deprecations when you reach 9.14.

hasufell commented 1 year ago

As there is no suggestion for an implementation change, I find the motivation insufficient and vote -1.

chshersh commented 1 year ago

+1


The deprecation schedule indeed may look too conservative. But I think it would be huge success if we could enforce and follow the same schedule for every breaking change.

I also wouldn't block this improvement on the grounds of "not doing enough". If someone in the future wants to create a proposal to change the implementation of unzip for NonEmpty, they're more than welcome to do so 😌

tomjaguarpaw commented 1 year ago

+1


However, I really think we should also add Data.List.NonEmpty.unzipNonEmpty with a specialized implementation at the same time, and then later do Data.List.NonEmpty.unzip = unzipNonEmpty.

I'm sympathetic to @hasufell's point of view, but I still think that it's better to bring it to the attention of users that Data.List.NonEmpty.unzip is currently not a specialized implementation.

mixphix commented 1 year ago

The proposal to strengthen to unzip :: NonEmpty (a, b) -> (NonEmpty a, NonEmpty b) will not specialize the implementation in any way. There are other places for similar unzip functions to live that may have different implementations, but are functionally the same when specialized to NonEmpty. The names can happily coexist.