Make NonEmpty functions less gratuitously lazy

treeowl commented 1 year ago

This is going to be ... hard. Some decisions, I think, will not be very controversial. Others will likely be quite controversial. Let's go through them one by one and try to figure things out. But first, I'd like to mention that along with the definition we have,

data NonEmpty a = a :| [a]

there's another, equally valid, expression of the concept of a nonempty list:

data NonEmpty' a = EndNE a | ConsNE a (NonEmpty' a)

Each of these expressions is better at certain things and worse at certain things. Personally, I find the NonEmpty' expression more natural or fundamental, and therefore I will tend towards implementations that reflect the "natural" strictness we'd find there. However, I don't want to push for that where it feels unnatural.

`unfold`

This function was deprecated ages and ages ago, and the time has long since come to delete it. There's no point discussing its strictness.

`uncons`

Currently,

uncons :: NonEmpty a -> (a, Maybe (NonEmpty a))
uncons ~(a :| as) = (a, nonEmpty as)

I propose

uncons (a :| as) = (a, nonEmpty as)

What I actually want, based on my stated bias, is

uncons :: NonEmpty a -> Either a (a, NonEmpty a)
uncons (a :| []) = Left a
uncons (a :| b : bs) = Right (a, b :| bs)

Would that be a step too far? If so, would it be worth offering such a function by another name?

`init` and `last`

These have irrefutable patterns, but they're actually strict. Confusing, but just an implementation issue we don't need to discuss.

`<|` and `cons`

This is defined

a <| ~(b :| bs) = a :| b : bs

I believe this is the correct amount of laziness and we should leave it as is.

`toList`

Currently,

toList ~(a :| as) = a : as

I propose

toList (a :| as) = a : as

`lift`

Currently,

lift :: Foldable f => ([a] -> [b]) -> f a -> NonEmpty a
lift f = fromList . f . Foldable.toList

I don't have much intuition about how this function should behave. If we change toList, then it will automatically get stricter when applied to NonEmptys; is that okay?

`map`

Currently,

map f ~(a :| as) = f a :| fmap f as

I propose to remove the irrefutable pattern.

`inits`, `inits1`, `tails`, and `tails1`

I have yet to form any opinion on these. The change in toList behavior affects them too.

`insert`

Currently,

insert  :: (Foldable f, Ord a) => a -> f a -> NonEmpty a
insert a = fromList . List.insert a . Foldable.toList

I think the proposed change in toList behavior is fine for this. It might make a difference for some sort of degenerate Ord instance, but I don't imagine we'll get any complaints.

`scanl`, `scanr`, `scanl1`, `scanr1`

Currently, these are lazy; I propose to make them strict.

`intersperse`

Currently,

intersperse :: a -> NonEmpty a -> NonEmpty a
intersperse a ~(b :| bs) = b :| case bs of
    [] -> []
    _ -> a : List.intersperse a bs

I'd definitely remove the irrefutable pattern. My bias would suggest forcing bs as well, but I doubt that's what people will actually want.

`reverse`

Currently, reverse is actually strict, if I read it right. I'm fine with leaving it that way.

`take`

Currently,

take :: Int -> NonEmpty a -> [a]
take n = List.take n . toList

The proposed change to toList would make this strict, which I think would be better.

`drop`

Currently,

drop :: Int -> NonEmpty a -> [a]
drop n = List.drop n . toList

which is lazy when n <= 0 and strict otherwise. The proposed change to toList would make it unconditionally strict, which I think would be better.

`splitAt`

Currently,

splitAt :: Int -> NonEmpty a -> ([a],[a])
splitAt n = List.splitAt n . toList

This is odd the same way drop is. The proposed change to toList will make it strict.

`takeWhile`, `dropWhile`

There's a pattern here; I think these are better with stricter toList.

`zip`, `zipWith`

Currently,

zipWith :: (a -> b -> c) -> NonEmpty a -> NonEmpty b -> NonEmpty c
zipWith f ~(x :| xs) ~(y :| ys) = f x y :| List.zipWith f xs ys

I would remove the irrefutable patterns.

`unzip`

Currently,

unzip :: Functor f => f (a,b) -> (f a, f b)
unzip xs = (fst <$> xs, snd <$> xs)

I propose changing this to

unzip :: NonEmpty (a, b) -> (NonEmpty a, NonEmpty b)
unzip ((a, b) :| abs) = (a :| as, b :| bs)
  where
    (as, bs) = List.unzip abs

Again, my bias would suggest forcing abs, but I don't think that's what people will want.

`transpose`

No clear opinion.

`append`

See <> below.

`appendList`

This is strict, and I think should remain so.

`prependList`

This is strict, and I think should remain so.

`Foldable` instance

We have

  foldr f z ~(a :| as) = f a (List.foldr f z as)
  foldl f z (a :| as) = List.foldl f (f z a) as
  foldl1 f (a :| as) = List.foldl f a as
  foldr1 f (p :| ps) = foldr go id ps p
    where
      go x r prev = f prev (r x)
  foldMap f ~(a :| as) = f a `mappend` foldMap f as
  fold ~(m :| ms) = m `mappend` fold ms
  toList ~(a :| as) = a : as

I propose to remove all the irrefutable patterns.

`Functor` instance

Currently,

fmap f ~(a :| as) = f a :| fmap f as
b <$ ~(_ :| as)   = b   :| (b <$ as)

I propose to remove the irrefutable patterns.

`Traversable` instance

Currently,

  traverse f ~(a :| as) = liftA2 (:|) (f a) (traverse f as)

I propose to remove the irrefutable pattern.

`Semigroup` instance

We currently have

(a :| as) <> ~(b :| bs) = a :| (as ++ b : bs)

This looks correct to me.

clyring commented 1 year ago

This has been bothering me for a long time. Thanks for making this proposal, @treeowl. I would suggest the following principle:

For every NonEmpty function that is different from a corresponding List function only in the presence of NonEmpty in its type, both the List and NonEmpty functions should have the same strictness properties.

Mostly this is so I don't have to keep two different sets of subtly different strictness behaviors in my mind, but also it may eventually make sense to make the NonEmpty versions just casts-with-proof around the List versions.

In a quick scan, I disagree with your initial assessments of the following functions:

cons and (<|): Technically we don't have Data.List.cons, so my principle doesn't exactly apply. But x <| y = x :| toList y is the natural definition, and is (with the proposed change to toList) slightly stricter than its current definition, since it doesn't allocate a (:) cell until y is evaluated. The two selector thunks the current version tends to allocate when used can seriously hurt performance when recursively building a NonEmpty. (And thanks to worker/wrapper-for-CPR the toList is often free in this scenario.)
(<>): Similarly, I would expect (a :| as) <> bs = a :| (as ++ toList bs), with no immediate allocation of a (:) cell corresponding to the second argument.

nomeata commented 1 year ago

@treeowl, can you explain why NonEmpty' is more natural to you?

(It's not to me - I don't see why I should have to worry about whether there are more elements after the first just to get the first element. But then my personal natural model for a non-empty list type is a refinement type (or, in first approximation, an invariant-protecting newtype) of the list type.)

treeowl commented 1 year ago

@nomeata, I probably said that too strongly. One thing, for me, is that consing and unconsing seem like really basic operations for something I'm calling a list, and for NonEmpty those require shuffling elements through different positions, with both potential run-time costs and also the unpleasant question of how strict cons should be.

nomeata commented 1 year ago

Oh, absolutely! I share that sentiment, and was one reason I was briefly considering experimenting with a newtype based approach for NonEmpty (but abandoned it for deficienies of the module system of Haskell, which means I couldn't protect the invariant as much as I hoped for): https://gitlab.haskell.org/ghc/ghc/-/issues/22270

parsonsmatt commented 1 year ago

I think I second @nomeata 's request for clarification - there's a lot of "I want this" and "I propose this change" but not a lot of justification or explanation why.

Most of these changes seem pretty reasonable, but I'm also a bit hesitant to make breaking changes in the laziness/semantics of a datatype. Following in the pattern of containers and many other datatype-providing libraries, why not export these new definitions from a module Data.List.NonEmpty.Strict? The existing ones could be moved to Data.List.NonEmpty.Lazy, and the existing Data.List.NonEmpty would re-export the functions from Data.List.NonEmpty.Lazy.

This approach wouldn't break anyone's code, and would allow for folks to explicitly select lazy vs strict variants of functions. The two modules could have documentation suggesting when you would want one or the other.

In the event that the community decides that Data.List.NonEmpty.Strict is universally preferable, we could WARNING on Data.List.NonEmpty for a release, telling folks "This will change to exporting Data.List.NonEmpty.Strict instead, either switch to that now or switch to Data.List.NonEmpty.Lazy to preserve existing behavior." Then the next major release actually does the switch and removes the WARNING.

uncons

Can you explain why NonEmpty a -> Either a (a, NonEmpty a) is more in line with your stated bias?

That feels pretty unnatural to me. You have to duplicate the code for handling the "known" case (definitely have at least one a), and you can't use it easily in a pattern match without writing multiple cases.

case uncons xs of
    Left a -> foo a
    Right (a, as) -> foo a <> foos as

case uncons xs of
    (a, mas) -> foo a <> foldMap foos mas

let (a, mas) = uncons xs

(a, mas) <- do ...; pure (uncons xs)

toList

Verifying my own understanding here:

λ> import Data.List.NonEmpty
λ> toList undefined
[*** Exception: Prelude.undefined
CallStack (from HasCallStack):
  undefined, called at <interactive>:2:8 in interactive:Ghci2
λ> toList' (a :| as) = a : as
λ> toList' undefined
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
  undefined, called at <interactive>:4:9 in interactive:Ghci3

The new variant will undefined when the toList call is evaluated, whereas the current version will undefined when the result of that toList call is evaluated. This appears to be in line with Set.toList.

lift

Oof, this one is gnarly - fromList is partial, and would really prefer to see Maybe in that result type.

map

Removing the irrefutable pattern would mean that map f undefined would be undefined, while currently map f undefined = undefined :| undefined. This seems pretty reasonable.

hasufell commented 1 year ago

@parsonsmatt I like the idea, but do I understand correctly that in order to maintain both variants, we kind of have to fix the type signature of Data.List.NonEmpty.Lazy.unzip, because Data.List.NonEmpty.Strict.unzip will not use Functor. Otherwise we end up with type divergence, which seems like a problem (no drop-in replacement possible in the worst case).

TeofilC commented 1 year ago

I think we should be really careful about introducing .Lazy/.Strict variants. Most of the time there aren't two clear options and it can make things very confusing and/or code us into a dead end.

For containers it makes a lot of sense. .Lazy is spine-strict and .Strict is additionally value strict. It makes sense here because the design space has already been narrowed down quite a lot by making the .Lazy variant already quite strict (on the other hand it's a bit confusing that the lazy one is already quite strict).

But in other cases it makes a lot less sense. I think Control.Monad.Trans.State.Strict is an example of this. It's stricter than the .Lazy variant but then there are stricter possibilities as well and it gives users of the library unearned confidence that they are avoiding space leaks.

In the case of Data.NonEmpty I can think of two possible variants off the top of my head that would make sense for the Data.NonEmpty.Strict module to be.

data NonEmptySpineStrict = EndESpineStrict a | ConsESpineStrict a !NonEmptySpineStrict, which is spine strict
data NonEmptyValueStrict= EndEValueStrict !a | ConsEValueStrict !a !NonEmptyValueStrict, which is value strict.

I could imagine that at some point in the future we might want to add something like the value strict variant to base and it would be a shame if the Lazy/Strict distinction was already taken.

Bodigrim commented 1 year ago

The basic problem is that instance Functor NonEmpty is different (lazier) than the instance one can obtain via DeriveFunctor. This is outright a bug in my books, because it breaks expectations which every other instance Functor in base adheres to.

Bodigrim commented 1 year ago

I would suggest the following principle:

For every NonEmpty function that is differs from a corresponding List function only in the presence of NonEmpty in its type, both the List and NonEmpty functions should have the same strictness properties.

Yes, nicely said. I fully agree with this principle stated. I followed it when designing https://github.com/Bodigrim/infinite-list#laziness.

Just to reiterate the problem. A function returning a record type with a single constructor can always return this very constructor before even looking at its arguments, not even weak head normal form. Notice the irrefutable pattern matching with ~:

data Pair a = Pair a a 

myFmap :: (a -> b) -> Pair a -> Pair b
myFmap f ~(Pair x y) = Pair (f x) (f y)

Under the hood this definition translates to

myFmap f p = Pair (let Pair x _ = p in f x) (let Pair _ y = p in f y)

On the first glance it might look like a good idea: there is nothing else other than Pair we can return, so let's be lazy to the core. The problem with such definition arises later, when you try to fight space leaks. If you seq the result of myFmap f x virtually nothing happens: no way to trigger evaluation of x at all, you just hold Pair with two thunks in it. For example,

> myFmap undefined undefined `seq` ()
()

That's not what we usually want, and that's why {-# LANGUAGE DeriveFoldable #-} does not generate irrefutable patterns in such cases. A normal derived instance would be

instance Functor Pair where
  fmap f (Pair x y) = Pair (f x) (f y)

and

> fmap undefined (undefined :: Pair ()) `seq` ()
*** Exception: Prelude.undefined

This principle holds for every data type in base, including normal lists, except... NonEmpty, which defines fmap manually with an irrefutable pattern:

instance Functor NonEmpty where
  fmap f ~(a :| as) = f a :| fmap f as

or equivalently

instance Functor NonEmpty where
  fmap f aas = (let a :| _ = aas in f a) :| (let _ :| as = aas in fmap f as)

There are multiple issues:

The behaviour of instance Functor NonEmpty semantically differs from the one which would be derived automatically.
It also behaves differently from instance Functor [].
It also does not follow intuition about Functor instances for other record types in base.
It is bad for space leaks, usual seq and combinators built atop it do nothing, one has to deepseq to force arguments to WHNF.
Lastly, it is bad for performance, because you pattern-match on everything twice.

Other functions in Data.List.NonEmpty are equally misbehaving, but we could have defined Data.List.NonEmpty.Strict with stricter versions. We cannot do this for instance Functor and I'd say that in practice it is the crux: if we do not want to fix it, it's better not to touch this at all and keep NonEmpty at least internally consistent.

parsonsmatt commented 1 year ago

That's convincing to me. +1

mixphix commented 1 year ago

This issue spurred me into making llun, which uses pattern synonyms to address @treeowl's point that Either a (a, NonEmpty a) is another useful representation. I haven't had a chance to figure out benchmarks, or document it, but most functions are implemented locally so it should be easy to compare with the current implementations.

Bodigrim commented 1 year ago

Dear CLC members. Could you please provide (non-binding) opinions on the proposal? Do you agree with the principle suggested in https://github.com/haskell/core-libraries-committee/issues/107#issuecomment-1324520956? Would you like to apply it to Data.List.NonEmpty?

@tomjaguarpaw @chshersh @angerman @hasufell @mixphix @parsonsmatt

tomjaguarpaw commented 1 year ago

The correspondence principle between [] and NonEmpty seems desirable to me.

hasufell commented 1 year ago

How do we assess impact? I also like the idea, but I'm not sure I see enough motivation if this can break code.

chshersh commented 1 year ago

I agree with the proposal and with further suggestions in https://github.com/haskell/core-libraries-committee/issues/107#issuecomment-1324520956.

My view is simple: if you label arguments with ~ explicitly, there should be a good reason to do so. Ideally, it should be documented in each case why arguments use irrefutable patterns. I don't see a good reason for NonEmpty, so let's remove it.

Side comment: In addition to cons x xs = x :| toList xs I also want cons' x (x :| xs) = x :| (x : xs) so cons' x1 $ cons' x2 $ cons' x3 $ singleton x4 would be equivalent to x1 :| [x2, x3, x4] but this could be done via a separate proposal.

How do we assess impact? I also like the idea, but I'm not sure I see enough motivation if this can break code.

I think, clc-stackage could also run tests in addition to the compilation. And I'm pretty sure that Stackage already runs non-disabled tests for the entire snapshot. It would be nice to use these capabilities for the impact assessment procedure but I'm not sure CLC has either the budget or capacity to implement this, so this should be taken to Haskell Foundation.

Bodigrim commented 1 year ago

I think, clc-stackage could also run tests in addition to the compilation.

Not in its current structure, unfortunately. Besides, running all tests of all packages will likely fail too often. Stackage curators maintain a list of test suites which should be excluded. Maybe one can run stackage-curator with all Stackage metadata but an updated GHC?.. I don't know about their infrastructure.

Bodigrim commented 1 year ago

@treeowl how would you like to proceed with this? There seems to be enough support of the idea, but a convincing impact assessment is likely to be very hard.

CC @juhp @DanBurton @cdornan @alexeyzab @mihaimaruseac as Stackage Curators (and sorry if I missed anyone else). Is there an easy way to build a Stackage snapshot and run all enabled test suites against a custom GHC? We would like to take the latest GHC 9.4 release, modify laziness of routines in Data.List.NonEmpty as described above and run tests to ensure that there is no breakage.

juhp commented 1 year ago

@Bodigrim it should be possible - we use a dedicated buildserver (thanks to @fpco) to run curator with quite a bit of diskspace - but I am not sure how reproducible our build environment is, though most of it is in a container. Also cc @bergmark.

Also curator just uses stack underneath, so for the custom ghc, it should be possible to setup, but I am not sure if curator makes it harder.

I don't think we can use the Stackage server to run your tests (we had a similar request a while back, which we couldn't fulfil), but we can try to help with questions/issues that arise.

(This is my personal take, other curators can also chime in if they have something to add.)

cdornan commented 1 year ago

As Jens says, it will require resources -- a server that can build all the packages together.

Interestingly, David has a proposal to run the stackage setup with GHC-nightly -- if you can afford to take some time then that initiative could generate just what you need.

Chris

On 14 Apr 2023, at 02:47, Jens Petersen @.***> wrote:

@Bodigrim https://github.com/Bodigrim it should be possible - we use a dedicated buildserver (thanks to @fpco https://github.com/fpco) to run curator with quite a bit of diskspace - but I am not sure how reproducible our build environment is, though most of it is in a container. Also cc @bergmark https://github.com/bergmark.

I don't think we can use the Stackage server to run your tests (we had a similar request a while back, which we couldn't fulfil), but we can try to help with questions/issues that arise.

(This is my personal take, other curators can also chime in if they have something to add.)

— Reply to this email directly, view it on GitHub https://github.com/haskell/core-libraries-committee/issues/107#issuecomment-1507813721, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG7BSU7ULUMZORRFRSA5I3XBCUCFANCNFSM6AAAAAASGJ4UJI. You are receiving this because you were mentioned.

Bodigrim commented 1 year ago

Thanks @juhp and @cdornan. In such case I imagine this proposal awaits an enthusiatic volunteer to setup a clone of Stackage build server and run Stackage tests with a patched GHC. On constrast to our usual practices, just building clc-stackage with Cabal is not enough: there is no change in type signatures, the only change is runtime behaviour, so one has to run actual tests to provide a meaningful impact assessment.

Bodigrim commented 1 year ago

I'm afraid we are stuck here, unless there is an enthusiast to run tests for all Stackage packages. We might have a better luck with finding such individual, if there was an MR at hand. @treeowl could you possibly prepare one?

I strongly believe that this is an important issue, it would be a shame to drop it.

Bodigrim commented 1 year ago

@treeowl is there is no progress within two weeks, I'll close this as abandoned. We can return back anytime there are resources to execute.

Bodigrim commented 1 year ago

Closing as abandoned, feel free to reopen when there are resources to make some progress.

sergv commented 1 week ago

Can the proposal please be reopened?

I want to take over this proposal since the original author apparently let it go. I've prepared MR with the changes in base (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12824) and an impact assessment of building Stackage snapshot and running its tests.

haskell / core-libraries-committee