Open treeowl opened 1 year ago
This has been bothering me for a long time. Thanks for making this proposal, @treeowl. I would suggest the following principle:
- For every `NonEmpty` function that differs from a corresponding `List` function only in the presence of `NonEmpty` in its type, both the `List` and `NonEmpty` functions should have the same strictness properties.

Mostly this is so I don't have to keep two different sets of subtly different strictness behaviors in my mind, but also it may eventually make sense to make the `NonEmpty` versions just casts-with-proof around the `List` versions.
In a quick scan, I disagree with your initial assessments of the following functions:
- `cons` and `(<|)`: Technically we don't have `Data.List.cons`, so my principle doesn't exactly apply. But `x <| y = x :| toList y` is the natural definition, and is (with the proposed change to `toList`) slightly stricter than its current definition, since it doesn't allocate a `(:)` cell until `y` is evaluated. The two selector thunks the current version tends to allocate when used can seriously hurt performance when recursively building a `NonEmpty`. (And thanks to worker/wrapper-for-CPR the `toList` is often free in this scenario.)
- `(<>)`: Similarly, I would expect `(a :| as) <> bs = a :| (as ++ toList bs)`, with no immediate allocation of a `(:)` cell corresponding to the second argument.

@treeowl, can you explain why `NonEmpty'` is more natural to you?
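The definitions suggested in this comment for `cons` and `(<>)` can be written out as a small module. This is only a sketch: `toListStrict`, `consStrict` and `appendStrict` are hypothetical names chosen for illustration (the proposal would change `toList`, `cons`/`(<|)` and `(<>)` themselves), and `toListStrict` assumes the proposed strict pattern match.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Assumed shape of the proposed toList: a strict match on (:|),
-- with no irrefutable pattern.
toListStrict :: NonEmpty a -> [a]
toListStrict (a :| as) = a : as

-- The "natural" cons: the (:) cell for the old head is only built
-- once the second argument is evaluated.
consStrict :: a -> NonEmpty a -> NonEmpty a
consStrict x y = x :| toListStrict y

-- Append in the same style: matches the first argument's constructor,
-- and does not touch the second argument until the tail is demanded.
appendStrict :: NonEmpty a -> NonEmpty a -> NonEmpty a
appendStrict (a :| as) bs = a :| (as ++ toListStrict bs)
```

With these definitions, `consStrict 1 (2 :| [3])` evaluates to `1 :| [2, 3]`, and `appendStrict (1 :| [2]) (3 :| [4])` to `1 :| [2, 3, 4]`.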
(It's not to me - I don't see why I should have to worry about whether there are more elements after the first just to get the first element. But then my personal natural model for a non-empty list type is a refinement type (or, in first approximation, an invariant-protecting newtype) of the list type.)
@nomeata, I probably said that too strongly. One thing, for me, is that consing and unconsing seem like really basic operations for something I'm calling a list, and for `NonEmpty` those require shuffling elements through different positions, with both potential run-time costs and also the unpleasant question of how strict `cons` should be.
Oh, absolutely! I share that sentiment, and it was one reason I was briefly considering experimenting with a `newtype`-based approach for `NonEmpty` (but abandoned it because of deficiencies of Haskell's module system, which meant I couldn't protect the invariant as much as I had hoped): https://gitlab.haskell.org/ghc/ghc/-/issues/22270
I think I second @nomeata's request for clarification - there's a lot of "I want this" and "I propose this change" but not a lot of justification or explanation of why.
Most of these changes seem pretty reasonable, but I'm also a bit hesitant to make breaking changes in the laziness/semantics of a datatype. Following in the pattern of `containers` and many other datatype-providing libraries, why not export these new definitions from a module `Data.List.NonEmpty.Strict`? The existing ones could be moved to `Data.List.NonEmpty.Lazy`, and the existing `Data.List.NonEmpty` would re-export the functions from `Data.List.NonEmpty.Lazy`.
This approach wouldn't break anyone's code, and would allow for folks to explicitly select lazy vs strict variants of functions. The two modules could have documentation suggesting when you would want one or the other.
In the event that the community decides that `Data.List.NonEmpty.Strict` is universally preferable, we could `WARNING` on `Data.List.NonEmpty` for a release, telling folks "This will change to exporting `Data.List.NonEmpty.Strict` instead, either switch to that now or switch to `Data.List.NonEmpty.Lazy` to preserve existing behavior." Then the next major release actually does the switch and removes the `WARNING`.
uncons
Can you explain why `NonEmpty a -> Either a (a, NonEmpty a)` is more in line with your stated bias?
That feels pretty unnatural to me. You have to duplicate the code for handling the "known" case (definitely have at least one `a`), and you can't use it easily in a pattern match without writing multiple cases.
    case uncons xs of
      Left a -> foo a
      Right (a, as) -> foo a <> foos as

    case uncons xs of
      (a, mas) -> foo a <> foldMap foos mas

    let (a, mas) = uncons xs

    (a, mas) <- do ...; pure (uncons xs)
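To make the comparison concrete, here is a hedged sketch that puts the two shapes side by side. `Data.List.NonEmpty.uncons` is the existing function from `base`; `unconsE` is a hypothetical name for the `Either`-returning variant, and `sumHeadFirst`/`sumHeadFirstE` are illustrative consumers invented for this example.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Hypothetical Either-returning variant (not part of base).
unconsE :: NonEmpty a -> Either a (a, NonEmpty a)
unconsE (a :| [])       = Left a
unconsE (a :| (b : bs)) = Right (a, b :| bs)

-- Consumer of the existing API: the head is handled exactly once,
-- and the optional rest is folded separately.
sumHeadFirst :: NonEmpty Int -> Int
sumHeadFirst xs =
  let (a, rest) = NE.uncons xs
  in a + maybe 0 sum rest

-- Consumer of the Either variant: the head has to be handled
-- in both branches.
sumHeadFirstE :: NonEmpty Int -> Int
sumHeadFirstE xs = case unconsE xs of
  Left a          -> a
  Right (a, rest) -> a + sum rest
```

Both consumers compute the same result; the difference is only in how often the "known" head case has to be written out.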
toList
Verifying my own understanding here:
    λ> import Data.List.NonEmpty
    λ> toList undefined
    [*** Exception: Prelude.undefined
    CallStack (from HasCallStack):
      undefined, called at <interactive>:2:8 in interactive:Ghci2
    λ> toList' (a :| as) = a : as
    λ> toList' undefined
    *** Exception: Prelude.undefined
    CallStack (from HasCallStack):
      undefined, called at <interactive>:4:9 in interactive:Ghci3
The new variant will undefined when the `toList` call is evaluated, whereas the current version will undefined when the result of that `toList` call is evaluated. This appears to be in line with `Set.toList`.
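The same difference can be checked in a compiled program rather than at the GHCi prompt. The sketch below assumes the current definition uses an irrefutable pattern, as the session above suggests; `toListLazy`, `toListStrict` and `diverges` are names made up for this example.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.List.NonEmpty (NonEmpty (..))

-- Mirrors the current behaviour: the (:) cell is returned
-- without inspecting the argument.
toListLazy :: NonEmpty a -> [a]
toListLazy ~(a :| as) = a : as

-- Mirrors the proposed behaviour: the argument is matched first.
toListStrict :: NonEmpty a -> [a]
toListStrict (a :| as) = a : as

-- True when forcing the value to weak head normal form throws.
diverges :: a -> IO Bool
diverges x = either onErr (const False) <$> try (evaluate x)
  where
    onErr :: SomeException -> Bool
    onErr _ = True
```

Here `diverges (toListLazy (undefined :: NonEmpty Int))` yields `False`, because the result is already a `(:)` cell, while `diverges (toListStrict (undefined :: NonEmpty Int))` yields `True`.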
lift
Oof, this one is gnarly - `fromList` is partial, and I would really prefer to see `Maybe` in that result type.
map
Removing the irrefutable pattern would mean that `map f undefined` would be `undefined`, while currently `map f undefined = undefined :| undefined`. This seems pretty reasonable.
@parsonsmatt I like the idea, but do I understand correctly that in order to maintain both variants, we kind of have to fix the type signature of `Data.List.NonEmpty.Lazy.unzip`, because `Data.List.NonEmpty.Strict.unzip` will not use `Functor`? Otherwise we end up with type divergence, which seems like a problem (no drop-in replacement possible in the worst case).
I think we should be really careful about introducing `.Lazy`/`.Strict` variants.
Most of the time there aren't two clear options and it can make things very confusing and/or code us into a dead end.
For `containers` it makes a lot of sense. `.Lazy` is spine-strict and `.Strict` is additionally value strict. It makes sense here because the design space has already been narrowed down quite a lot by making the `.Lazy` variant already quite strict (on the other hand it's a bit confusing that the lazy one is already quite strict).
But in other cases it makes a lot less sense. I think `Control.Monad.Trans.State.Strict` is an example of this. It's stricter than the `.Lazy` variant but then there are stricter possibilities as well and it gives users of the library unearned confidence that they are avoiding space leaks.
In the case of `Data.List.NonEmpty` I can think of two possible variants off the top of my head that would make sense for the `Data.List.NonEmpty.Strict` module to be.
- `data NonEmptySpineStrict a = EndESpineStrict a | ConsESpineStrict a !(NonEmptySpineStrict a)`, which is spine strict
- `data NonEmptyValueStrict a = EndEValueStrict !a | ConsEValueStrict !a !(NonEmptyValueStrict a)`, which is value strict.

I could imagine that at some point in the future we might want to add something like the value strict variant to `base` and it would be a shame if the Lazy/Strict distinction was already taken.
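The difference between the two candidate meanings of "strict" can be observed directly. The following is a sketch only: the shortened constructor names (`EndS`, `ConsS`, `EndV`, `ConsV`) and the `diverges` helper are assumptions introduced here, not part of any library.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Spine strict: the tail is forced when a cell is built, elements are not.
data NonEmptySpineStrict a
  = EndS a
  | ConsS a !(NonEmptySpineStrict a)

-- Value strict: both the element and the tail are forced.
data NonEmptyValueStrict a
  = EndV !a
  | ConsV !a !(NonEmptyValueStrict a)

-- True when forcing the value to weak head normal form throws.
diverges :: a -> IO Bool
diverges x = either onErr (const False) <$> try (evaluate x)
  where
    onErr :: SomeException -> Bool
    onErr _ = True
```

Under these definitions, `diverges (ConsS (undefined :: Int) (EndS 1))` is `False` (elements stay lazy), whereas `diverges (ConsV (undefined :: Int) (EndV 1))` and `diverges (ConsS (1 :: Int) undefined)` are both `True`. A module simply called `.Strict` would not tell the reader which of the two behaviours to expect.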
The basic problem is that `instance Functor NonEmpty` is different (lazier) than the instance one can obtain via `DeriveFunctor`. This is outright a bug in my books, because it breaks expectations which every other `instance Functor` in `base` adheres to.
I would suggest the following principle:
- For every `NonEmpty` function that differs from a corresponding `List` function only in the presence of `NonEmpty` in its type, both the `List` and `NonEmpty` functions should have the same strictness properties.
Yes, nicely said. I fully agree with this principle stated. I followed it when designing https://github.com/Bodigrim/infinite-list#laziness.
Just to reiterate the problem. A function returning a record type with a single constructor can always return this very constructor before even looking at its arguments, without forcing them even to weak head normal form. Notice the irrefutable pattern matching with `~`:
    data Pair a = Pair a a

    myFmap :: (a -> b) -> Pair a -> Pair b
    myFmap f ~(Pair x y) = Pair (f x) (f y)

Under the hood this definition translates to

    myFmap f p = Pair (let Pair x _ = p in f x) (let Pair _ y = p in f y)
At first glance it might look like a good idea: there is nothing other than `Pair` we can return, so let's be lazy to the core. The problem with such a definition arises later, when you try to fight space leaks. If you `seq` the result of `myFmap f x`, virtually nothing happens: there is no way to trigger evaluation of `x` at all, you just hold a `Pair` with two thunks in it. For example,
    > myFmap undefined undefined `seq` ()
    ()
That's not what we usually want, and that's why `{-# LANGUAGE DeriveFunctor #-}` does not generate irrefutable patterns in such cases. A normal derived instance would be
    instance Functor Pair where
      fmap f (Pair x y) = Pair (f x) (f y)

and

    > fmap undefined (undefined :: Pair ()) `seq` ()
    *** Exception: Prelude.undefined
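For readers who want to reproduce the comparison outside GHCi, here is a minimal sketch that derives the instance with `DeriveFunctor` and compares it with the hand-written lazy map. `lazyFmap` (a renamed copy of `myFmap`) and `diverges` are names introduced for this example only.

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Control.Exception (SomeException, evaluate, try)

data Pair a = Pair a a
  deriving (Functor)

-- Hand-written map with an irrefutable pattern, as in myFmap.
lazyFmap :: (a -> b) -> Pair a -> Pair b
lazyFmap f ~(Pair x y) = Pair (f x) (f y)

-- True when forcing the value to weak head normal form throws.
diverges :: a -> IO Bool
diverges x = either onErr (const False) <$> try (evaluate x)
  where
    onErr :: SomeException -> Bool
    onErr _ = True
```

With this module, `diverges (lazyFmap id (undefined :: Pair Int))` is `False`, while `diverges (fmap id (undefined :: Pair Int))` is `True`: only the derived instance lets `seq` reach the argument.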
This principle holds for every data type in `base`, including normal lists, except... `NonEmpty`, which defines `fmap` manually with an irrefutable pattern:
    instance Functor NonEmpty where
      fmap f ~(a :| as) = f a :| fmap f as

or equivalently

    instance Functor NonEmpty where
      fmap f aas = (let a :| _ = aas in f a) :| (let _ :| as = aas in fmap f as)
There are multiple issues:
- `instance Functor NonEmpty` semantically differs from the one which would be derived automatically.
- It is inconsistent with `instance Functor []`.
- It is inconsistent with `Functor` instances for other record types in `base`.
- `seq` and combinators built atop it do nothing; one has to `deepseq` to force arguments to WHNF.

Other functions in `Data.List.NonEmpty` are equally misbehaving, but we could have defined `Data.List.NonEmpty.Strict` with stricter versions. We cannot do this for `instance Functor`, and I'd say that in practice it is the crux: if we do not want to fix it, it's better not to touch this at all and keep `NonEmpty` at least internally consistent.
That's convincing to me. +1
This issue spurred me into making llun, which uses pattern synonyms to address @treeowl's point that `Either a (a, NonEmpty a)` is another useful representation. I haven't had a chance to figure out benchmarks, or document it, but most functions are implemented locally so it should be easy to compare with the current implementations.
Dear CLC members. Could you please provide (non-binding) opinions on the proposal? Do you agree with the principle suggested in https://github.com/haskell/core-libraries-committee/issues/107#issuecomment-1324520956? Would you like to apply it to `Data.List.NonEmpty`?
@tomjaguarpaw @chshersh @angerman @hasufell @mixphix @parsonsmatt
The correspondence principle between `[]` and `NonEmpty` seems desirable to me.
How do we assess impact? I also like the idea, but I'm not sure I see enough motivation if this can break code.
I agree with the proposal and with further suggestions in https://github.com/haskell/core-libraries-committee/issues/107#issuecomment-1324520956.
My view is simple: if you label arguments with `~` explicitly, there should be a good reason to do so. Ideally, it should be documented in each case why arguments use irrefutable patterns. I don't see a good reason for `NonEmpty`, so let's remove it.
Side comment: In addition to `cons x xs = x :| toList xs` I also want `cons' x (y :| ys) = x :| (y : ys)` so `cons' x1 $ cons' x2 $ cons' x3 $ singleton x4` would be equivalent to `x1 :| [x2, x3, x4]`, but this could be done via a separate proposal.
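A minimal sketch of the `cons'` idea from this comment, assuming a `base` recent enough to export `singleton` from `Data.List.NonEmpty` (4.15 or later). The name `example` is illustrative.

```haskell
import Data.List.NonEmpty (NonEmpty (..), singleton)

-- Strict in the constructor of the second argument: the old head
-- is moved into the tail list immediately.
cons' :: a -> NonEmpty a -> NonEmpty a
cons' x (y :| ys) = x :| (y : ys)

-- Building a list by repeated consing.
example :: NonEmpty Int
example = cons' 1 $ cons' 2 $ cons' 3 $ singleton 4
```

Here `example` is equal to `1 :| [2, 3, 4]`.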
How do we assess impact? I also like the idea, but I'm not sure I see enough motivation if this can break code.
I think, `clc-stackage` could also run tests in addition to the compilation. And I'm pretty sure that Stackage already runs non-disabled tests for the entire snapshot. It would be nice to use these capabilities for the impact assessment procedure but I'm not sure CLC has either the budget or capacity to implement this, so this should be taken to Haskell Foundation.
I think, `clc-stackage` could also run tests in addition to the compilation.
Not in its current structure, unfortunately. Besides, running all tests of all packages will likely fail too often. Stackage curators maintain a list of test suites which should be excluded. Maybe one can run `stackage-curator` with all Stackage metadata but an updated GHC?.. I don't know about their infrastructure.
@treeowl how would you like to proceed with this? There seems to be enough support of the idea, but a convincing impact assessment is likely to be very hard.
CC @juhp @DanBurton @cdornan @alexeyzab @mihaimaruseac as Stackage Curators (and sorry if I missed anyone else). Is there an easy way to build a Stackage snapshot and run all enabled test suites against a custom GHC? We would like to take the latest GHC 9.4 release, modify laziness of routines in `Data.List.NonEmpty` as described above and run tests to ensure that there is no breakage.
@Bodigrim it should be possible - we use a dedicated buildserver (thanks to @fpco) to run curator with quite a bit of diskspace - but I am not sure how reproducible our build environment is, though most of it is in a container. Also cc @bergmark.
Also curator just uses stack underneath, so for the custom ghc, it should be possible to setup, but I am not sure if curator makes it harder.
I don't think we can use the Stackage server to run your tests (we had a similar request a while back, which we couldn't fulfil), but we can try to help with questions/issues that arise.
(This is my personal take, other curators can also chime in if they have something to add.)
As Jens says, it will require resources -- a server that can build all the packages together.
Interestingly, David has a proposal to run the stackage setup with GHC-nightly -- if you can afford to take some time then that initiative could generate just what you need.
Chris
Thanks @juhp and @cdornan. In that case I imagine this proposal awaits an enthusiastic volunteer to set up a clone of the Stackage build server and run Stackage tests with a patched GHC. In contrast to our usual practices, just building `clc-stackage` with Cabal is not enough: there is no change in type signatures, the only change is runtime behaviour, so one has to run actual tests to provide a meaningful impact assessment.
I'm afraid we are stuck here, unless there is an enthusiast to run tests for all Stackage packages. We might have better luck finding such an individual if there was an MR at hand. @treeowl could you possibly prepare one?
I strongly believe that this is an important issue, it would be a shame to drop it.
@treeowl if there is no progress within two weeks, I'll close this as abandoned. We can return to it anytime there are resources to execute.
Closing as abandoned, feel free to reopen when there are resources to make some progress.
Can the proposal please be reopened?
I want to take over this proposal since the original author apparently let it go. I've prepared an MR with the changes in `base` (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12824) and an impact assessment from building a Stackage snapshot and running its tests.
This is going to be ... hard. Some decisions, I think, will not be very controversial. Others will likely be quite controversial. Let's go through them one by one and try to figure things out. But first, I'd like to mention that along with the definition we have,
there's another, equally valid, expression of the concept of a nonempty list:
Each of these expressions is better at certain things and worse at certain things. Personally, I find the `NonEmpty'` expression more natural or fundamental, and therefore I will tend towards implementations that reflect the "natural" strictness we'd find there. However, I don't want to push for that where it feels unnatural.

unfold
This function was deprecated ages and ages ago, and the time has long since come to delete it. There's no point discussing its strictness.
uncons
Currently,
I propose
What I actually want, based on my stated bias, is
Would that be a step too far? If so, would it be worth offering such a function by another name?
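The definition of `NonEmpty'` is not shown in this text. One plausible reading, given the `NonEmpty a -> Either a (a, NonEmpty a)` type discussed for `uncons` elsewhere in this thread, is a recursive type with a terminal constructor. The sketch below rests entirely on that assumption; the constructor names `End` and `Cons` and the helpers `toPrime`, `fromPrime` and `uncons'` are hypothetical.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Assumed shape of the alternative expression; the original
-- definition is not reproduced in this text.
data NonEmpty' a = End a | Cons a (NonEmpty' a)

-- Converting from the base representation.
toPrime :: NonEmpty a -> NonEmpty' a
toPrime (a :| [])       = End a
toPrime (a :| (b : bs)) = Cons a (toPrime (b :| bs))

-- Converting back: each step moves the next head into the tail list.
fromPrime :: NonEmpty' a -> NonEmpty a
fromPrime (End a)       = a :| []
fromPrime (Cons a rest) = case fromPrime rest of
  b :| bs -> a :| (b : bs)

-- On this representation, the Either-shaped uncons is a plain
-- pattern match with no shuffling of elements.
uncons' :: NonEmpty' a -> Either a (a, NonEmpty' a)
uncons' (End a)       = Left a
uncons' (Cons a rest) = Right (a, rest)
```

Under this reading, consing and unconsing on `NonEmpty'` are single constructor operations, which may be what makes it feel more "natural"; the cost is that converting to an ordinary list is no longer free.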
init and last
These have irrefutable patterns, but they're actually strict. Confusing, but just an implementation issue we don't need to discuss.

<| and cons
This is defined
I believe this is the correct amount of laziness and we should leave it as is.
toList
Currently,
I propose
lift
Currently,
I don't have much intuition about how this function should behave. If we change `toList`, then it will automatically get stricter when applied to `NonEmpty`s; is that okay?

map
Currently,
I propose to remove the irrefutable pattern.
inits, inits1, tails, and tails1
I have yet to form any opinion on these. The change in `toList` behavior affects them too.

insert
Currently,
I think the proposed change in `toList` behavior is fine for this. It might make a difference for some sort of degenerate `Ord` instance, but I don't imagine we'll get any complaints.

scanl, scanr, scanl1, scanr1
Currently, these are lazy; I propose to make them strict.
intersperse
Currently,
I'd definitely remove the irrefutable pattern. My bias would suggest forcing `bs` as well, but I doubt that's what people will actually want.

reverse
Currently, `reverse` is actually strict, if I read it right. I'm fine with leaving it that way.

take
Currently,
The proposed change to `toList` would make this strict, which I think would be better.

drop
Currently,
which is lazy when `n <= 0` and strict otherwise. The proposed change to `toList` would make it unconditionally strict, which I think would be better.

splitAt
Currently,
This is odd the same way `drop` is. The proposed change to `toList` will make it strict.

takeWhile, dropWhile
There's a pattern here; I think these are better with stricter `toList`.

zip, zipWith
Currently,
I would remove the irrefutable patterns.
unzip
Currently,
I propose changing this to
Again, my bias would suggest forcing `abs`, but I don't think that's what people will want.

transpose
No clear opinion.
append
See `<>` below.

appendList
This is strict, and I think should remain so.
prependList
This is strict, and I think should remain so.
Foldable instance
We have
I propose to remove all the irrefutable patterns.
Functor instance
Currently,
I propose to remove the irrefutable patterns.
Traversable instance
Currently,
I propose to remove the irrefutable pattern.
Semigroup instance
We currently have
This looks correct to me.