Closed JoshMeredith closed 1 year ago
What do your microbenchmarks do? Do you have timing data for the GHC application, or just allocation numbers for a profiled build?
I would intuitively expect a strict `break` or `span` to be better than a lazy one under two circumstances:

It would be good to find examples of both of the above in the wild, and to provide evidence that a stricter `break` or `span` actually realizes performance gains.
> What do your microbenchmarks do?

I'm just using a simple list of `Int`s (`[0..size]`) in these tests, with the following predicates:

```haskell
predicateHalf :: Int -> Int -> Bool
predicateHalf n x = x == (n `div` 2)

predicateZero :: Int -> Int -> Bool
predicateZero n x = x == 0

predicateFull :: Int -> Int -> Bool
predicateFull n x = x == n
```
> Do you have timing data for the GHC application, or just allocation numbers for a profiled build?
Unfortunately I've had trouble isolating useful numbers from a proper run of GHC with/without the changes, so the intention here is to use allocations as a proxy.
> The predicate is cheap
Interestingly, the time improvements seem to hold if I slow down the predicate:

```haskell
predicateSlow :: Int -> Int -> Bool
predicateSlow n x = x `elem` [0..n] && x == (n `div` 2)
```
```
benchmarking 100000/slow/break
time                 457.2 ms   (NaN s .. 497.9 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 440.0 ms   (434.1 ms .. 449.1 ms)
std dev              8.797 ms   (1.718 ms .. 11.48 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking 100000/slow/break'
time                 444.9 ms   (422.1 ms .. 457.7 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 431.0 ms   (418.8 ms .. 437.6 ms)
std dev              11.64 ms   (2.141 ms .. 15.36 ms)
variance introduced by outliers: 19% (moderately inflated)
```
Of course here it looks like the difference could be noise, but smaller lists reveal that it isn't:
```
benchmarking 10000/slow/break
time                 5.438 ms   (5.238 ms .. 5.738 ms)
                     0.983 R²   (0.963 R² .. 0.998 R²)
mean                 5.424 ms   (5.312 ms .. 5.695 ms)
std dev              520.1 μs   (177.9 μs .. 969.6 μs)
variance introduced by outliers: 58% (severely inflated)

benchmarking 10000/slow/break'
time                 4.626 ms   (4.539 ms .. 4.709 ms)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 4.698 ms   (4.644 ms .. 4.874 ms)
std dev              261.5 μs   (107.4 μs .. 549.8 μs)
variance introduced by outliers: 32% (moderately inflated)

benchmarking 1000/slow/break
time                 133.2 μs   (132.3 μs .. 134.4 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 133.6 μs   (132.8 μs .. 134.3 μs)
std dev              2.487 μs   (1.959 μs .. 3.534 μs)
variance introduced by outliers: 13% (moderately inflated)

benchmarking 1000/slow/break'
time                 87.01 μs   (85.76 μs .. 88.41 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 86.12 μs   (85.36 μs .. 87.38 μs)
std dev              3.319 μs   (2.349 μs .. 5.725 μs)
variance introduced by outliers: 39% (moderately inflated)
```
Please give the code for the actual benchmarks. You're not giving enough info about what you're actually measuring for other people to be able to evaluate its significance.
Here's the Criterion code:
```haskell
main :: IO ()
main = defaultMain
  [ bgroup (show sz)
    [ bgroup pr
      [ bench fn $ nf (f $ p sz) [0..sz]
      | (fn, f) <- fns
      ]
    | (pr, p) <- predicates
    ]
  | sz <- listSizes
  ]
```
```haskell
predicateHalf :: Int -> Int -> Bool
predicateHalf n x = x == (n `div` 2)

predicateZero :: Int -> Int -> Bool
predicateZero n x = x == 0

predicateFull :: Int -> Int -> Bool
predicateFull n x = x == n

predicateSlow :: Int -> Int -> Bool
predicateSlow n x = x `elem` [0..n] && x == (n `div` 2)

listSizes :: [Int]
listSizes = [100, 1000, 10000, 100000, 1000000]

fns :: [(String, (Int -> Bool) -> [Int] -> ([Int], [Int]))]
fns =
  [ ("break",  break_)
  , ("break'", break')
  ]

predicates :: [(String, Int -> Int -> Bool)]
predicates =
  [ ("half", predicateHalf)
  , ("zero", predicateZero)
  , ("full", predicateFull)
  , ("slow", predicateSlow)
  ]

break_ _ xs@[] = (xs, xs)
break_ p xs@(x:xs')
  | p x       = ([], xs)
  | otherwise = let (ys, zs) = break_ p xs' in (x:ys, zs)

break' _ xs@[] = (xs, xs)
break' p xxs@(x:xs)
  | p x       = ([], xxs)
  | otherwise = let !(ys, zs) = break' p xs in (x:ys, zs)
```
@JoshMeredith thanks for the benchmark.
My hunch is that lazy span/break are "faster" when you then lazily process a prefix (potentially only a part of it) and then maybe don't look at the suffix at all. Otherwise it feels like the strict version is never worse.
E.g. if I change the benchmark above to

```haskell
main :: IO ()
main = defaultMain
  [ bgroup (show sz)
    [ bgroup pr
      [ env (return [0..sz]) $ \input ->
          bench fn $ nf (post . f (p sz)) input
      | (fn, f) <- fns
      ]
    | (pr, p) <- predicates
    ]
  | sz <- listSizes
  ]
  where
    post (xs, ys) = listToMaybe ys
```
then `break'` is never slower.

If I just look at a small prefix of `xs`, like

```haskell
post (xs, ys) = take 20 xs
```

then `break` is faster. (I should use `takeWhile` to begin with in this case.)

And if I look at both heads:

```haskell
post (xs, ys) = (listToMaybe xs, listToMaybe ys)
```

then `break'` is again never slower.
So it seems that lazy `span` / `break` are only useful when you look partially at the prefix and may not look at the suffix (this case includes infinite-length input). But if you know that you will always look at the suffix, then strict versions are a win.
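To make the trade-off concrete, here is a minimal sketch (an assumption: `break'` is defined locally with a bang pattern, mirroring the strict variant benchmarked earlier in this thread, not an existing `base` export):

```haskell
{-# LANGUAGE BangPatterns #-}

-- Strict variant as benchmarked above: the recursive result is
-- forced before the cons cell is returned.
break' :: (a -> Bool) -> [a] -> ([a], [a])
break' _ []  = ([], [])
break' p xxs@(x:xs)
  | p x       = ([], xxs)
  | otherwise = let !(ys, zs) = break' p xs in (x:ys, zs)

-- Lazy break streams the prefix, so we can take a few elements
-- even though the predicate never holds on an infinite list:
demoLazy :: [Int]
demoLazy = take 3 (fst (break (const False) [1..]))  -- [1,2,3]

-- The strict variant only terminates here because the break point
-- is eventually reached; break' (const False) [1..] would diverge.
demoStrict :: ([Int], [Int])
demoStrict = break' (> 3) [1 .. 10]
```

Replacing `break` with `break'` inside `demoLazy` would loop forever, which is exactly the "partial prefix, maybe no suffix" case where the lazy version wins.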
EDIT: the lazy variant is also useful when you lazily consume the prefix (discarding the elements as you go), and only then look at the suffix, i.e. you don't need to first find the break point. E.g. something like

```haskell
go input = do
  let (xs, ys) = break p input
  forM_ xs $ \x -> ...
  go ys
```

I guess.
Am I missing something?
Core of the workers, for those interested:
```haskell
-- lazy
-- prefix is produced as we go.
Main.$wbreak_
  = \ (@t_s2Fo) (ds_s2Fp :: t_s2Fo -> Bool) (xs_s2Fq :: [t_s2Fo]) ->
      case xs_s2Fq of wild_X1E {
        [] ->
          (# ghc-prim:GHC.Types.[] @t_s2Fo, ghc-prim:GHC.Types.[] @t_s2Fo #);
        : x_a1et xs'_a1eu ->
          case ds_s2Fp x_a1et of {
            False ->
              let {
                ds1_s2Cz [Dmd=LP(ML,ML)] :: ([t_s2Fo], [t_s2Fo])
                [LclId]
                ds1_s2Cz
                  = case Main.$wbreak_ @t_s2Fo ds_s2Fp xs'_a1eu of
                    { (# ww_s2H3, ww1_s2H4 #) ->
                      (ww_s2H3, ww1_s2H4)
                    } } in
              (# ghc-prim:GHC.Types.:
                   @t_s2Fo
                   x_a1et
                   (case ds1_s2Cz of { (ys_a1qW, zs_a1qX) -> ys_a1qW }),
                 case ds1_s2Cz of { (ys_a1qW, zs_a1qX) -> zs_a1qX } #);
            True -> (# ghc-prim:GHC.Types.[] @t_s2Fo, wild_X1E #)
          }
      }

-- strict
-- here you'll see the first character of the prefix only after the break character is found.
Main.$wbreak'
  = \ (@t_s2Fv) (ds_s2Fw :: t_s2Fv -> Bool) (xs_s2Fx :: [t_s2Fv]) ->
      case xs_s2Fx of wild_X1E {
        [] ->
          (# ghc-prim:GHC.Types.[] @t_s2Fv, ghc-prim:GHC.Types.[] @t_s2Fv #);
        : x_a1eA xs1_a1eB ->
          case ds_s2Fw x_a1eA of {
            False ->
              case Main.$wbreak' @t_s2Fv ds_s2Fw xs1_a1eB of
              { (# ww_s2H6, ww1_s2H7 #) ->
                (# ghc-prim:GHC.Types.: @t_s2Fv x_a1eA ww_s2H6, ww1_s2H7 #)
              };
            True -> (# ghc-prim:GHC.Types.[] @t_s2Fv, wild_X1E #)
          }
      }
```
> It would be good to find examples of both of the above in the wild, and to provide evidence that a stricter `break` or `span` actually realizes performance gains.
I personally don't need to see a lot of investigation on this.
It's not a breaking change and there surely are cases where strictness can be beneficial.
So I find this a no-brainer to add.
We might discuss whether the strict versions should live in their own module, but I guess that ship has sailed.
@phadej My guess is that the lazy version will be faster if most or all of the prefix is consumed before the suffix is inspected, especially if the prefix is long. Of course, in the "all" case it would really be better to "split" the list into something like `Free ((,) a) [a]`.
> So I find this a no-brainer to add.
Quite the contrary, I'd say: adding these function variants to a prominent place like `Data.List` incurs the cost of having to make a choice for every user reaching for `span`/`break`. Also, a teacher explaining these functions will have to explain why there are variants.
I find that a relevant regression in usability, one that needs to be justified by common, realistic use cases with significant performance benefits.
(Imagine we didn't have `foldl` yet, only `foldl'`; how much nicer Haskell would be. If someone argued for adding lazy `foldl`, we would also ask for good evidence why it's worth the complications.)
> I find that a relevant regression in usability, and needs to be justified by common, realistic use cases with significant performance benefits.
I don't think it has to be significant. Anecdotal evidence and microbenchmarks have been presented, and the proposed implementation is in no way as preposterous as `foldl`. I see reason to believe it's going to be useful and don't require the author to go through Stackage to prove the point.

But if others want more evidence, sure.
> Anecdotal evidence and microbenchmarks have been presented
TBH, I still have no idea when to pick `span'` and when `span`. With `foldl'` and `foldl` I understand the difference, with `span` and `span'` I don't. (E.g. I don't see how lazy `span` can introduce a space leak/usage peak, in a way `foldl` could).

So I agree with @nomeata.
So I agree with @nomeata.
If `span'` and `break'` functions are added, I'd expect a very good explanation in the docs of the difference and of how to pick between the variants.
Personally I'm against increasing the API surface area of base unless it comes with very clear criteria for deciding when to use the new versions in preference to the old versions. I think the proposed new functionality here is a prime candidate for being added to a separate, new package, tried for a few years, and then integrated into base only if it proves sufficiently popular.
> I think the proposed new functionality here is a prime candidate for being added to a separate, new package, tried for a few years, and then integrated into base only if it proves sufficiently popular.
I think an existing package will do: https://hackage.haskell.org/package/extra-1.7.12/docs/Data-List-Extra.html
How do you want to check for popularity? Stackage?
It also runs the chicken-and-egg problem similar to https://github.com/haskell/core-libraries-committee/issues/78 ...if it isn't in base, people may simply not be aware of the option, unless they already know about it.
`base` being a collection of the "best primitives" seems a bit strict to me, though?
> It also runs the chicken-and-egg problem similar to https://github.com/haskell/core-libraries-committee/issues/78 ...if it isn't in base, people may simply not be aware of the option, unless they already know about it.
It's a fair point. There is an unresolved ambiguity about whether `base` should lead or lag common practice. Essentially I think the problem comes down to there being no shared understanding of what the meaning and purpose of `base` really is.
In current practice, `base` cannot lead common-practice development. Consider the discussion on changing the signature of `Data.List.NonEmpty.unzip`: quite a small and isolated change is made extremely difficult. It's impossible to get anything right the first time, but changes (which may affect existing code, i.e. not additions) in `base` are extremely laborious to push through.
Whether `base` should lead or lag common practice is directly related to how much we would like (and the CLC would allow us) to change already existing things in `base`, possibly affecting existing code out there.
> whether base should lead or lag common practice is directly related to how much we would like (and CLC would allow us) to change already existing things in base,
That's an interesting view, but I don't share it.
I have very little appetite for changing/breaking things. But I don't ever see `span'` or `break'` being subject to e.g. a different type signature. Nor do I see a lot of space for iterating on the implementation itself.
I may agree with you on cases where the types are not set in stone or where there's a lot of design space in the implementation. I don't think either is the case here.
@hasufell it's on you (the CLC) to come up with clear criteria
Should `dup :: a -> (a, a)` be added? Or `uncurry' f (x, y) = f x y` (without the lazy pattern match) to `base`? What about `ordNub` (it really should be in `base`, shouldn't it)? Or

```haskell
pureIf :: (Alternative m) => Bool -> a -> m a
pureIf b a = if b then pure a else empty
```

(though someone might call that `guarded`).

These are examples of small combinators which I needed in the last two years. I don't know why they are not in `base`. Would it be easy to add them to `base`?
@phadej It seems more practical to use the `ordNub`, `ordNubOn`, etc., in `containers`, which use appropriate data structures rather than lists.
> @hasufell it's on you (the CLC) to come up with clear criteria
That's a fair proposal that would allow potential base contributors to better assess whether to invest time in certain ideas.
However, even if we manage to agree on a baseline (which is non-binding anyway), the next CLC might simply revert it (and they should be able to).
> I think the proposed new functionality here is a prime candidate for being added to a separate, new package,
Keep in mind that part of this proposal is to enable using this functionality within GHC itself, so we do need it to be in a boot library.
The alternative is to silo it away in some `GHC.Utils.` module, since `GHC.List` is also a part of `base`. In my opinion, this is considerably worse than including it in `base` for consistency.
> The alternative is to silo it away in some GHC.Utils. module, since GHC.List is also a part of base. In my opinion, this is considerably worse than including it in base for consistency.
Another alternative is to include it in a GHC-specific boot package (perhaps a new one) and to expose it from base if and when it proves itself.
> TBH, I still have no idea when to pick `span'` and when `span`. With `foldl'` and `foldl` I understand the difference, with `span` and `span'` I don't. (E.g. I don't see how lazy `span` can introduce a space leak/usage peak, in a way `foldl` could).
The closest analogy is not `foldl` vs `foldl'`, but catamorphisms: `foldr` vs `foldr'`. One can implement `span` via `foldr`:

```haskell
span p = foldr go ([], [])
  where
    go x (pref, suff)
      | p x       = (x : pref, suff)
      | otherwise = ([], x : pref ++ suff)
```

and `span'` is the same but with `foldr'`.
A more efficient `span`, equivalent to the version in `base`, is a paramorphism, not a catamorphism:

```haskell
para :: forall a b. (a -> [a] -> b -> b) -> b -> [a] -> b
para f z = go
  where
    go :: [a] -> b
    go []       = z
    go (x : xs) = f x xs (go xs)

span :: (a -> Bool) -> [a] -> ([a], [a])
span p = para (\x xs -> if p x then first (x :) else const ([], x : xs)) ([], [])
```

And `span'` would be the same but with `para'`, which forces `go xs` before calling `f x xs (go xs)`.
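A sketch of what such a `para'` could look like (a hypothetical combinator following the description above, not an existing library function; `span'` mirrors the span-via-para definition):

```haskell
{-# LANGUAGE BangPatterns, ScopedTypeVariables #-}
import Data.Bifunctor (first)

-- Strict paramorphism: force the recursive result to WHNF before
-- handing it to f.
para' :: forall a b. (a -> [a] -> b -> b) -> b -> [a] -> b
para' f z = go
  where
    go :: [a] -> b
    go []       = z
    go (x : xs) = let !r = go xs in f x xs r

-- span' via para': identical to span-via-para except for the
-- strict recursion.
span' :: (a -> Bool) -> [a] -> ([a], [a])
span' p = para' (\x xs -> if p x then first (x :) else const ([], x : xs)) ([], [])
```

Because each level forces the next before returning, this walks to the break point before yielding anything, which is exactly the behaviour of the proposed strict variants.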
When would we prefer `para'` over `para`? Pretty much every time `f x xs (go xs)` is likely to force its last argument. In our implementation of `span` the last argument is forced only when `p x` holds. Thus if `p x` is likely to succeed it's better to use strict `span'` than `span`. And vice versa for `break`: if `p x` is likely to succeed use lazy `break`, otherwise `break'`.
@JoshMeredith
> Keep in mind that part of this proposal is to enable using this functionality within GHC itself, so we do need it to be in a boot library.
@tomjaguarpaw
> Another alternative is to include it in a GHC-specific boot package (perhaps a new one) and to expose it from base if and when it proves itself.
The `ghc` library has plenty of utils already in it. See https://hackage.haskell.org/package/ghc-9.4.4/docs/GHC-Utils-Misc.html
> ghc library has plenty of utils already in it. See https://hackage.haskell.org/package/ghc-9.4.4/docs/GHC-Utils-Misc.html
Yes, I think that's an unfortunate historical accident. It leads to `base` doing awkward double (or even triple) duty. I'm broadly sympathetic to @Ericson2314's views on the matter (although there are a lot of devilish details to be worked out).
My understanding is: this functionality is needed for work on and in GHC. And the question is merely whether or not those functions seem useful enough to expose them through `base`, or just not expose them and implement them somewhere private to GHC. While I tend to value consistency and predictability (when there is a lazy `f`, I often find a strict `f'`), the discussion that happened here leads me to believe the latter (adding it private to GHC only) might be the better choice.
I will however excuse myself from voting on this, as Josh is indirectly reporting.
> This can be made more strict by introducing a bang pattern on the let-binding:
Huh, that's surprising - I was under the impression that a `!` forced its argument to WHNF, but a constructor match like `let (a, b) = ...` is already in WHNF, and should not increase strictness 🤔 Or am I confusing `case` and `let` here? I.e. is the proposed equivalent to

```haskell
break _ xs@[] = (xs, xs)
break p xs@(x:xs')
  | p x       = ([], xs)
  | otherwise = let (ys, zs) = break p xs' in (x:ys, zs)

break' _ xs@[] = (xs, xs)
break' p xs@(x:xs')
  | p x       = ([], xs)
  | otherwise = case break' p xs' of
      (ys, zs) -> (x:ys, zs)
```
I'm curious to see how/if these benchmarks change if the given functions aren't defined in the same module. IIRC, the effects systems benchmarks all suffered from this - GHC's inliner was able to optimize same-module code super well, and cross-module code suffered.
I would definitely want to see clear examples of when you'd want the strict versions vs the lazy ones. Especially if there's some case study - maybe some GHC code that uses `break'` heavily can have a substantial speedup. If this is for use in GHC, then it must be for improving performance, and there likewise must be some upstream justification beyond a microbenchmark.
@parsonsmatt A strict let binding is not a bang pattern (see https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/strict.html#strict-bindings). In the case of `let !a = … in b`, `a` is forced unconditionally, while a bang pattern (`let (!a, b) = … in c`) only forces `a` when `b` is evaluated in `c`.
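A small sketch of the distinction (function names here are illustrative, not from the proposal):

```haskell
{-# LANGUAGE BangPatterns #-}

-- A plain let with a tuple pattern is lazy: the bound expression
-- is never forced if its components are unused.
lazyLet :: (Int, Int) -> Int
lazyLet p = let (_a, _b) = p in 0     -- lazyLet undefined == 0

-- A banged tuple pattern forces the bound expression to WHNF
-- before the body is evaluated.
strictLet :: (Int, Int) -> Int
strictLet p = let !(_a, _b) = p in 0  -- strictLet undefined throws
```

This is the same mechanism the proposed `break'` relies on: the `!` on the let-binding is what turns the lazily allocated pair thunk into an immediate pattern match.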
> Huh, that's surprising - I was under the impression that a `!` forced its argument to WHNF

Sylvain's comment in the GHC MR explains this well (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9868#note_478265).
> I'm curious to see how/if these benchmarks change if the given functions aren't defined in the same module
When universally changing the `break` implementation to the strict version, and rebuilding the compiler/boot libraries, we still see a significant improvement in allocations in the ticky profile of building the startup-code program with GHC (i.e. `echo "main = return ()" >> hi.hs; ghc hi.hs +RTS -rprofile.ticky`). We see from the STG of the benchmark program that neither version is inlined, despite being in the same module.
> Quite contrary, I'd say: adding these function variants to a prominent place like Data.List incurs the cost of having to make a choice for every user reaching for span/break.
In general I think this is a good point, but I'd like to point out that this is already an informal standard in basically all of `containers` and most container-like packages, i.e., offering a lazy and a strict API version where the strict one gets an appended `'`, for example `iterate` vs `iterate'`. I would argue that this notational convention is as common as the appended `_` meaning "throw away the result of a computation". So to tie this back: adding these functions is in line with the informal API standard, and most users are already forced to make these kinds of decisions for prominent packages in the Haskell ecosystem. So in my view, yes, this point is correct: "adding these function variants...incurs the cost of having to make a choice", but that choice is already foisted onto users in core libraries, and it should be very well known to users since it is a very common choice in the Haskell (or Hackage?) ecosystem. Heck, the convention of a `'` meaning a stricter version is even in the Learn You a Haskell chapter on `Data.List`.
> My hunch is that lazy span/break are "faster" when you then lazily process a prefix (potentially a part of it) and then maybe not look at the suffix at all. Otherwise it feels that strict version is never worse.
That's my hunch as well, and indeed in such a case one should use `takeWhile` instead of `break`.
Actually, rather than introducing new functions, I would rather amend the existing implementations of `break` / `span` on the grounds that `break'` / `span'` are better defaults under normal circumstances and there is no semantic difference between them. I'm interested to hear from other CLC members.
> that choice should be very very well known to users since it is a very common choice in the haskell (or hackage?) ecosystem.
But it's a new choice for every new function. Just because someone has learned about the convention, and maybe how to choose between `foldl` and `foldl'`, doesn't mean they will have an easy time choosing between `break` and `break'`. In fact, it seems that not even this crowd here has an easy time identifying rules for when exactly one should use one or the other. So it seems to be a hard choice, and therefore one that shouldn't be put on users lightly.
@Bodigrim, I very strongly oppose changing the semantics of these very old functions. Unlike `foldl @[]`, they are not footguns. Rather, they're exactly what one would expect in a lazy language like Haskell.

I'm not sure about (2). It seems to me that `break'` has exactly the same degree of observable laziness as `break`.
You're right that we should think hard before changing the semantics of functions defined in the Haskell Report, @treeowl. It's about the only thing that's sacred. These functions (if they are worth adding), ought to have dedicated names.
But come on. It's not as if the entire ecosystem is getting hot-swapped out from underneath the software running on some server. And what server would be running so many `span`s on lists that are not worth fully traversing?
@Bodigrim, the semantics are absolutely different.
```haskell
break  (>2) [1, 2, undefined] = (1 : 2 : _|_, _|_)
break' (>2) [1, 2, undefined] = _|_
```
@mixphix, while it's no longer as popular as it once was, there are still loads of Haskell programs using lazy I/O. It's perfectly legitimate to apply `span` to the entire input stream and expect output to start happening in a streaming fashion before the predicate pops `False`.
This is a tough one to make a decision on 😮‍💨
As a CLC member, I would have a much easier time voting on proposals if we had:

- A strict definition of `base` goals and its mission statement (e.g. whether it should lag behind or pioneer)
- A set of rules telling whether a breaking change is acceptable or not
- A documentation of `base` idioms and best practices (does `'` mean strict? is the `GHC.*` namespace non-public? etc.)

Until we have this, every single decision will be exhausting and time-consuming, as different people have different views on `base` and how it should move, and, most importantly, people's views change with time.
I would like to summarise why making a decision on this proposal is difficult (for me personally):

1. The behaviour of `break` and `break'` is different. So changing the definition of `break` and `span` is a breaking change if we want to go that route. If anyone in the world relies on this undocumented behaviour, I could imagine it would be quite difficult for them to investigate the problem.

   ```haskell
   λ: break (> 2) [1, 2, undefined]
   ([1,2*** Exception: Prelude.undefined
   λ: break' (> 2) [1, 2, undefined]
   *** Exception: Prelude.undefined
   ```

2. The difference between `break` and `break'` is subtle, so making a decision between the two functions as a Haskell developer is challenging. If we go the route of introducing new functions, I would suggest making lucid documentation for all four functions a hard requirement for acceptance (I can review the documentation changes and provide my advice on them using my experience in education).

3. The new definition of `break'` is strictly better (pun intended). I would love `base` to be the place that encourages best practices and provides sane defaults out of the box. But `base` accumulated 30-year-old baggage and it's not going away.

4. The existing definition of `break` is already not streaming and lazy enough. If anyone ever wanted a streaming and maximally lazy implementation of `break`, they should write something like this:

   ```haskell
   data BreakList a = Cons a (BreakList a) | Finish [a]
     deriving (Show)

   breakList :: (a -> Bool) -> [a] -> BreakList a
   breakList _ [] = Finish []
   breakList p xs@(x:xs')
     | p x       = Finish xs
     | otherwise = Cons x (breakList p xs')
   ```

   Inspired by `break` from `streaming`.
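For illustration, here are hypothetical accessors over this `BreakList` (the type and `breakList` are repeated so the sketch is self-contained): the prefix streams like `takeWhile`, while the suffix only becomes available once the break point is found.

```haskell
-- Repeated from the comment above for self-containedness.
data BreakList a = Cons a (BreakList a) | Finish [a]

breakList :: (a -> Bool) -> [a] -> BreakList a
breakList _ [] = Finish []
breakList p xs@(x:xs')
  | p x       = Finish xs
  | otherwise = Cons x (breakList p xs')

-- Hypothetical accessors (illustrative names): prefix streams
-- incrementally and so works on infinite input; suffix must walk
-- to the break point first.
prefix :: BreakList a -> [a]
prefix (Cons x rest) = x : prefix rest
prefix (Finish _)    = []

suffix :: BreakList a -> [a]
suffix (Cons _ rest) = suffix rest
suffix (Finish ys)   = ys
```

With this shape the caller decides, per use site, whether to consume the prefix lazily or to jump straight to the suffix, which is the flexibility the tuple-based `break` only offers implicitly through laziness.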
In the context of all the above, I'm +1 on adding the new `break'` / `span'` functions, with the hard requirement that the documentation for all four functions (`break`, `span`, `span'`, `break'`) will be expanded with guidance on how to choose between the variants (`break` / `break'` and `span` / `span'`).

P.S. Improving the documentation of the existing functions `break` and `span` would be a great idea regardless, but, oh, well, someone needs to write it in the end 😞
> As a CLC member, I would have a much easier time voting on proposals if we had:
>
> - A strict definition of `base` goals and its mission statement (e.g. whether it should lag behind or pioneer)
> - A set of rules telling whether a breaking change is acceptable or not
> - A documentation of `base` idioms and best practices (does `'` mean strict? is `GHC.*` namespace non-public? etc.)
>
> Until we have this, every single decision will be exhausting and time-consuming as different people have different views on `base` and how it should move, and, most importantly, people's views change with time.
I somewhat agree, but as you know, my recent attempt at finding some very general alignment by simply understanding the opinion of all CLC individuals in public (through manifestos) didn't go very far.
However, CLC members are free to organize themselves and discuss proposals privately in any fashion they please, so my suggestion is that people should just ping each other to get a better understanding of those topics.
@chshersh I think something has gone very wrong if not one but two members of the CLC think the laziness of `span` and `break` is subtle, unexpected, or obscure. Additionally, you claim that stricter semantics are strictly better here, without justifying that claim.
@treeowl your last comment is not constructive. We are here to nurture a common understanding, not for dismissive remarks. You are very welcome to provide more examples when this difference in laziness matters to support your point of view.
@Bodigrim I apologize for my harsh tone. But I think it's important that the committee recognize that Haskell was originally intended specifically as a lazy functional language, and that functions like `break` and `span` were designed with lazy intent. Are they the best way to accomplish this in practice? Probably not: a proper streaming approach is less error-prone and likely more efficient. But I wouldn't want to bet on there not being a lot of code actually relying on the laziness.
-1
In my opinion this has not been strongly enough motivated for inclusion in `base`. If this is needed for GHC, then it should be included in a GHC-specific, not-intentionally-user-facing boot package (perhaps a new one). We can consider exposing it from `base` if and when it proves itself.
> @chshersh I think something has gone very wrong if not one but two members of the CLC think the laziness of span and break is subtle, unexpected, or obscure
@treeowl I didn't say that the laziness of `span` and `break` is subtle; I said that the difference between `break` and `break'` w.r.t. laziness semantics is subtle. If it's obvious to you, then good for you. But I would appreciate it if you could address something I said and not something I haven't said. This would make the dialogue more efficient 😌
Although the behaviour of `break` makes sense to me, similarly to @parsonsmatt the behaviour of the `let (x, y) =` binding wasn't obvious at all. Moreover, the result of `break` around edge cases is not currently documented in any way. Without further guarantees on the established API, I can easily imagine GHC changing its optimizer and suddenly `break` starting to behave differently due to an unexpected turn of events.
> Additionally, you claim that stricter semantics are strictly better here, without justifying that claim.
Nobody in this thread has provided a real-world example where people actually want the existing behaviour of `break` and may even prefer it to `break'`. It may be faster when you need to traverse only the first half, but at that point it's better to use `takeWhile`. And since `break'` performs fewer allocations (and is thus faster), I see `break'` as a strict improvement on top of `break`.
I hope I was able to clarify my words, so feel free to ask any follow-up questions if you think I haven't explained my understanding in detail 😌
> Nobody in this thread provided a real-world example where people actually want the existing behaviour of `break` and may even prefer it to `break'`

If you want to traverse both the first half and the second half in a streaming fashion then you will want `break`. `break'` consumes stack proportional to the length of the matching prefix before yielding any result. In particular `fst (break' (< 0) [1..]) == _|_` but `fst (break (< 0) [1..]) == [1..]`.
> > Nobody in this thread provided a real-world example where people actually want the existing behaviour of `break` and may even prefer it to `break'`
>
> If you want to traverse both the first half and the second half in a streaming fashion then you will want `break`. `break'` consumes stack proportional to the length of the matching prefix before yielding any result. In particular `fst (break' (< 0) [1..]) == _|_` but `fst (break (< 0) [1..]) == [1..]`.
Can anyone provide benchmarks for cases that rely on list fusion?
@hasufell What does list fusion have to do with this?
> @hasufell What does list fusion have to do with this?
I don't know, that's why I'm asking.
@hasufell, list fusion has nothing to do with it. Neither implementation is a "good consumer" and neither is a "good producer".
> P.S. Improving the documentation of existing functions `break` and `span` would be a great idea regardless, but, oh, well, someone needs to write it in the end 😞

Inspired by this, I took a stab at `break` / `span` in the course of https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10081/diffs. Everyone is welcome to review.
There are surprisingly many libraries defining things named `break'` / `span'`, including, for instance, `text`: https://hackage-search.serokell.io/?q=%5B%5E%27%60.%5Cw%5D%28break%7Cspan%29%27. If the proposal is accepted, they are likely to suffer from name clashes. I would like to see a full-scale impact assessment running `clc-stackage` to determine the exact impact.
Technically this is non-breaking, and the PVP recommends everyone use qualified imports or explicit import lists only, but as everyone knows the practice is quite different. We really need `-Wcompat-unqualified-imports` in `-Wall` to alleviate this pain, see !21791.
I'm generally sympathetic to the proposal, because in many cases those tuple thunks make no sense, but kinda on the fence because of social aspects.
> In fact, it seems that not even this crowd here has an easy time identifying rules when exactly one should use one or the other. So it seems to be a hard choice, and therefore one that shouldn't be put on users easily.
Hmm, I agree with this point. You've flipped me @nomeata :)
Then the thing to add is guidance on the appropriate situation to use each function. Something like the datasheets for datasets effort in the AI literature. The basic idea is that the electronics industry provides a datasheet for each component that is produced. This datasheet describes the component and provides guidance on use. We can also do something like this, although not issue something as extensive as an entire datasheet. But even a few sentences would be a step in the right direction.
Many of the functions provided in `Data.List` (defined in `GHC.List`) also provide equivalent, stricter versions. Of those functions that currently don't provide these versions, `break` and `span` could benefit from doing so - for consistency, as well as because these stricter versions can be shown to be more performant in certain situations.

**Implementation**

If we consider `break`'s implementation:

This can be made more strict by introducing a bang pattern on the let-binding:

**Measurements**

Since this strictness allows GHC to keep the tuple unboxed, we expect to see fewer allocations per level of recursion.

**STG**

As a starting point for allocations, we can compare the STG of each version, noting that the majority of let-allocations are removed in the stricter version.

`break'`:

`break`:

**Microbenchmarks**

When comparing simple predicates (in these tests, just comparing the value of an `Int` element), microbenchmarks show that time performance can often be in favour of the stricter version. Since the microbenchmarks fully evaluate the output, these only cover situations where the list will be fully consumed.

My microbenchmarks test breaking at the start, middle, and end of a 100,000 element list of `Int`s, and all 3 cases appear in favour of the stricter version.

Start:

Middle:

End:

**Motivational Use in GHC**

By running a ticky profile on GHC's startup code using a discussed example (https://gitlab.haskell.org/ghc/ghc/-/issues/16822), we can see that there are a significant number of allocations caused by `break`:

Then, when comparing with a version where the one use of `break` in `GHC.hs` (GHC's command-line argument filtering) is replaced by `break'`, we can see that this stricter version can be selectively applied for allocation improvements:

**Conclusion**

Providing stricter versions of `break` and `span` is consistent with the general design of `Data.List`, and measurements of these stricter versions show that they are situationally more performant.