haskell / core-libraries-committee


Data.List (and GHC.List) should provide stricter versions of `break` and `span` #133

Closed JoshMeredith closed 1 year ago

JoshMeredith commented 1 year ago

Many of the functions provided in Data.List (and defined in GHC.List) come with equivalent, stricter variants. Of the functions that currently lack such variants, break and span would benefit from gaining them - for consistency, and because the stricter versions can be shown to be more performant in certain situations.

Implementation

If we consider break's implementation:

break _ xs@[]           =  (xs, xs)
break p xs@(x:xs')
           | p x        =  ([],xs)
           | otherwise  =  let (ys,zs) = break p xs' in (x:ys,zs)

This can be made more strict by introducing a bang pattern on the let-binding:

break' _ xs@[]           =  (xs, xs)
break' p xs@(x:xs')
            | p x        =  ([],xs)
            | otherwise  =  let !(ys,zs) = break' p xs' in (x:ys,zs)
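
The analogous change applies to span; as a sketch (mirroring span's definition in GHC.List, and assuming BangPatterns as above):

span' _ xs@[]            =  (xs, xs)
span' p xs@(x:xs')
          | p x          =  let !(ys,zs) = span' p xs' in (x:ys,zs)
          | otherwise    =  ([],xs)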

Measurements

Since this strictness allows GHC to keep the tuple unboxed, we expect to see fewer allocations per level of recursion.

STG

As a starting point for allocations, we can compare the STG of each version, noting that the majority of let-allocations are removed in the stricter version.

break':

Rec {
Main.$wbreak' [InlPrag=[2], Occ=LoopBreaker]
  :: forall {t}. (t -> GHC.Types.Bool) -> [t] -> (# [t], [t] #)
[GblId[StrictWorker([~, !])],
 Arity=2,
 Str=<LC(S,L)><1L>,
 Unf=OtherCon []] =
    {} \r [ds_s4KF xs_s4KG]
        case xs_s4KG<TagProper> of wild_s4KH [Occ=Once1] {
          [] -> (#,#) [GHC.Types.[] GHC.Types.[]];
          : x_s4KI xs1_s4KJ [Occ=Once1] ->
              case ds_s4KF x_s4KI of {
                GHC.Types.False ->
                    case
                        case xs1_s4KJ of xs1_t4MU {
                        __DEFAULT -> Main.$wbreak' ds_s4KF xs1_t4MU;
                        }
                    of
                    {
                    (#,#) ww_s4KM [Occ=Once1] ww1_s4KN [Occ=Once1] ->
                    let {
                      sat_s4KO [Occ=Once1] :: [t_s4F6]
                      [LclId] =
                          :! [x_s4KI ww_s4KM];
                    } in  (#,#) [sat_s4KO ww1_s4KN];
                    };
                GHC.Types.True -> (#,#) [GHC.Types.[] wild_s4KH];
              };
        };
end Rec }

break:

Rec {
GHC.List.$wbreak [InlPrag=[2], Occ=LoopBreaker]
  :: forall {a}. (a -> GHC.Types.Bool) -> [a] -> (# [a], [a] #)
[GblId[StrictWorker([~, !])],
 Arity=2,
 Str=<LC(S,L)><1L>,
 Unf=OtherCon []] =
    {} \r [ds_s35A xs_s35B]
        case xs_s35B<TagProper> of wild_s35C [Occ=Once1] {
          [] -> (#,#) [GHC.Types.[] GHC.Types.[]];
          : x_s35D xs'_s35E [Occ=Once1] ->
              case ds_s35A x_s35D of {
                GHC.Types.False ->
                    let {
                      ds1_s35G [Dmd=LP(ML,ML)] :: ([a_s2Jt], [a_s2Jt])
                      [LclId] =
                          {ds_s35A, xs'_s35E} \u []
                              case
                                  case xs'_s35E of xs'_t3in [Occ=Once1] {
                                  __DEFAULT -> GHC.List.$wbreak ds_s35A xs'_t3in;
                                  }
                              of
                              {
                              (#,#) ww_s35I [Occ=Once1] ww1_s35J [Occ=Once1] ->
                              (,) [ww_s35I ww1_s35J];
                              }; } in
                    let {
                      sat_s35S [Occ=Once1] :: [a_s2Jt]
                      [LclId] =
                          {ds1_s35G} \u []
                              case ds1_s35G of {
                              (,) _ [Occ=Dead] zs_s35R [Occ=Once1] -> zs_s35R;
                              }; } in
                    let {
                      sat_s35N [Occ=Once1] :: [a_s2Jt]
                      [LclId] =
                          {ds1_s35G} \u []
                              case ds1_s35G of {
                              (,) ys_s35L [Occ=Once1] _ [Occ=Dead] -> ys_s35L;
                              }; } in
                    let {
                      sat_s35O [Occ=Once1] :: [a_s2Jt]
                      [LclId] =
                          :! [x_s35D sat_s35N];
                    } in  (#,#) [sat_s35O sat_s35S];
                GHC.Types.True -> (#,#) [GHC.Types.[] wild_s35C];
              };
        };
end Rec }

Microbenchmarks

When comparing simple predicates (in these tests, just comparing the value of an Int element), microbenchmarks show that time performance often favours the stricter version. Since the microbenchmarks fully evaluate the output, they only cover situations where the list will be fully consumed.

My microbenchmarks test breaking at the start, middle, and end of a 100,000-element list of Ints, and all three cases favour the stricter version.

Start:

benchmarking 100000/zero/break
time                 349.1 μs   (346.1 μs .. 352.6 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 349.1 μs   (347.0 μs .. 351.1 μs)
std dev              7.036 μs   (5.702 μs .. 8.583 μs)
variance introduced by outliers: 12% (moderately inflated)

benchmarking 100000/zero/break'
time                 302.2 μs   (298.7 μs .. 305.3 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 298.3 μs   (296.1 μs .. 301.0 μs)
std dev              7.848 μs   (6.737 μs .. 9.845 μs)
variance introduced by outliers: 20% (moderately inflated)

Middle:

benchmarking 100000/half/break
time                 9.153 ms   (8.632 ms .. 9.503 ms)
                     0.988 R²   (0.964 R² .. 0.999 R²)
mean                 9.331 ms   (9.082 ms .. 9.453 ms)
std dev              466.1 μs   (235.1 μs .. 750.4 μs)
variance introduced by outliers: 24% (moderately inflated)

benchmarking 100000/half/break'
time                 4.496 ms   (4.457 ms .. 4.532 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 4.469 ms   (4.447 ms .. 4.491 ms)
std dev              69.81 μs   (53.07 μs .. 93.55 μs)

End:

benchmarking 100000/full/break
time                 19.32 ms   (18.49 ms .. 20.32 ms)
                     0.989 R²   (0.979 R² .. 0.996 R²)
mean                 16.99 ms   (15.56 ms .. 18.08 ms)
std dev              2.941 ms   (1.791 ms .. 3.642 ms)
variance introduced by outliers: 73% (severely inflated)

benchmarking 100000/full/break'
time                 9.800 ms   (9.611 ms .. 10.04 ms)
                     0.995 R²   (0.992 R² .. 0.998 R²)
mean                 8.634 ms   (8.159 ms .. 8.947 ms)
std dev              1.069 ms   (772.0 μs .. 1.440 ms)
variance introduced by outliers: 67% (severely inflated)

Motivational Use in GHC

By running a ticky profile on GHC's startup code, using an example discussed previously (https://gitlab.haskell.org/ghc/ghc/-/issues/16822), we can see that break causes a significant number of allocations:

Entries       Alloc     Alloc'd  Non-void Arguments   STG Name
--------------------------------------------------------------------------------
   4312      436072           0   2 >L                GHC.List.$wbreak{v r356} (fun)

Then, comparing with a version where the one use of break in GHC.hs (GHC's command-line argument filtering) is replaced by break', we can see that the stricter version can be selectively applied to reduce allocations:

Entries       Alloc     Alloc'd  Non-void Arguments   STG Name
--------------------------------------------------------------------------------
   2433      246064           0   2 >L                GHC.List.$wbreak{v r16} (fun)

Conclusion

Providing stricter versions of break and span is consistent with the general design of Data.List, and measurements of these stricter versions show that they are situationally more performant.

treeowl commented 1 year ago

What do your microbenchmarks do? Do you have timing data for the GHC application, or just allocation numbers for a profiled build?

I would intuitively expect a strict break or span to be better than a lazy one under two circumstances:

  1. The predicate is cheap, the first component of the output is short, and ... probably some more conditions.
  2. The second component of the output is inspected before the first component.

It would be good to find examples of both of the above in the wild, and to provide evidence that a stricter break or span actually realizes performance gains.

JoshMeredith commented 1 year ago

> What do your microbenchmarks do?

I'm just using a simple list of Ints ([0..size]) in these tests, with the following predicates:


predicateHalf :: Int -> Int -> Bool
predicateHalf n x = x == (n `div` 2)

predicateZero :: Int -> Int -> Bool
predicateZero n x = x == 0

predicateFull :: Int -> Int -> Bool
predicateFull n x = x == n


> Do you have timing data for the GHC application, or just allocation numbers for a profiled build?

Unfortunately I've had trouble isolating useful numbers from a proper run of GHC with/without the changes, so the intention here is to use allocations as a proxy.

> The predicate is cheap

Interestingly, the time improvements seem to hold if I slow down the predicate:

predicateSlow :: Int -> Int -> Bool
predicateSlow n x = x `elem` [0..n] && x == (n `div` 2)

benchmarking 100000/slow/break
time                 457.2 ms   (NaN s .. 497.9 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 440.0 ms   (434.1 ms .. 449.1 ms)
std dev              8.797 ms   (1.718 ms .. 11.48 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking 100000/slow/break'
time                 444.9 ms   (422.1 ms .. 457.7 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 431.0 ms   (418.8 ms .. 437.6 ms)
std dev              11.64 ms   (2.141 ms .. 15.36 ms)
variance introduced by outliers: 19% (moderately inflated)

Of course here it looks like the difference could be noise, but smaller lists reveal that it isn't:

benchmarking 10000/slow/break
time                 5.438 ms   (5.238 ms .. 5.738 ms)
                     0.983 R²   (0.963 R² .. 0.998 R²)
mean                 5.424 ms   (5.312 ms .. 5.695 ms)
std dev              520.1 μs   (177.9 μs .. 969.6 μs)
variance introduced by outliers: 58% (severely inflated)

benchmarking 10000/slow/break'
time                 4.626 ms   (4.539 ms .. 4.709 ms)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 4.698 ms   (4.644 ms .. 4.874 ms)
std dev              261.5 μs   (107.4 μs .. 549.8 μs)
variance introduced by outliers: 32% (moderately inflated)

benchmarking 1000/slow/break
time                 133.2 μs   (132.3 μs .. 134.4 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 133.6 μs   (132.8 μs .. 134.3 μs)
std dev              2.487 μs   (1.959 μs .. 3.534 μs)
variance introduced by outliers: 13% (moderately inflated)

benchmarking 1000/slow/break'
time                 87.01 μs   (85.76 μs .. 88.41 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 86.12 μs   (85.36 μs .. 87.38 μs)
std dev              3.319 μs   (2.349 μs .. 5.725 μs)
variance introduced by outliers: 39% (moderately inflated)

treeowl commented 1 year ago

Please give the code for the actual benchmarks. You're not giving enough information about what you're actually measuring for other people to be able to evaluate its significance.

JoshMeredith commented 1 year ago

Here's the Criterion code:

main :: IO ()
main = defaultMain
  [ bgroup (show sz)
    [ bgroup pr
      [ bench fn $
         nf (f $ p sz) [0..sz] | (fn, f) <- fns] | (pr, p) <- predicates ] | sz <- listSizes]

predicateHalf :: Int -> Int -> Bool
predicateHalf n x = x == (n `div` 2)

predicateZero :: Int -> Int -> Bool
predicateZero n x = x == 0

predicateFull :: Int -> Int -> Bool
predicateFull n x = x == n

predicateSlow :: Int -> Int -> Bool
predicateSlow n x = x `elem` [0..n] && x == (n `div` 2)

listSizes :: [Int]
listSizes = [100,1000,10000,100000,1000000]

fns :: [(String, (Int -> Bool) -> [Int] -> ([Int], [Int]))]
fns =
  [ ("break", break_)
  , ("break'", break')
  ]

predicates :: [(String, Int -> Int -> Bool)]
predicates =
  [ ("half", predicateHalf)
  , ("zero", predicateZero)
  , ("full", predicateFull)
  , ("slow", predicateSlow)
  ]

break_ _ xs@[] = (xs, xs)
break_ p xs@(x:xs')
  | p x = ([], xs)
  | otherwise = let (ys, zs) = break_ p xs' in (x:ys, zs)

break' _ xs@[] = (xs, xs)
break' p xxs@(x:xs)
  | p x = ([], xxs)
  | otherwise = let !(ys, zs) = break' p xs in (x:ys, zs)

phadej commented 1 year ago

@JoshMeredith thanks for the benchmark.

My hunch is that lazy span/break are "faster" when you then lazily process a prefix (potentially only part of it) and then maybe not look at the suffix at all. Otherwise it feels like the strict version is never worse.


E.g. if I change the benchmark above to

main :: IO ()
main = defaultMain
  [ bgroup (show sz)
    [ bgroup pr
      [ env (return [0..sz]) $ \input -> bench fn $
         nf (post . f (p sz)) input | (fn, f) <- fns] | (pr, p) <- predicates ] | sz <- listSizes]

where post is

post (xs, ys) = listToMaybe ys

then break' is never slower.

If I just look at a small prefix of xs like

post (xs, ys) = take 20 xs

then break is faster. (I should use takeWhile to begin with in this case).

And if I look at both heads:

post (xs, ys) = (listToMaybe xs, listToMaybe ys)

then break' is again never slower.

So it seems that lazy span / break are only useful when you look partially at the prefix and may not look at the suffix (this case includes an infinite length input). But if you know that you will always look at the suffix, then strict versions are a win.

EDIT: the lazy variant is also useful when you lazily consume the prefix (discarding the elements as you go) and only then look at the suffix, i.e. you don't need to first find the break point. E.g. something like

go input = do
  let (xs, ys) = break p input
  forM_ xs $ \x -> ....
  go ys

I guess.

Am I missing something?


Core of the workers, for those interested:

-- lazy
-- prefix is produced as we go.
Main.$wbreak_
  = \ (@t_s2Fo) (ds_s2Fp :: t_s2Fo -> Bool) (xs_s2Fq :: [t_s2Fo]) ->
      case xs_s2Fq of wild_X1E {
        [] ->
          (# ghc-prim:GHC.Types.[] @t_s2Fo, ghc-prim:GHC.Types.[] @t_s2Fo #);
        : x_a1et xs'_a1eu ->
          case ds_s2Fp x_a1et of {
            False ->
              let {
                ds1_s2Cz [Dmd=LP(ML,ML)] :: ([t_s2Fo], [t_s2Fo])
                [LclId]
                ds1_s2Cz
                  = case Main.$wbreak_ @t_s2Fo ds_s2Fp xs'_a1eu of
                    { (# ww_s2H3, ww1_s2H4 #) ->
                    (ww_s2H3, ww1_s2H4)
                    } } in
              (# ghc-prim:GHC.Types.:
                   @t_s2Fo
                   x_a1et
                   (case ds1_s2Cz of { (ys_a1qW, zs_a1qX) -> ys_a1qW }),
                 case ds1_s2Cz of { (ys_a1qW, zs_a1qX) -> zs_a1qX } #);
            True -> (# ghc-prim:GHC.Types.[] @t_s2Fo, wild_X1E #)
          }
      }
-- strict
-- here you'll see a first character of prefix only after the break character is found.
Main.$wbreak'
  = \ (@t_s2Fv) (ds_s2Fw :: t_s2Fv -> Bool) (xs_s2Fx :: [t_s2Fv]) ->
      case xs_s2Fx of wild_X1E {
        [] ->
          (# ghc-prim:GHC.Types.[] @t_s2Fv, ghc-prim:GHC.Types.[] @t_s2Fv #);
        : x_a1eA xs1_a1eB ->
          case ds_s2Fw x_a1eA of {
            False ->
              case Main.$wbreak' @t_s2Fv ds_s2Fw xs1_a1eB of
              { (# ww_s2H6, ww1_s2H7 #) ->
              (# ghc-prim:GHC.Types.: @t_s2Fv x_a1eA ww_s2H6, ww1_s2H7 #)
              };
            True -> (# ghc-prim:GHC.Types.[] @t_s2Fv, wild_X1E #)
          }
      }

hasufell commented 1 year ago

> It would be good to find examples of both of the above in the wild, and to provide evidence that a stricter break or span actually realizes performance gains.

I personally don't need to see a lot of investigation on this.

It's not a breaking change and there surely are cases where strictness can be beneficial.

So I find this a no-brainer to add.

We might discuss whether the strict versions should live in their own module, but I guess that ship has sailed.

treeowl commented 1 year ago

@phadej My guess is that the lazy version will be faster if most or all of the prefix is consumed before the suffix is inspected, especially if the prefix is long. Of course, in the "all" case it would really be better to "split" the list into something like Free ((,) a) [a].

nomeata commented 1 year ago

> So I find this a no-brainer to add.

Quite the contrary, I'd say: adding these function variants to a prominent place like Data.List incurs the cost of having to make a choice for every user reaching for span/break. Also, a teacher explaining these functions will have to explain why there are variants.

I find that a relevant regression in usability, and it needs to be justified by common, realistic use cases with significant performance benefits.

(Imagine we didn't have foldl yet, only foldl'; how much nicer Haskell would be. If someone argued for adding a lazy foldl, we would also ask for good evidence of why it's worth the complications.)

hasufell commented 1 year ago

> I find that a relevant regression in usability, and it needs to be justified by common, realistic use cases with significant performance benefits.

I don't think it has to be significant. Anecdotal evidence and microbenchmarks have been presented, and the proposed implementation is in no way as preposterous as foldl. I see reason to believe it's going to be useful and don't require the author to go through Stackage to prove the point.

But if others want more evidence, sure.

phadej commented 1 year ago

> Anecdotal evidence and microbenchmarks have been presented

TBH, I still have no idea when to pick span' and when to pick span. With foldl' and foldl I understand the difference; with span and span' I don't. (E.g. I don't see how lazy span can introduce a space leak/usage peak in the way foldl could.)

So I agree with @nomeata.

If span' and break' functions are added I'd expect very good explanation of the difference in the docs of how to pick between variants.

tomjaguarpaw commented 1 year ago

Personally I'm against increasing the API surface area of base unless it comes with very clear criteria for deciding when to use the new versions in preference to the old versions. I think the proposed new functionality here is a prime candidate for being added to a separate, new package, tried for a few years, and then integrated into base only if it proves sufficiently popular.

hasufell commented 1 year ago

> I think the proposed new functionality here is a prime candidate for being added to a separate, new package, tried for a few years, and then integrated into base only if it proves sufficiently popular.

I think an existing package will do: https://hackage.haskell.org/package/extra-1.7.12/docs/Data-List-Extra.html

How do you want to check for popularity? Stackage?

It also runs into the chicken-and-egg problem similar to https://github.com/haskell/core-libraries-committee/issues/78 ...if it isn't in base, people may simply not be aware of the option, unless they already know about it.

base being a collection of the "best primitives" seems a bit strict to me, though?

tomjaguarpaw commented 1 year ago

> It also runs into the chicken-and-egg problem similar to https://github.com/haskell/core-libraries-committee/issues/78 ...if it isn't in base, people may simply not be aware of the option, unless they already know about it.

It's a fair point. There is an unresolved ambiguity about whether base should lead or lag common practice. Essentially I think the problem comes down to there being no shared understanding of what the meaning and purpose of base really is.

phadej commented 1 year ago

In current practice, base cannot lead the development of common practice. Consider the discussion on changing the signature of Data.List.NonEmpty.unzip - a quite small and isolated change is made extremely difficult.

It's impossible to get anything right the first time, but changes (which may affect existing code, i.e. not additions) in base are extremely laborious to push through.

whether base should lead or lag common practice is directly related to how much we would like (and CLC would allow us) to change already existing things in base, possibly affecting existing code out there.

hasufell commented 1 year ago

> whether base should lead or lag common practice is directly related to how much we would like (and CLC would allow us) to change already existing things in base,

That's an interesting view, but I don't share it.

I have very little appetite for changing/breaking things. But I don't ever see span' or break' being subject to e.g. a different type signature. Nor do I see a lot of space for iterating on the implementation itself.

I may agree with you on cases where the types are not set in stone or where there's a lot of design space in the implementation. I don't think either is the case here.

phadej commented 1 year ago

@hasufell it's on you (the CLC) to come up with clear criteria

Should dup :: a -> (a, a) be added to base? Or uncurry' f (x, y) = f x y (without the lazy pattern match)? What about ordNub (it really should be in base, shouldn't it)? Or

pureIf :: (Alternative m) => Bool -> a -> m a
pureIf b a = if b then pure a else empty

(though someone might call that guarded).

These are examples of small combinators which I needed in the last two years. I don't know why they are not in base. Would it be easy to add them to base?

treeowl commented 1 year ago

@phadej It seems more practical to use the ordNub, ordNubOn, etc., in containers, which use appropriate data structures rather than lists.

hasufell commented 1 year ago

> @hasufell it's on you (the CLC) to come up with clear criteria

That's a fair proposal that would allow potential base contributors to better assess whether to invest time in certain ideas.

However, even if we manage to agree on a baseline (which is non-binding anyway), the next CLC might simply revert it (and they should be able to).

JoshMeredith commented 1 year ago

> I think the proposed new functionality here is a prime candidate for being added to a separate, new package,

Keep in mind that part of this proposal is to enable using this functionality within GHC itself, so we do need it to be in a boot library.

The alternative is to silo it away in some GHC.Utils. module, since GHC.List is also a part of base. In my opinion, this is considerably worse than including it in base for consistency.

tomjaguarpaw commented 1 year ago

> The alternative is to silo it away in some GHC.Utils. module, since GHC.List is also a part of base. In my opinion, this is considerably worse than including it in base for consistency.

Another alternative is to include it in a GHC-specific boot package (perhaps a new one) and to expose it from base if and when it proves itself.

Bodigrim commented 1 year ago

> TBH, I still have no idea when to pick span' and when to pick span. With foldl' and foldl I understand the difference; with span and span' I don't. (E.g. I don't see how lazy span can introduce a space leak/usage peak in the way foldl could.)

The closest analogy is not foldl vs foldl', but catamorphisms foldr vs foldr'. One can implement span via foldr:

span p = foldr go ([], [])
  where
    go x (pref, suff)
      | p x = (x : pref, suff)
      | otherwise = ([], x : pref ++ suff)

and span' is the same but with foldr'.

A more efficient span, equivalent to the version in base, is a paramorphism, not a catamorphism:

para :: forall a b. (a -> [a] -> b -> b) -> b -> [a] -> b
para f z = go
  where
    go :: [a] -> b
    go [] = z
    go (x : xs) = f x xs (go xs)

span :: (a -> Bool) -> [a] -> ([a], [a])
span p = para (\x xs -> if p x then first (x :) else const ([], x : xs)) ([], [])

And span' would be the same but with para', which forces go xs before calling f x xs (go xs).
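
For concreteness, a sketch of such a para' (the name and definition here are assumed, following the description above, using a bang pattern as in break'):

para' :: forall a b. (a -> [a] -> b -> b) -> b -> [a] -> b
para' f z = go
  where
    go :: [a] -> b
    go []       = z
    -- force the recursive result before handing it to f
    go (x : xs) = let !r = go xs in f x xs r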

Bodigrim commented 1 year ago

When would we prefer para' over para? Pretty much every time when f x xs (go xs) is likely to force its last argument. In our implementation of span the last argument is forced only when p x. Thus if p x is likely to succeed it's better to use strict span' than span. And vice versa for break: if p x is likely to succeed use lazy break, otherwise break'.

phadej commented 1 year ago

@JoshMeredith

> Keep in mind that part of this proposal is to enable using this functionality within GHC itself, so we do need it to be in a boot library.

@tomjaguarpaw

> Another alternative is to include it in a GHC-specific boot package (perhaps a new one) and to expose it from base if and when it proves itself.

ghc library has plenty of utils already in it. See https://hackage.haskell.org/package/ghc-9.4.4/docs/GHC-Utils-Misc.html

tomjaguarpaw commented 1 year ago

> ghc library has plenty of utils already in it. See https://hackage.haskell.org/package/ghc-9.4.4/docs/GHC-Utils-Misc.html

Yes, I think that's an unfortunate historical accident. It leads to base doing an awkward double (or even triple) duty. I'm broadly sympathetic to @Ericson2314's views on the matter (although there are a lot of devilish details to be worked out).

angerman commented 1 year ago

My understanding is: this functionality is needed for work on and in ghc. And the question is merely whether or not those functions seem useful enough to expose them through base, or just not expose them and implement them somewhere private to GHC. While I tend to value consistency and predictability (when there is a lazy f, I often find a strict f'), the discussion that happened here leads me to believe the latter (adding it private to GHC only) might be the better choice.

I will however excuse myself from voting on this, as Josh is indirectly reporting.

parsonsmatt commented 1 year ago

> This can be made more strict by introducing a bang pattern on the let-binding:

Huh, that's surprising - I was under the impression that a ! forced its argument to WHNF, but a constructor match like let (a, b) = ... is already in WHNF and should not increase strictness 🤔 Or am I confusing case and let here? I.e. is the proposed change equivalent to

break _ xs@[]           =  (xs, xs)
break p xs@(x:xs')
           | p x        =  ([],xs)
           | otherwise  =  let (ys,zs) = break p xs' in (x:ys,zs)

break' _ xs@[]           =  (xs, xs)
break' p xs@(x:xs')
            | p x        =  ([],xs)
            | otherwise  =  case break' p xs' of
                                 (ys,zs) -> (x:ys,zs)

I'm curious to see how/if these benchmarks change if the given functions aren't defined in the same module. IIRC, the effect system benchmarks all suffered from this - GHC's inliner was able to optimize same-module code super well, and cross-module code suffered.


I would definitely want to see clear examples of when you'd want the strict versions vs the lazy ones. Especially if there's some case study - maybe some GHC code that uses break' heavily can have a substantial speedup. If this is for use in GHC, then it must be for improving performance, and there likewise must be some upstream justification beyond a microbenchmark.

tek commented 1 year ago

@parsonsmatt A strict let binding is not a bang pattern (see https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/strict.html#strict-bindings). In the case of let !a = … in b, a is forced unconditionally, while a bang pattern (let (!a, b) = … in c) only forces a when b is evaluated in c.
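
A minimal illustration of the difference, as a made-up example (assuming BangPatterns):

{-# LANGUAGE BangPatterns #-}

-- Lazy let: the pattern is never matched because neither variable is used,
-- so the undefined right-hand side is never forced.
lazyLet :: Int
lazyLet = let (_a, _b) = (undefined :: (Int, Int)) in 0   -- evaluates to 0

-- Banged let: the right-hand side is forced to WHNF as soon as the body is
-- demanded, so this throws. This is the behaviour the break' definition relies on.
bangLet :: Int
bangLet = let !(_a, _b) = (undefined :: (Int, Int)) in 0  -- diverges when demanded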

JoshMeredith commented 1 year ago

> Huh, that's surprising - I was under the impression that a ! forced its argument to WHNF

Sylvain's comment in the GHC MR explains this well (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9868#note_478265).

> I'm curious to see how/if these benchmarks change if the given functions aren't defined in the same module

When universally changing the break implementation to the strict version and rebuilding the compiler/boot libraries, we still see a significant improvement in allocations in the ticky profile of building the startup-code program with ghc (i.e. echo "main = return ()" >> hi.hs; ghc hi.hs +RTS -rprofile.ticky). We see from the STG of the benchmark program that neither version is inlined, despite being in the same module.

doyougnu commented 1 year ago

> Quite the contrary, I'd say: adding these function variants to a prominent place like Data.List incurs the cost of having to make a choice for every user reaching for span/break.

In general I think this is a good point, but I'd like to point out that this is already an informal standard in basically all of containers and most container-like packages, i.e., offering a lazy and a strict API version where the strict one is suffixed with a ' - for example, iterate vs iterate'. I would argue that this notational convention is as common as the appended _ meaning the result of a computation is thrown away. So to tie this back, adding these functions is in line with the informal API standard, and most users are already forced to make these kinds of decisions for prominent packages in the Haskell ecosystem. So in my view, yes, this point is correct: "adding these function variants...incurs the cost of having to make a choice", but that choice is already foisted onto users in core libraries, and that choice should be very very well known to users since it is a very common choice in the Haskell (or Hackage?) ecosystem. Heck, the convention of a ' to mean a stricter version is even in the Learn You a Haskell chapter on Data.List.

Bodigrim commented 1 year ago

> My hunch is that lazy span/break are "faster" when you then lazily process a prefix (potentially only part of it) and then maybe not look at the suffix at all. Otherwise it feels like the strict version is never worse.

That's my hunch as well, and indeed in such a case one should use takeWhile instead of break.
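
(Concretely, the prefix-only use case is covered by takeWhile because fst (break p xs) == takeWhile (not . p) xs and fst (span p xs) == takeWhile p xs.)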

Actually, rather than introducing new functions, I would rather amend the existing implementations of break / span on the grounds that break' / span' are better defaults under normal circumstances and there is no semantic difference between them. I'm interested to hear from other CLC members.

nomeata commented 1 year ago

> ...that choice should be very very well known to users since it is a very common choice in the Haskell (or Hackage?) ecosystem.

But it's a new choice for every new function. Just because someone has learned about the convention and maybe how to choose between foldl and foldl' doesn't mean they will have an easy time choosing between break and break'. In fact, it seems that not even this crowd here has an easy time identifying rules for when exactly one should use one or the other. So it seems to be a hard choice, and therefore one that shouldn't be put on users easily.

treeowl commented 1 year ago

@Bodigrim, I very strongly oppose changing the semantics of these very old functions.

  1. Unlike foldl @[], they are not footguns. Rather, they're exactly what one would expect in a lazy language like Haskell.
  2. Changing them may well cause run-time failures and/or serious performance degradation in all sorts of software that's been working for decades.
Bodigrim commented 1 year ago

I'm not sure about (2). It seems to me that break' has exactly the same degree of observable laziness as break.

mixphix commented 1 year ago

You're right that we should think hard before changing the semantics of functions defined in the Haskell Report, @treeowl. It's about the only thing that's sacred. These functions (if they are worth adding), ought to have dedicated names.

But come on. It's not as if the entire ecosystem is getting hot-swapped out from underneath the software running on some server. And what server would be running so many spans on lists that are not worth fully traversing?

treeowl commented 1 year ago

@Bodigrim, the semantics are absolutely different.

break (>2) [1, 2, undefined] = (1 : 2 : _|_, _|_)
break' (>2) [1, 2, undefined] = _|_

@mixphix, while it's no longer as popular as it once was, there are still loads of Haskell programs using lazy I/O. It's perfectly legitimate to apply span to the entire input stream and expect output to start happening in a streaming fashion before the predicate pops False.
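
A hedged sketch of that lazy-I/O pattern (a made-up example, not taken from any particular program):

main :: IO ()
main = do
  contents <- getContents                       -- lazily read the whole input
  let (prefix, rest) = span (/= '.') contents
  putStr prefix      -- with lazy span this streams as input arrives;
                     -- a strict span' would block until a '.' is read
  putStrLn ("rest: " ++ take 20 rest)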

chshersh commented 1 year ago

This is a tough one to make a decision 😮‍💨

As a CLC member, I would have a much easier time voting on proposals if we had:

  • A strict definition of base goals and its mission statement (e.g. whether it should lag behind or pioneer)
  • A set of rules telling whether a breaking change is acceptable or not
  • A documentation of base idioms and best practices (does ' mean strict? is GHC.* namespace non-public? etc.)

Until we have this, every single decision will be exhausting and time-consuming as different people have different views on base and how it should move, and, most importantly, people's views change with time.


I would like to summarise why making a decision on this proposal is difficult (for me personally):


The existing definition of break is already not streaming or maximally lazy. If anyone ever wanted a streaming and maximally lazy implementation of break, they should write something like this:

data BreakList a = Cons a (BreakList a) | Finish [a]
    deriving (Show)

breakList :: (a -> Bool) -> [a] -> BreakList a
breakList _ [] = Finish []
breakList p xs@(x:xs')
    | p x = Finish xs
    | otherwise = Cons x (breakList p xs')

Inspired by break from streaming
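
As an illustration (a helper assumed here, not part of the comment above), BreakList carries the same information as the pair returned by break:

toPair :: BreakList a -> ([a], [a])
toPair (Finish suffix) = ([], suffix)
toPair (Cons x rest)   = let (prefix, suffix) = toPair rest in (x : prefix, suffix)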


In the context of all the above, I'm +1 on adding the new break' / span' functions with the hard requirement that documentation for all four functions (break, span, span', break') will be expanded with:

P.S. Improving the documentation of existing functions break and span would be a great idea regardless, but, oh, well, someone needs to write it in the end 😞

hasufell commented 1 year ago

> As a CLC member, I would have a much easier time voting on proposals if we had:
>
>   • A strict definition of base goals and its mission statement (e.g. whether it should lag behind or pioneer)
>   • A set of rules telling whether a breaking change is acceptable or not
>   • A documentation of base idioms and best practices (does ' mean strict? is GHC.* namespace non-public? etc.)
>
> Until we have this, every single decision will be exhausting and time-consuming as different people have different views on base and how it should move, and, most importantly, people's views change with time.

I somewhat agree, but as you know, my recent attempt at finding some very general alignment by simply understanding the opinion of all CLC individuals in public (through manifestos) didn't go very far.

However, CLC members are free to organize themselves and discuss proposals privately in any fashion they please, so my suggestion is that people should just ping each other to get a better understanding of those topics.

treeowl commented 1 year ago

@chshersh I think something has gone very wrong if not one but two members of the CLC think the laziness of span and break is subtle, unexpected, or obscure. Additionally, you claim that stricter semantics are strictly better here, without justifying that claim.

Bodigrim commented 1 year ago

@treeowl your last comment is not constructive. We are here to nurture a common understanding, not for dismissive remarks. You are very welcome to provide more examples when this difference in laziness matters to support your point of view.

treeowl commented 1 year ago

@Bodigrim I apologize for my harsh tone. But I think it's important that the committee recognize that Haskell was originally intended specifically as a lazy functional language, and that functions like break and span were designed with lazy intent. Are they the best way to accomplish this in practice? Probably not—a proper streaming approach is less error prone and likely more efficient. But I wouldn't want to bet on there not being a lot of code actually relying on the laziness.

tomjaguarpaw commented 1 year ago

-1


In my opinion this has not been strongly enough motivated for inclusion in base. If this is needed for GHC then it should be included in a GHC-specific boot, not-intentionally-user-facing package (perhaps a new one). We can consider exposing it from base if and when it proves itself.

chshersh commented 1 year ago

> @chshersh I think something has gone very wrong if not one but two members of the CLC think the laziness of span and break is subtle, unexpected, or obscure

@treeowl I didn't say that the laziness of span and break is subtle, I said that the difference between break and break' w.r.t. laziness semantics is subtle. If it's obvious to you then good for you. But I would appreciate it if you could address something I said and not something I haven't said. This would make the dialogue more efficient 😌

Although the behaviour of break makes sense to me, similarly to @parsonsmatt, the behaviour of the let (x, y) = … binding wasn't obvious at all. Moreover, the result of break around edge cases is not currently documented in any way. Without further guarantees on the established API, I can easily imagine GHC changing its optimizer and break suddenly behaving differently due to an unexpected turn of events.

Additionally, you claim that stricter semantics are strictly better here, without justifying that claim.

Nobody in this thread provided a real-world example where people actually want the existing behaviour of break and may even prefer it to break'. It may be faster when you need to traverse only the first half, but at that point it's better to use takeWhile. And since break' produces fewer allocations (and is thus faster because of that), I see break' as a strict improvement on top of break.

I hope I was able to clarify my words, so feel free to ask any follow-up questions if you think I haven't explained my understanding in detail 😌

tomjaguarpaw commented 1 year ago

> Nobody in this thread provided a real-world example where people actually want the existing behaviour of break and may even prefer it to break'

If you want to traverse both the first half and the second half in a streaming fashion then you will want break. break' consumes stack proportional to the length of the matching prefix before yielding any result. In particular fst (break' (< 0) [1..]) == _|_ but fst (break (< 0) [1..]) == [1..].

hasufell commented 1 year ago

> Nobody in this thread provided a real-world example where people actually want the existing behaviour of break and may even prefer it to break'

> If you want to traverse both the first half and the second half in a streaming fashion then you will want break. break' consumes stack proportional to the length of the matching prefix before yielding any result. In particular fst (break' (< 0) [1..]) == _|_ but fst (break (< 0) [1..]) == [1..].

Can anyone provide benchmarks for cases that rely on list fusion?

treeowl commented 1 year ago

@hasufell What does list fusion have to do with this?

hasufell commented 1 year ago

> @hasufell What does list fusion have to do with this?

I don't know, that's why I'm asking.

treeowl commented 1 year ago

@hasufell, list fusion has nothing to do with it. Neither implementation is a "good consumer" and neither is a "good producer".

Bodigrim commented 1 year ago

> P.S. Improving the documentation of existing functions break and span would be a great idea regardless, but, oh, well, someone needs to write it in the end 😞

Inspired by this, I took a stab at break / span in the course of https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10081/diffs. Everyone is welcome to review.

Bodigrim commented 1 year ago

There are surprisingly many libraries defining things named break' / span', including, for instance, text: https://hackage-search.serokell.io/?q=%5B%5E%27%60.%5Cw%5D%28break%7Cspan%29%27. If the proposal is accepted, they are likely to suffer from name clashes. I would like to see a full-scale impact assessment running clc-stackage to determine the exact impact.

Technically this is non-breaking, and the PVP recommends that everyone use qualified imports or explicit import lists only, but as everyone knows the practice is quite different. We really need -Wcompat-unqualified-imports in -Wall to alleviate this pain, see !21791.

I'm generally sympathetic to the proposal, because in many cases those tuple thunks make no sense, but kinda on the fence because of social aspects.

doyougnu commented 1 year ago

> In fact, it seems that not even this crowd here has an easy time identifying rules for when exactly one should use one or the other. So it seems to be a hard choice, and therefore one that shouldn't be put on users easily.

Hmm, I agree with this point. You've flipped me @nomeata :)

Then the thing to add is guidance on the appropriate situations in which to use each function. Something like the "datasheets for datasets" effort in the AI literature. The basic idea is that the electronics industry provides a datasheet for each component that is produced; this datasheet describes the component and provides guidance on its use. We could do something similar here, although not as extensive as an entire datasheet. But even a few sentences would be a step in the right direction.