haskell / core-libraries-committee

95 stars 16 forks source link

Allow `lookup` to fuse #175

Closed MorrowM closed 1 year ago

MorrowM commented 1 year ago

Currently, the lookup function from the prelude does not have any list fusion facilities. I propose we add a RULE allowing it to fuse with build:

{-# RULES
"lookup/build" forall x (g :: forall b. ((k, a) -> b -> b) -> b -> b).
  lookup x (build g) =
    g (\(k, v) r -> if x == k then Just v else r) Nothing
  #-}

The obvious case where this is useful is when you have some function generating this list of tuples using build.

One less-obvious case where this would be useful (which happens to be the case I found this issue in) is as follows. Suppose you have a small sum type Foo and you want to write an invertible function Foo -> Int. Instead of implementing this by hand you could write a list of input output pairs and then write both functions as lookups on this list. Since GHC desugars lists using build you can make the list disappear from the generated Core.

data Foo = A | B | C | D
  deriving (Eq)

fooInts :: [(Foo, Int)]
fooInts =
  [ (A, 1)
  , (B, 7)
  , (C, 9)
  , (D, 11)
  ]

foo2Int :: Foo -> Int
foo2Int foo = fromMaybe (-1) $ lookup foo (inline fooInts)

int2Foo :: Int -> Maybe Foo
int2Foo int = lookup int $ map swap (inline fooInts)

Generated Core (using GHC 9.6.2):

foo2Int :: Foo -> Int
foo2Int
  = \ (foo :: Foo) ->
      case foo of {
        A -> foo2Int4;
        B -> foo2Int3;
        C -> foo2Int2;
        D -> foo2Int1
      }

int2Foo :: Int -> Maybe Foo
int2Foo
  = \ (int :: Int) ->
      case int of { I# x ->
      case x of {
        __DEFAULT -> Nothing;
        1# -> lvl;
        7# -> lvl1;
        9# -> lvl2;
        11# -> lvl3
      }
      }

Of course, this nice Core output relies on the proposed fusion RULE.

Playground link

Bodigrim commented 1 year ago

I imagine that instead of a new rule, one can just express lookup via foldr and benefit from existing rewrite rules.

lookup a = foldr (\(a', b') acc -> if a == a' then Just b' else acc) Nothing

Could someone check Core please?

MorrowM commented 1 year ago

I imagine that instead of a new rule, one can just express lookup via foldr and benefit from existing rewrite rules.

I believe this isn't usually done in base since a user compiling with -O0 will end up with a slower program. For example main = print $ lookup 100_000_000 (zip [0 ..] [0 ..]) runs about 33% slower on my machine if I use your lookup. One other option is to have a rewrite rule that rewrites the recursive lookup into a foldr implementation, but that seems less common, looking at existing implementations for things like GHC.List.and and GHC.List.elem.

MorrowM commented 1 year ago

I've confirmed that the RULE

{-# RULES
"lookup/foldr" forall x.
  lookup x =
    foldr (\(k, v) r -> if x == k then Just v else r) Nothing
  #-}

produces the nice Core for the linked example.

tomjaguarpaw commented 1 year ago

This is still O(n) presumably? If so could someone elaborate the point of this? Why bother to pick up a constant factor improvement in something that is asymptotically suboptimal?

Bodigrim commented 1 year ago

I believe this isn't usually done in base since a user compiling with -O0 will end up with a slower program.

I do not recall much discussion on performance of base under -O0 virtually ever, and I don't think it's relevant as long as asymptotics does not change.

Why bother to pick up a constant factor improvement in something that is asymptotically suboptimal?

It's a constant factor in time complexity, but potentially an asymptotic improvement for space complexity, from O(n) to O(1), if everything fused away.

treeowl commented 1 year ago

Not space complexity, but allocation. That mostly matters for concurrent programming, but it matters a lot. Whether the operation is asymptomatically good depends on how it's used. Remember that the fusion rule will generally only fire if the list is used only once. It likely represents checking for the result of some iterative process, which might be exactly what's needed.

MorrowM commented 1 year ago

This is still O(n) presumably? If so could someone elaborate the point of this? Why bother to pick up a constant factor improvement in something that is asymptotically suboptimal?

If your association list is being produced by a good producer that uses build and doesn't need to be reused then that's exactly when you'd want to use a fusion-friendly lookup. It's better than using a Map if you don't need repeated lookups.

I do not recall much discussion on performance of base under -O0 virtually ever, and I don't think it's relevant as long as asymptotics does not change.

I'm simply going based off of prior art. Many functions in GHC.List and GHC.Base are written such that they use recursion and get rewritten to fusion-friendly versions. I'm not sure if I read the bit about -O0 somewhere or if that's a justification I came up with some time ago, but I don't think it'd hurt to give -O0 users a better dev experience. (I would be interested to hear if this is indeed the reason for all the rewrite rules in GHC.List if anyone happens to know, they seem to predate the GHC Gitlab history.)

mixphix commented 1 year ago

Though I still struggle to wrap my head around fusion on the best of days, this seems like a straightforward improvement. I prefer the foldr rule to the one that explicitly mentions build.

chshersh commented 1 year ago

I imagine that instead of a new rule, one can just express lookup via foldr and benefit from existing rewrite rules.

lookup a = foldr (\(a', b') acc -> if a == a' then Just b' else acc) Nothing

Could someone check Core please?

I would like to reiterate @Bodigrim's question here. Can we just change the implementation of lookup to use foldr?

I find the fusion framework unreliable in general, so unless there's a strong reason, I'd go for changing the implementation explicitly.

MorrowM commented 1 year ago

I find the fusion framework unreliable in general, so unless there's a strong reason, I'd go for changing the implementation explicitly.

Switching to a foldr can only help if fusion happens, and may hurt in some ways if it doesn't get inlined/doesn't get inlined soon enough. Adding a RULE seems to me to be reasonable (this is base, after all, so shouldn't we strive to make it as efficient as possible?). In any case, the reliability of fusion shouldn't make a difference, since this only helps if fusion happens, and can't hurt if it doesn't (it is a fusion rule, this is what fusion is).

Even if we do want to stick to using foldr over using RULES, then we should do it for all the other functions that do this, no?

Speaking of, I do want to ask @treeowl: do you know why so many functions in GHC.List do this (ex: and, or, elem, any)? I saw you in the history of this GHC wiki page, which is why I ask.

Bodigrim commented 1 year ago

Rules are one of the most brittle features of GHC. It takes lots of effort (and trial-and-error) to maintain current system, and each new rule makes me shiver. Simply rewriting lookup via foldr and automatically benefiting from all existing and future rules involving foldr is a much more appealing option to me.

I don't think I ever saw people seriously considering performance under -O0. Might be worth attention if it was asymptotically worse or like 10x worse, but 33% slower evaluation in GHCi is next to nothing in my books. The current proliferation of manual recursion in Data.List is most likely a historical accident.

phadej commented 1 year ago

is most likely a historical accident.

The implementation is specified by the report: https://www.haskell.org/onlinereport/haskell2010/haskellch9.html

MorrowM commented 1 year ago

@Bodigrim Do you intend to remove the rewrite rules from and, any, elem, etc?

MorrowM commented 1 year ago

To be clear, I'm fine with just writing it as foldr, I just think it would be best to not make base even more inconsistent than it already is.

Bodigrim commented 1 year ago

@Bodigrim Do you intend to remove the rewrite rules from and, any, elem, etc?

It's up to you as a proposer to decide what you wish to do. My personal take is that we teach new joiners to avoid manual recursion, and it would be nice to show-case this virtue in base.

FWIW GHC.List.{and,any,elem} are of little significance: they are not re-exported from Data.List or Prelude, which prefer members of Foldable instead. Still would be nice to rectify them.

I recently worked on a package for infinite lists and was able to express pretty much everything via foldr, keeping fusion at the same scale as in base: https://github.com/Bodigrim/infinite-list/blob/master/src/Data/List/Infinite.hs

MorrowM commented 1 year ago

Alright then. Final proposal is to replace the current, manually recursive implementation of lookup with

lookup a = foldr (\(a', b') acc -> if a == a' then Just b' else acc) Nothing

@Bodigrim I'd like to trigger a vote.

Bodigrim commented 1 year ago

@MorrowM could you please prepare a GHC MR before a vote?

simonpj commented 1 year ago

Final proposal is to replace the current, manually recursive implementation of lookup with lookup a = foldr ((a', b') acc -> if a == a' then Just b' else acc) Nothing

Can I urge that this code is accompanied with a Note explaining the thinking behind it? There has been quite a lot of discussion on this thread -- the Note could distil the learnings from those insights.

The danger is ending up with a cryptic one-liner, and our future selves in 10 years time change it back

mixphix commented 1 year ago

Perhaps we can simply comment out the existing implementation without entirely deleting it, as a sanity check for those unlucky souls still changing Data.List in ten years? It is defined in the Report after all.

nomeata commented 1 year ago

There are CPP pragmas for using the report definition elsewhere, so this can be used here too.

MorrowM commented 1 year ago

Hmm, this makes me more hesitant to simply go with the "implement it as foldr" route. If we look at the full source for GHC.List.and we see

and                     :: [Bool] -> Bool
#if defined(USE_REPORT_PRELUDE)
and                     =  foldr (&&) True
#else
and []          =  True
and (x:xs)      =  x && and xs
{-# NOINLINE [1] and #-}

{-# RULES
"and/build"     forall (g::forall b.(Bool->b->b)->b->b) .
                and (build g) = g (&&) True
 #-}
#endif

It seems base goes out of its way to explicitly not use foldr as the standard implementation even if it means using CPP and a RULE.

The current consensus in this thread seems to be to do

lookup                  :: (Eq a) => a -> [(a,b)] -> Maybe b
#if defined(USE_REPORT_PRELUDE)
lookup _key []          =  Nothing
lookup  key ((x,y):xys)
    | key == x           =  Just y
    | otherwise         =  lookup key xys
#else
lookup key = foldr (\(x,y) acc -> if key == x then Just y else acc)
#endif

which is the exact opposite of what and does! This makes me a bit hesitant.

Bodigrim commented 1 year ago

It seems base goes out of its way to explicitly not use foldr as the standard implementation even if it means using CPP and a RULE.

This part of base is extremely old and its first iterations predate Haskell 98 standard. I suspect that the historical process was vice versa: the earliest definition of and was derived from HBC stdlib:

module P_List_and  where
and     :: [Bool] -> Bool
and []      =  True
and (x:xs)  =  x && and xs

Later, once Haskell 98 was out, a new, report-compliant definition was added, but never enabled by default.

Bottom line is that I'm not very sympathetic to the argument "our ancestors did it this way, let's cargo cult it".

It is defined in the Report after all.

Strictly speaking, it's only Haskell 98 Report which defines stdlib implementation. IIANM Haskell 2010 Report describes semantics, but does not prescribe any specific implementation.

MorrowM commented 1 year ago

Alright then, that seems to put any fears to rest. Should the #if defined(USE_REPORT_PRELUDE) part of the lookup definition be included? Or is that unnecessary? I do intend to put in a separate Note explaining this whole ordeal either way.

Bodigrim commented 1 year ago

I personally do not see much point in #if defined(USE_REPORT_PRELUDE), we are too far away past Report by now.

MorrowM commented 1 year ago

@Bodigrim GHC MR is up: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10715

sgraf812 commented 1 year ago

I imagine that instead of a new rule, one can just express lookup via foldr and benefit from existing rewrite rules.

lookup a = foldr (\(a', b') acc -> if a == a' then Just b' else acc) Nothing

Could someone check Core please?

All results with optimisations enabled.

lookup_old                  :: (Eq a) => a -> [(a,b)] -> Maybe b
lookup_old _key []          =  Nothing
lookup_old  key ((x,y):xys)
    | key == x           =  Just y
    | otherwise         =  lookup_old key xys
{-# OPAQUE lookup_old #-}

lookup_new :: Eq a => a -> [(a,b)] -> Maybe b
lookup_new key = foldr (\(x,y) acc -> if key == x then Just y else acc) Nothing
{-# OPAQUE lookup_new #-}

main = print $ lookup_old 10_000_000 (zip [0 ..] [0 ..])
-- main = print $ lookup_new 10_000_000 (zip [0 ..] [0 ..])

Perf (esp. allocs) seems to be the same:

Just 10000000
   1,680,051,448 bytes allocated in the heap
          61,600 bytes copied during GC
          44,328 bytes maximum residency (2 sample(s))
          29,400 bytes maximum slop
               5 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       405 colls,     0 par    0.003s   0.003s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.394s  (  0.520s elapsed)
  GC      time    0.003s  (  0.003s elapsed)
  EXIT    time    0.000s  (  0.007s elapsed)
  Total   time    0.397s  (  0.530s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    4,264,646,419 bytes per MUT second

  Productivity  99.2% of total user, 98.0% of total elapsed

The Core is a bit different, but only because foldr introduces a join point for the loop (which has no overhead whatsoever):

lookup_new
  = \ (@a_a16M)
      (@b_a16N)
      ($dEq_a16O :: Eq a_a16M)
      (key_aHg [OS=OneShot] :: a_a16M)
      (eta_B0 [OS=OneShot] :: [(a_a16M, b_a16N)]) ->
      joinrec {
        go1_a2fZ [Occ=LoopBreaker, Dmd=SCS(L)]
          :: [(a_a16M, b_a16N)] -> Maybe b_a16N
        [LclId[JoinId(1)(Just [!])], Arity=1, Str=<1L>, Unf=OtherCon []]
        go1_a2fZ (ds_a2g0 :: [(a_a16M, b_a16N)])
          = case ds_a2g0 of {
              [] -> GHC.Maybe.Nothing @b_a16N;
              : y_a2g3 ys_a2g4 ->
                case y_a2g3 of { (x_aSO, y1_aSP) ->
                case == @a_a16M $dEq_a16O key_aHg x_aSO of {
                  False -> jump go1_a2fZ ys_a2g4;
                  True -> GHC.Maybe.Just @b_a16N y1_aSP
                }
                }
            }; } in
      jump go1_a2fZ eta_B0

This is the original lookup_old recurses directly (which makes no measurable operational difference):

lookup_old
  = \ (@a_a176)
      (@b_a177)
      ($dEq_a178 :: Eq a_a176)
      (_key_atq :: a_a176)
      (ds_d1m9 :: [(a_a176, b_a177)]) ->
      case ds_d1m9 of {
        [] -> GHC.Maybe.Nothing @b_a177;
        : ds1_d1mi xys_atu ->
          case ds1_d1mi of { (x_ats, y_att) ->
          case == @a_a176 $dEq_a178 _key_atq x_ats of {
            False -> lookup_old @a_a176 @b_a177 $dEq_a178 _key_atq xys_atu;
            True -> GHC.Maybe.Just @b_a177 y_att
          }
          }
      }
Bodigrim commented 1 year ago

After discussion with @simonpj and @sgraf812, we decided that it's better to go with the original plan of a single rewrite rule

{-# RULES
"lookup/build" forall x (g :: forall b. ((k, a) -> b -> b) -> b -> b).
  lookup x (build g) = g (\(k, v) r -> if x == k then Just v else r) Nothing
#-}

Dear CLC members, let's vote on the proposal to add a rewrite rule for Data.List.lookup, allowing it to fuse with good producers. The MR, tests and technical discussion can be found at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10715.

@tomjaguarpaw @chshersh @mixphix @hasufell @angerman @parsonsmatt


+1 from me, this is a nice improvement, reducing allocations in certain scenarios.

hasufell commented 1 year ago

+1

tomjaguarpaw commented 1 year ago

+1


I still don't understand why people who want performance are using association lists though ...

mixphix commented 1 year ago

+1

chshersh commented 1 year ago

+1 to the proposal and +1 to the @tomjaguarpaw's comment above

Bodigrim commented 1 year ago

Thanks all, 5 votes are enough to approve.