haskellfoundation / hs-opt-handbook.github.io

The Haskell Optimization Handbook
https://haskell.foundation/hs-opt-handbook.github.io/
Creative Commons Attribution 4.0 International
169 stars 12 forks source link

Lambda Lifting Chapter #63

Open doyougnu opened 1 year ago

doyougnu commented 1 year ago

Tie this into the SAT chapter: https://github.com/input-output-hk/hs-opt-handbook.github.io/issues/62

and have a discussion on use cases for the function being optimized. That is, answer when one might want to keep arguments on the stack with a lambda lift, and when one might want to SAT to avoid excess stack allocs.

Bodigrim commented 1 year ago

Looking at https://input-output-hk.github.io/hs-opt-handbook.github.io/src/Optimizations/GHC_opt/lambda_lifting.html, I don't quite agree with the summary and heuristics suggested.

This new program will be much faster because f becomes essentially non-allocating.

What is supposed to be the program here? Just a single call to f? It will take roughly the same amount of time.

Lambda lifting trades heap for stack and is therefore effective for tight, closed, hot loops where fetching from the heap would be slow.

Both heap and stack are in memory and often in CPU cache, so there is little inherent difference in fetching time. That's not the point of lambda lifting.

The original program allocates one (expensive) closure for g per call of f. The modified program allocates a (cheap) additional Int argument, but per each call of g, not f. So the heuristic is: if there are few outer calls of f a n but many inner calls of g n (which happens when n is large) use the original program, because it will allocate some closures in the outer loop, but save on allocations in the inner loop. If however there are many outer calls of f a n with relatively small n so that there are few inner calls of g n, then lambda lifting proves fruitful.

The simplest motivational example for lambda lifting is

map f = go 
  where 
     go [] = [] 
     go (x : xs) = f x : go xs

vs.

map f [] = []
map f (x : xs) = f x : map f xs

The first form is beneficial if there are few calls of map with relatively long lists, and the second is better when there are many calls of map with short lists.

FWIW the advent of join points tilted scales further in the direction of static argument transformation, letting GHC to do lambda lifting on its own. For instance, see discussion at https://github.com/haskell/bytestring/pull/273#issuecomment-688467824.

doyougnu commented 1 year ago

Hi Bodigrim, thank you for the review.

The original program allocates one (expensive) closure for g per call of f. The modified program allocates a (cheap) additional Int argument, but per each call of g, not f. So the heuristic is: if there are few outer calls of f a n but many inner calls of g n (which happens when n is large) use the original program, because it will allocate some closures in the outer loop, but save on allocations in the inner loop. If however there are many outer calls of f a n with relatively small n so that there are few inner calls of g n , then lambda lifting proves fruitful.

Yes I think this is much better guidance and I will add this language to the section.

The simplest motivational example for lambda lifting is ...

Also great, this example should be very straightforward for the audience. I'll adapt the chapter to use this example. My motivation with the previous example was just to have the simplest program to show the transformation. But I do agree that the context of use for the function is an important and missing part.

Both heap and stack are in memory and often in CPU cache, so there is little inherent difference in fetching time. That's not the point of lambda lifting.

Yes, this is not the point of lambda lifting per se but I still think it is an important point to make, because learning to read haskell with an performance eye is a skill in and of itself, and teaching that skill is one of the goals of the book. So my motivation here is not to say "GHC does this optimization that you should be aware of and here is how that works" but instead to say "Here is an optimization that GHC does, it may or may not fire, but you could consider performing the optimization manually in special cases".

FWIW the advent of join points tilted scales further in the direction of static argument transformation, letting GHC to do lambda lifting on its own. For instance, see discussion at https://github.com/haskell/bytestring/pull/273#issuecomment-688467824.

Perfect! Join points is on the TODO stack so this will be a nice follow up chapter. I was thinking of doing SAT next because SAT is not enabled by default, is situational and is an effective optimization technique.

doyougnu commented 1 year ago

Also I would like to add your name to the list of contributors as a reviewer. Is that ok with you?

Bodigrim commented 1 year ago

Also I would like to add your name to the list of contributors as a reviewer. Is that ok with you?

Sure, thanks.

doyougnu commented 1 year ago

I'll adapt the chapter to use this example. Also I would like to add your name to the list of contributors as a reviewer

These are done.