Open alt-romes opened 1 year ago
I actually experimented with this when writing the weigh
chapter but I couldn't find any significant difference with an extremely small program. But I do think this effect is observable, we just need to find the right program in the wild, perhaps in pandoc or xmonad.
You could experiment with GHC Core's Expr
type: what happens if you move the Var
constructor to the last position? :)
I'd love to see those benchmarks heh
sounds like a good case study
I had a thought recently about something that affects runtime performance, but possibly in such a way so minimal that it might not even be worth adding to the book.
Nonetheless...
GHC tags pointers to evaluated heap-allocated data types with their tag. So, for example, pointers to heap-allocated Maybes will be tagged with
+1
if the allocated-value isNothing
and with+2
if it isJust ...
, (and with+0
if it is a thunk, in which case we evaluate the thunk which will return a tagged pointer (with either +1 or +2)).Then, to pattern match on
x
in the body off
, we don't need to dereference the pointer to the heap and check the constructor info, for we can simply (when x is evaluated) look at its tag. The above example compiles to the following cmm code withghc -dno-typeable-binds -fforce-recomp -ddump-opt-cmm -ddump-to-file X.hs
:The question is what happens when we have more than 7 constructors (the tag can only be in the last 3 bits of the pointer, and 0 means unevaluated data type). In the case of a datatype with more than 7 constructors, a tag of 7 means that the constructor index is 7 or higher, and therefore we have to dereference the pointer and look at the constructor info. The other tags from 0-6 retain their meaning.
Therefore, the most common constructors of a datatype with >7 constructors are better off being in the first 6 constructors, and the least common should come later. So if you have a datatype of which 80% of the times is constructed with some
Con1
, it is more performant to have it be one of the first 6.This could possibly matter in really, REALLY, tight loops 😝, though I would love to see such a case. Or a gigantic code base in which all the useful constructors are defined last.