quchen / prettyprinter

A modern, extensible and well-documented prettyprinter.
BSD 2-Clause "Simplified" License
294 stars 36 forks source link

Optimize the order of constructors in Doc #125

Open sjakobi opened 4 years ago

sjakobi commented 4 years ago

Starting with GHC 8.10, handling the first 6 constructors of Doc should be slightly faster than handling the other 7. See this GHC patch for the background on this.

By shuffling around the order of constructors, we might be able to speed up typical documents.

For example, Cat should probably be in the first group, while Fail and FlatAlt could be in the second group without much performance degradation. Annotated could also be a candidate for the first group.

A little helper function that reports the frequency of constructors would be useful here.

SimpleDocStream fortunately has 7 constructors, which is exactly the number of constructor tags on 64-bit systems. 0 is the tag for unevaluated thunks.

sjakobi commented 4 years ago

A little helper function that reports the frequency of constructors would be useful here.

```haskell data Frequencies = Frequencies { _fail :: !Int , _empty :: !Int , _char :: !Int , _text :: !Int , _flatAlt :: !Int , _line :: !Int , _cat :: !Int , _nest :: !Int , _union :: !Int , _column :: !Int , _withPageWidth :: !Int , _nesting :: !Int , _annotated :: !Int } deriving Show emptyFrequencies :: Frequencies emptyFrequencies = Frequencies 0 0 0 0 0 0 0 0 0 0 0 0 0 addFrequencies :: Frequencies -> Frequencies -> Frequencies addFrequencies (Frequencies a0 b0 c0 d0 e0 f0 g0 h0 i0 j0 k0 l0 m0) (Frequencies a1 b1 c1 d1 e1 f1 g1 h1 i1 j1 k1 l1 m1) = Frequencies (a0 + a1) (b0 + b1) (c0 + c1) (d0 + d1) (e0 + e1) (f0 + f1) (g0 + g1) (h0 + h1) (i0 +i1) (j0 +j1) (k0 + k1) (l0 + l1) (m0 + m1) countFrequencies' :: Int -> PageWidth -> Int -> Doc ann -> Frequencies countFrequencies' column_ pageWidth_ nesting_ x = case x of Fail -> e { _fail = 1 } Empty -> e { _empty = 1 } Char{} -> e { _char = 1 } Text{} -> e { _text = 1 } FlatAlt a b -> e { _flatAlt = 1 } `add` count a `add` count b Line -> e { _line = 1 } Cat a b -> e { _cat = 1 } `add` count a `add` count b Nest _ a -> e { _nest = 1 } `add` count a Union a b -> e { _union = 1 } `add` count a `add` count b Column f -> e { _column = 1 } `add` count (f column_) WithPageWidth f -> e { _withPageWidth = 1 } `add` count (f pageWidth_) Nesting f -> e { _nesting = 1 } `add` count (f nesting_) Annotated _ a -> e { _annotated = 1 } `add` count a where e = emptyFrequencies add = addFrequencies count = countFrequencies' column_ pageWidth_ nesting_ countFrequencies :: Doc ann -> Frequencies countFrequencies = countFrequencies' 10 defaultPageWidth 10 frequenciesToList :: Frequencies -> [(Int, Text)] frequenciesToList f = [ (_fail f, "Fail") , (_empty f, "Empty") , (_char f, "Char") , (_text f, "Text") , (_flatAlt f, "FlatAlt") , (_line f, "Line") , (_cat f, "Cat") , (_nest f, "Nest") , (_union f, "Union") , (_column f, "Column") , (_withPageWidth f, "WithPageWidth") , (_nesting f, "Nesting") , (_annotated f, "Annotated") ] ```

I have used countFrequencies on some dhall expressions (as usual):

``` λ> t <- Data.Text.IO.readFile "/home/simon/src/cpkg/pkgs/pkg-set.dhall" λ> Right e = exprFromText "" t λ> Data.List.sort $ frequenciesToList $ countFrequencies (prettyExpr e) [ ( 0 , "WithPageWidth" ) , ( 5056 , "Fail" ) , ( 2774984 , "Nest" ) , ( 9265877 , "Column" ) , ( 9265877 , "Nesting" ) , ( 10152653 , "FlatAlt" ) , ( 10152654 , "Union" ) , ( 15691638 , "Line" ) , ( 24664844 , "Empty" ) , ( 86956579 , "Text" ) , ( 195717122 , "Char" ) , ( 199805788 , "Annotated" ) , ( 302729931 , "Cat" ) ] ```

In some expressions Column and Nesting (introduced quite surely by align) appear more frequently. This is Prelude.JSON.renderYAML:

``` [ ( 0 , "WithPageWidth" ) , ( 124 , "Fail" ) , ( 143695 , "FlatAlt" ) , ( 143696 , "Union" ) , ( 154122 , "Nest" ) , ( 244368 , "Line" ) , ( 288541 , "Column" ) , ( 288541 , "Nesting" ) , ( 836654 , "Empty" ) , ( 1288859 , "Text" ) , ( 2786542 , "Char" ) , ( 3100284 , "Annotated" ) , ( 4869155 , "Cat" ) ] ```

I do wonder whether counting constructor by traversing the entire tree is the right thing to do though. Maybe it would be better to hook into e.g. layoutWadlerLeijen and count the constructors we pattern-match on there.

In https://github.com/quchen/prettyprinter/commit/e79bb67f5e05f9dc9d0e08335a2517ea392664b0 I have moved the Cat and Annotated constructors into the first constructor group. I haven't seen a significant performance difference from this change though. Better benchmarks (https://github.com/quchen/prettyprinter/issues/118) would be helpful!