Open adinapoli opened 7 years ago
This looks quite reasonable to me.
@dterei , do you think this has a chance to be merged or is there anything you would like to see before doing so?
Firstly, let's not remove ptext
, we should deprecate it first and remove it in a later release.
Secondly, I'm not sure about using a type-class. It seems reasonable, but we could as easily just add text_efficient :: String -> Int -> Doc a
rather than a type-class.
I'd love to understand better first for all of this work:
String
slow?String
compared with an alternative, such as FastString
?Sorry to slow you up, but this is a long living problem and potentially a big change, so I'd like to understand the bottom better. Let's keep the discussion mainly in #42.
Firstly, let's not remove ptext, we should deprecate it first and remove it in a later release.
I had no idea if (in practice) that was an obsolete relic or something some users could potentially still be using, but I do agree is way more prudent to go through the usual deprecation cycle!
Sorry to slow you up, but this is a long living problem and potentially a big change, so I'd like to understand the bottom better.
Absolutely! As there is quite an overlap with the discussion in #42 anyway, as you suggested I'm going to simply write a more comprehensive comment there so that we keep the discussion in only one place 😉
@adinapoli Anything I should be reviewing or doing with these new commits?
Hey @dterei ! I have done mostly 2 things:
ptext
and I have appropriately deprecated it.RuneSequence
, with these results:benchmarking render/doc1/string
time 330.4 ms (324.2 ms .. 336.8 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 324.7 ms (322.5 ms .. 327.0 ms)
std dev 2.703 ms (1.558 ms .. 3.349 ms)
variance introduced by outliers: 16% (moderately inflated)
benchmarking render/doc1/bytestring
time 322.6 ms (277.4 ms .. 367.9 ms)
0.995 R² (0.985 R² .. 1.000 R²)
mean 328.0 ms (321.2 ms .. 341.2 ms)
std dev 13.16 ms (220.3 μs .. 15.47 ms)
variance introduced by outliers: 16% (moderately inflated)
benchmarking render/doc2/string
time 588.7 ms (NaN s .. 594.1 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 583.2 ms (581.8 ms .. 584.4 ms)
std dev 1.898 ms (0.0 s .. 2.065 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking render/doc2/bytestring
time 572.7 ms (558.1 ms .. 585.8 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 577.0 ms (574.4 ms .. 578.8 ms)
std dev 2.708 ms (0.0 s .. 3.120 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking render/doc3/string
time 1.828 s (1.809 s .. 1.862 s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.827 s (1.823 s .. 1.833 s)
std dev 5.259 ms (0.0 s .. 5.655 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking render/doc2/bytestring
time 1.822 s (1.794 s .. 1.835 s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.817 s (1.811 s .. 1.821 s)
std dev 5.958 ms (0.0 s .. 6.690 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fullRender/PageMode 1000/string
time 585.9 ms (577.1 ms .. 594.9 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 587.0 ms (584.9 ms .. 588.1 ms)
std dev 1.829 ms (0.0 s .. 2.033 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fullRender/PageMode 1000/bytestring
time 587.0 ms (556.9 ms .. 607.6 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 583.0 ms (575.6 ms .. 586.8 ms)
std dev 6.424 ms (0.0 s .. 6.665 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fullRender/PageMode 100/string
time 585.1 ms (518.0 ms .. 646.5 ms)
0.998 R² (0.997 R² .. 1.000 R²)
mean 600.0 ms (583.0 ms .. 609.8 ms)
std dev 15.21 ms (0.0 s .. 16.93 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fullRender/PageMode 100/bytestring
time 596.6 ms (584.5 ms .. 603.5 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 585.7 ms (579.6 ms .. 589.4 ms)
std dev 5.646 ms (0.0 s .. 6.456 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fullRender/ZigZagMode/string
time 236.8 ms (235.7 ms .. 238.4 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 234.3 ms (233.1 ms .. 235.3 ms)
std dev 1.372 ms (1.010 ms .. 1.664 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking fullRender/ZigZagMode/bytestring
time 236.0 ms (232.1 ms .. 240.7 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 235.7 ms (234.1 ms .. 237.0 ms)
std dev 1.787 ms (1.402 ms .. 2.170 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking fullRender/LeftMode/string
time 11.31 ms (11.02 ms .. 11.54 ms)
0.998 R² (0.997 R² .. 0.999 R²)
mean 11.34 ms (11.15 ms .. 12.01 ms)
std dev 881.3 μs (230.7 μs .. 1.764 ms)
variance introduced by outliers: 40% (moderately inflated)
benchmarking fullRender/LeftMode/bytestring
time 10.89 ms (10.66 ms .. 11.08 ms)
0.996 R² (0.992 R² .. 0.998 R²)
mean 10.83 ms (10.68 ms .. 11.03 ms)
std dev 456.1 μs (332.6 μs .. 679.6 μs)
variance introduced by outliers: 16% (moderately inflated)
benchmarking fullRender/OneLineMode/string
time 27.44 ms (26.72 ms .. 28.14 ms)
0.995 R² (0.987 R² .. 1.000 R²)
mean 27.37 ms (26.93 ms .. 28.38 ms)
std dev 1.366 ms (652.6 μs .. 2.425 ms)
variance introduced by outliers: 16% (moderately inflated)
benchmarking fullRender/OneLineMode/bytestring
time 25.72 ms (25.11 ms .. 26.30 ms)
0.998 R² (0.996 R² .. 0.999 R²)
mean 26.54 ms (26.01 ms .. 27.54 ms)
std dev 1.613 ms (658.3 μs .. 2.328 ms)
variance introduced by outliers: 25% (moderately inflated)
As you can see so far, the use of ByteString
yields only marginally-better results. I think this is because the corpus we are benchmarking against is not particularly representative, and possibly we would need some real, GHC-driven data. I would be curious to hear what @bgamari has to say. Last but not least, I'm wondering if all those right appends could be improved by defining (Actually scrap that, RuneSequence
instances for DList
and Builder
length
O-complexity would be terrible for those two, I'm clearly tired at this time of the day 🙈 )
Yeah, we've had this issue before. Building a benchmark isn't too hard, building one that captures the performance issues that GHC has seen with pretty is far harder... I'm not sure how to help here as I don't know the issues GHC has faced.
It seems from our previous analysis that the calls to length
for the string type are the performance factor.
@adinapoli, do you need any help with this?
Hey @bgamari , thanks for keeping me honest 😉
Yes, a bit of help in gathering realistic, GHC-driven test cases would be awesome indeed. Considering it's not the most rewarding task to do (hehe) and that I'm in the middle of changing jobs, my free time is a bit limited these days 🙈
No worries; just let me know if you get stuck.
@bgamari To be completely clear, currently I feel I'm a bit stuck, in a sense. I don't feel compelled too much in exploring different solutions as I don't have realistic benchmarks which I can leverage. Having those (or at least pointers on how to generate some) would revamp the interest, I guess 😛
@adinapoli, quite understandable. It's hard to reduce GHC's use of the pretty-printer to a nice clean benchmark. The bit of GHC that is likely dependent upon pretty-printer performance is the native-code generator (NCG). Here we are constructing assembler programs, which are essentially nothing but FastString
s with some interstitial whitespace. @niteria has previously noted that GHC currently spends a significant amount of time in the pretty-printer due to the NCG.
If I were to guess, I would try constructing a benchmark that follows roughly the model I describe above: a lot of short lines, each consisting of a few shorts string separated by whitespace.
I should note that another option in the cards for GHC is to try swapping out pretty
with @quchen's prettyprinter
library. This would likely require picking up a text
dependency, but with the way things are going it's looking rather inevitable that we'll need to include text
in the GHC build eventually, even if the ghc
library itself doesn't depend upon it. prettyprinter
has the advantage that it already offers an optimized infinite ribbon-width codepath, which may significantly help the NCG usecase.
FWIW if Text is a problem, forking prettyprinter to use String is simple, but not something I want to have in the Hackage release.
It does seem like it would be worth trying one of the alternative pretty-printer libraries. Not sure how much work that would be, hopefully only a small amount.
Otherwise though, in terms of isolating performance issues in pretty
, how are we detecting that performance is an issue today? Presumably @adinapoli or others have already tried replacing ghc-pretty
with pretty
and noticed some performance issues through the GHC test-suite. Obviously that's a big and ugly way to test, but at least a starting point. For example, can we get the output of that performance test from GHC to understand its structure?
Indeed in principle moving to a new pretty-printing library should be fairly straightforward. @bollu is currently trying to do this in D3650. Unfortunately, these sorts of things are never as simple as they seem: currently the branch seems to suffer from what appears to be a code-generation bug (perhaps due to slight differences in assembler output). This is one of many things on my list of things to look into but, as always, help is greatly appreciated.
Hey all,
following from @bgamari 's feedback, this is a very first, crude attempt to generalise the user-facing API to support multiple textual representations which can be reified back into a normal
String
, but that supports getting the length of the sequence in an efficient way. This PR does two things:It (boldly!) removes the
PStr
constructor fromTextDetails
, as I think it's deprecated anyway and is adding just clutter for nothing.It introduces the
RuneSequence r
typeclass which allows generalising the top level API (things like thetext
combinators) to be independent fromString
.I really hope this is a good step towards the bigger goal of replacing GHC's internal copy of
pretty
with this library 😉Let me know what you guys think!