Open jwaldmann opened 8 years ago
Changing $!
to $
in the function beside
helps quite a bit:
input | before (s) | after (s) |
---|---|---|
100 | 0.19 | 0.12 |
200 | 1.7 | 0.35 |
400 | 17 | 1.3 |
This is with ghc-7.10.3 -O
and pretty-1.1.3.2.
git bisect points at 6e01b2ec5f5adceb405cfc9aa29c060f385a9f31 and https://github.com/ghc/ghc/commit/ac88f113abdec1edbffb6d2f97323e81f82908e7.
I tested the same change in GHC's copy of pretty (compiler/utils/Pretty), hoping it would make GHC massively faster! Unfortunately, it just had a negative effect on T3294:
tests/alloc/T3294 2679798176 + 4.21% 2792570824 bytes
Interesting. Well, the bug seems to hit only for certain kinds of nesting. But I have a few applications where I think it matters. And I have seen horrendous ghc and/or haddock runtimes for modules with hundreds of identifiers, e.g., in OpenGLRaw, and I have a hunch that this is related. You could try your patched ghc on these.
I think 1second runtime for outputting 2k chars is still way high. For a similar test case with wl-pprint, it is more like 10 millisecs (but the combinators have different behaviour)
I started debugging a stack overflow in Hoogle, see https://github.com/ndmitchell/hoogle/issues/167, and it brought me to exactly this location. Specifically, given the operation:
import Text.PrettyPrint.Annotated.HughesPJ
main = print $ hsep $ replicate 1000 $ text "neil"
This takes an unbounded amount of stack. If I replace:
- beside (TextBeside t p) g q = TextBeside t $! rest
+ beside (TextBeside t p) g q = TextBeside t $ rest
Then it takes a constant amount of stack. It turns an O(n) memory algorithm into an O(1) algorithm, so seems like it should be considered a fix. @dterei, any thoughts?
Sure. Do you mind submitting a PR with a test-case please?
Sure. @thomie, I intend to remove that one bang, do you have any others I should include in the ticket?
I found an even better test case, take 10 $ render $ hsep $ repeat $ text "neil"
works with the new formulation and loops with the old.
Fine by me.
In case you know the answer, it would be great if you could add a comment or Note explaining why the other $!
s are necessary (or a test even). They were all together added in 6e01b2ec5f5adceb405cfc9aa29c060f385a9f31.
To be clear, I can demonstrate that one particular bang is harmful - I have no opinion on the others - but my guess is someone had bangs and thought they added performance so sprayed them with a machine gun. It's a standard Haskell optimisation without benchmarking trick :disappointed:
I am glad that this issue is getting some attention.
The worrisome thing is that my benchmark above is basically an iteration of a random (!) small context
(\ x -> ... )
. So even if behaviour improves for this particular case, others should be tested (as ndm did) - and it's probably best to enumerate contexts in a smallcheck style. Let me see if I find the time to do that.
@jwaldmann - so I have one specific place which causes a more discrete regression (overly strict, different space complexity) - so is worth fixing independent of any performance measurements. Is it worth waiting for your benchmarks or shall I pull request it today?
Don't wait for me. I made this for testing: https://github.com/jwaldmann/pretty-test
Pull request for my piece as #35.
For reference, here's the "winner" (most expensive context with depth 3 -- according to SmallCheck). I write it in the form that you can execute in ghci:
import Text.PrettyPrint.HughesPJ
:set +s
length $ render $ iterate ( \ hole -> sep [text "l", cat [hole], text "l"] ) (text "l") !! 1000
4001
(10.42 secs, 9,684,556,640 bytes)
This is quite consistent for ghc-7.6.3, 7.8.4, 7.10.3, while ghc-8.0.1 is better by one second.
What does the original prettyprint paper ( http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.8777 ) say about (asymptotic) performance?
There are no exact claims or proofs, but end of Section 9 has "seems to grow slightly faster than linearly" and "far superior to O(n^2)".
Really? Here is the runtimes for rendering iterations of context mentioned previously (x axis: iterations, y axis: time in seconds, average over 3 executions). (forgive the crude rendering)
Looks accidentally quadratic ?
I think that for a (pretty)printing library, anything more than linear is unacceptable , and as long as the behaviour is not fixed, "dangerous" combinators must be red-flagged in the API docs.
What are these dangerous combinators? Everything that has an "either .. or" ? (sep
: either hsep
or vcat
, etc.)
Detection of non-linearity in pretty's code would really make a nice test case for automated analysis of runtime complexity of programs. I can advertise this at http://cl-informatik.uibk.ac.at/users/tpowell/lca but don't expect immediate results ...
Two technical points here: linear, quadratic, etc. - in what parameter exactly? Size of input? Size of output?
empty
, which produces no output, but has a cost.From what I've read of the code I would be shocked if it wasn't quadratic. It seems to regularly be traversing the entire tree to ensure global properties, whereas I would expect it to consider all Doc values in some normal form and then use smart constructors to guarantee that without traversing downwards.
Try this:
on my machine:
This testcase is simplified from https://ghc.haskell.org/trac/ghc/ticket/7666 I think this bug (gut feeling - some quadratic behaviour) has been sitting there for a long time. I do think this is serious. I think it also hurts haddock.
pretty-1.1.3.2 for ghc-8.0.0.20160111 and pretty-1.1.2.0 for ghc-7.10.3