haskell / pretty

Haskell Pretty-printer library
Other
71 stars 31 forks source link

Quadratic performance issues #32

Open jwaldmann opened 8 years ago

jwaldmann commented 8 years ago

Try this:

import Text.PrettyPrint
import System.Environment
main = do
  [ s ] <- getArgs
  print $ iterate ( \ x -> fsep [ text "a" , x <+> text "b" ] ) empty !! read s

on my machine:

input | runtime in sec

100   |  0.1
200   |  1.1
300   |  4.4
400   | 11.4

This testcase is simplified from https://ghc.haskell.org/trac/ghc/ticket/7666 I think this bug (gut feeling - some quadratic behaviour) has been sitting there for a long time. I do think this is serious. I think it also hurts haddock.

pretty-1.1.3.2 for ghc-8.0.0.20160111 and pretty-1.1.2.0 for ghc-7.10.3

thomie commented 8 years ago

Changing $! to $ in the function beside helps quite a bit:

input before (s) after (s)
100 0.19 0.12
200 1.7 0.35
400 17 1.3

This is with ghc-7.10.3 -O and pretty-1.1.3.2.

git bisect points at 6e01b2ec5f5adceb405cfc9aa29c060f385a9f31 and https://github.com/ghc/ghc/commit/ac88f113abdec1edbffb6d2f97323e81f82908e7.

I tested the same change in GHC's copy of pretty (compiler/utils/Pretty), hoping it would make GHC massively faster! Unfortunately, it just had a negative effect on T3294:

  tests/alloc/T3294     2679798176  + 4.21% 2792570824  bytes
jwaldmann commented 8 years ago

Interesting. Well, the bug seems to hit only for certain kinds of nesting. But I have a few applications where I think it matters. And I have seen horrendous ghc and/or haddock runtimes for modules with hundreds of identifiers, e.g., in OpenGLRaw, and I have a hunch that this is related. You could try your patched ghc on these.

I think 1second runtime for outputting 2k chars is still way high. For a similar test case with wl-pprint, it is more like 10 millisecs (but the combinators have different behaviour)

ndmitchell commented 8 years ago

I started debugging a stack overflow in Hoogle, see https://github.com/ndmitchell/hoogle/issues/167, and it brought me to exactly this location. Specifically, given the operation:

import Text.PrettyPrint.Annotated.HughesPJ
main = print $ hsep $ replicate 1000 $ text "neil"

This takes an unbounded amount of stack. If I replace:

- beside (TextBeside t p)    g q   = TextBeside t $! rest
+ beside (TextBeside t p)    g q   = TextBeside t $ rest

Then it takes a constant amount of stack. It turns an O(n) memory algorithm into an O(1) algorithm, so seems like it should be considered a fix. @dterei, any thoughts?

dterei commented 8 years ago

Sure. Do you mind submitting a PR with a test-case please?

ndmitchell commented 8 years ago

Sure. @thomie, I intend to remove that one bang, do you have any others I should include in the ticket?

I found an even better test case, take 10 $ render $ hsep $ repeat $ text "neil" works with the new formulation and loops with the old.

thomie commented 8 years ago

Fine by me.

In case you know the answer, it would be great if you could add a comment or Note explaining why the other $!s are necessary (or a test even). They were all together added in 6e01b2ec5f5adceb405cfc9aa29c060f385a9f31.

ndmitchell commented 8 years ago

To be clear, I can demonstrate that one particular bang is harmful - I have no opinion on the others - but my guess is someone had bangs and thought they added performance so sprayed them with a machine gun. It's a standard Haskell optimisation without benchmarking trick :disappointed:

jwaldmann commented 8 years ago

I am glad that this issue is getting some attention.

The worrisome thing is that my benchmark above is basically an iteration of a random (!) small context (\ x -> ... ). So even if behaviour improves for this particular case, others should be tested (as ndm did) - and it's probably best to enumerate contexts in a smallcheck style. Let me see if I find the time to do that.

ndmitchell commented 8 years ago

@jwaldmann - so I have one specific place which causes a more discrete regression (overly strict, different space complexity) - so is worth fixing independent of any performance measurements. Is it worth waiting for your benchmarks or shall I pull request it today?

jwaldmann commented 8 years ago

Don't wait for me. I made this for testing: https://github.com/jwaldmann/pretty-test

ndmitchell commented 8 years ago

Pull request for my piece as #35.

jwaldmann commented 8 years ago

For reference, here's the "winner" (most expensive context with depth 3 -- according to SmallCheck). I write it in the form that you can execute in ghci:

import Text.PrettyPrint.HughesPJ

:set +s
length $ render $ iterate ( \ hole ->  sep [text  "l", cat [hole], text  "l"] ) (text  "l") !! 1000

4001
(10.42 secs, 9,684,556,640 bytes)

This is quite consistent for ghc-7.6.3, 7.8.4, 7.10.3, while ghc-8.0.1 is better by one second.

jwaldmann commented 8 years ago

What does the original prettyprint paper ( http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.8777 ) say about (asymptotic) performance?

There are no exact claims or proofs, but end of Section 9 has "seems to grow slightly faster than linearly" and "far superior to O(n^2)".

Really? Here is the runtimes for rendering iterations of context mentioned previously (x axis: iterations, y axis: time in seconds, average over 3 executions). (forgive the crude rendering)

d-1000

Looks accidentally quadratic ?

I think that for a (pretty)printing library, anything more than linear is unacceptable , and as long as the behaviour is not fixed, "dangerous" combinators must be red-flagged in the API docs.

What are these dangerous combinators? Everything that has an "either .. or" ? (sep : either hsep or vcat, etc.)

Detection of non-linearity in pretty's code would really make a nice test case for automated analysis of runtime complexity of programs. I can advertise this at http://cl-informatik.uibk.ac.at/users/tpowell/lca but don't expect immediate results ...

Two technical points here: linear, quadratic, etc. - in what parameter exactly? Size of input? Size of output?

ndmitchell commented 8 years ago

From what I've read of the code I would be shocked if it wasn't quadratic. It seems to regularly be traversing the entire tree to ensure global properties, whereas I would expect it to consider all Doc values in some normal form and then use smart constructors to guarantee that without traversing downwards.