Quadratic performance issues

jwaldmann commented 8 years ago

Try this:

import Text.PrettyPrint
import System.Environment
main = do
  [ s ] <- getArgs
  print $ iterate ( \ x -> fsep [ text "a" , x <+> text "b" ] ) empty !! read s

on my machine:

input | runtime in sec

100   |  0.1
200   |  1.1
300   |  4.4
400   | 11.4

This testcase is simplified from https://ghc.haskell.org/trac/ghc/ticket/7666 I think this bug (gut feeling - some quadratic behaviour) has been sitting there for a long time. I do think this is serious. I think it also hurts haddock.

pretty-1.1.3.2 for ghc-8.0.0.20160111 and pretty-1.1.2.0 for ghc-7.10.3

thomie commented 8 years ago

Changing $! to $ in the function beside helps quite a bit:

input	before (s)	after (s)
100	0.19	0.12
200	1.7	0.35
400	17	1.3

This is with ghc-7.10.3 -O and pretty-1.1.3.2.

git bisect points at 6e01b2ec5f5adceb405cfc9aa29c060f385a9f31 and https://github.com/ghc/ghc/commit/ac88f113abdec1edbffb6d2f97323e81f82908e7.

I tested the same change in GHC's copy of pretty (compiler/utils/Pretty), hoping it would make GHC massively faster! Unfortunately, it just had a negative effect on T3294:

  tests/alloc/T3294     2679798176  + 4.21% 2792570824  bytes

jwaldmann commented 8 years ago

Interesting. Well, the bug seems to hit only for certain kinds of nesting. But I have a few applications where I think it matters. And I have seen horrendous ghc and/or haddock runtimes for modules with hundreds of identifiers, e.g., in OpenGLRaw, and I have a hunch that this is related. You could try your patched ghc on these.

I think 1second runtime for outputting 2k chars is still way high. For a similar test case with wl-pprint, it is more like 10 millisecs (but the combinators have different behaviour)

ndmitchell commented 8 years ago

I started debugging a stack overflow in Hoogle, see https://github.com/ndmitchell/hoogle/issues/167, and it brought me to exactly this location. Specifically, given the operation:

import Text.PrettyPrint.Annotated.HughesPJ
main = print $ hsep $ replicate 1000 $ text "neil"

This takes an unbounded amount of stack. If I replace:

- beside (TextBeside t p)    g q   = TextBeside t $! rest
+ beside (TextBeside t p)    g q   = TextBeside t $ rest

Then it takes a constant amount of stack. It turns an O(n) memory algorithm into an O(1) algorithm, so seems like it should be considered a fix. @dterei, any thoughts?

dterei commented 8 years ago

Sure. Do you mind submitting a PR with a test-case please?

ndmitchell commented 8 years ago

Sure. @thomie, I intend to remove that one bang, do you have any others I should include in the ticket?

I found an even better test case, take 10 $ render $ hsep $ repeat $ text "neil" works with the new formulation and loops with the old.

thomie commented 8 years ago

Fine by me.

In case you know the answer, it would be great if you could add a comment or Note explaining why the other $!s are necessary (or a test even). They were all together added in 6e01b2ec5f5adceb405cfc9aa29c060f385a9f31.

ndmitchell commented 8 years ago

To be clear, I can demonstrate that one particular bang is harmful - I have no opinion on the others - but my guess is someone had bangs and thought they added performance so sprayed them with a machine gun. It's a standard Haskell optimisation without benchmarking trick :disappointed:

jwaldmann commented 8 years ago

I am glad that this issue is getting some attention.

The worrisome thing is that my benchmark above is basically an iteration of a random (!) small context (\ x -> ... ). So even if behaviour improves for this particular case, others should be tested (as ndm did) - and it's probably best to enumerate contexts in a smallcheck style. Let me see if I find the time to do that.

ndmitchell commented 8 years ago

@jwaldmann - so I have one specific place which causes a more discrete regression (overly strict, different space complexity) - so is worth fixing independent of any performance measurements. Is it worth waiting for your benchmarks or shall I pull request it today?

jwaldmann commented 8 years ago

Don't wait for me. I made this for testing: https://github.com/jwaldmann/pretty-test

ndmitchell commented 8 years ago

Pull request for my piece as #35.

jwaldmann commented 8 years ago

For reference, here's the "winner" (most expensive context with depth 3 -- according to SmallCheck). I write it in the form that you can execute in ghci:

import Text.PrettyPrint.HughesPJ

:set +s
length $ render $ iterate ( \ hole ->  sep [text  "l", cat [hole], text  "l"] ) (text  "l") !! 1000

4001
(10.42 secs, 9,684,556,640 bytes)

This is quite consistent for ghc-7.6.3, 7.8.4, 7.10.3, while ghc-8.0.1 is better by one second.

jwaldmann commented 8 years ago

What does the original prettyprint paper ( http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.8777 ) say about (asymptotic) performance?

There are no exact claims or proofs, but end of Section 9 has "seems to grow slightly faster than linearly" and "far superior to O(n^2)".

Really? Here is the runtimes for rendering iterations of context mentioned previously (x axis: iterations, y axis: time in seconds, average over 3 executions). (forgive the crude rendering)

d-1000

Looks accidentally quadratic ?

I think that for a (pretty)printing library, anything more than linear is unacceptable , and as long as the behaviour is not fixed, "dangerous" combinators must be red-flagged in the API docs.

What are these dangerous combinators? Everything that has an "either .. or" ? (sep : either hsep or vcat, etc.)

Detection of non-linearity in pretty's code would really make a nice test case for automated analysis of runtime complexity of programs. I can advertise this at http://cl-informatik.uibk.ac.at/users/tpowell/lca but don't expect immediate results ...

Two technical points here: linear, quadratic, etc. - in what parameter exactly? Size of input? Size of output?

the natural measure is the number of API calls (the size of the abstract syntax tree). This is to include calls to, and processing of empty, which produces no output, but has a cost.
there are cases where output size is more than linear of input size (because of nesting, progressive amounts of whitespace could be produced). This does not happen for the context I presented above (everything starts in column 0). But then, since we render for a fixed page width, there cannot be too much whitespace in output? Or, we'd could whitespace in some extra way.

ndmitchell commented 8 years ago

From what I've read of the code I would be shocked if it wasn't quadratic. It seems to regularly be traversing the entire tree to ensure global properties, whereas I would expect it to consider all Doc values in some normal form and then use smart constructors to guarantee that without traversing downwards.

haskell / pretty

Quadratic performance issues #32