[question] regarding disjunction

sorawee commented 3 years ago

I read the paper. It is very clear, and I want to try it out. Unfortunately, this repository, which seems to be the canonical implementation of the algorithm described in the paper, doesn't seem to support the disjunction operator.

From #10, it looks like disjunction is removed because

it is possible to use disjunction in a way that triggers exponential behaviour.

Could you give me an example where the exponential behavior is triggered? As far as I can tell, it is a polynomial algorithm.

In section 9 of the paper, you said that:

In return I have to pay a larger constant cost to sieve through intermediate results. Yet I conjecture that for non-pathological inputs the asymptotic complexities for both algorithms are the same.

Actually, I will argue that Pareto frontier + valid filtering should always give you the same O(n w^4) algorithm, same as the one in Podkopaev and Boulytchev [2014]. This is because for any (maxWidth, lastWidth), there can be only one height in the Pareto frontier. Another different value of height will either dominate or be dominated. Therefore, there are only O(w^2) configurations to consider, and therefore, in the concatenate operation, you need to consider O(w^2 * w^2) = O(w^4) combinations.

Nit: in section 4.3, the visible function, I think where is missing in front of pageWidth = 80 (sorry if this is incorrect, I haven't really used Haskell).

Thanks again for the illuminating paper.

justinpombrio commented 3 years ago

Hi Jean-Philippe,

We met briefly at ICFP 2017. I'm also a fan of your approach! I think it's the only feasible way to do vertical concatenation / alignment, and I implemented it (or something like it) once.

I have a guess as to what the exponential blowup is, and what a solution looks like.

My guess is that the choice operator makes it easy to construct Docs of exponential size. For example, say you want to implement word wrap. Like format "aa, b, c, dddd, e, f" with width 5 as

aa, b
c,
dddd,
e, f

So you write a combinator for it (which annoyingly needs to recurse starting at the end of the list):

wrap xs = wrapRec (reverse xs)
wrapRec [x] = x
wrapRec (x:xs) =
        (wrapRec xs <> text "," <> x)
    <|> (wrapRec xs <> text "," $$ x)

This is a fairly natural thing to write, but it constructs a Doc of exponential size due to the repetition of xs in the definition of wrapRec. And there isn't really another way to do word wrap. Thus, exponential blowup.

Is that right?

If so, one solution is to, instead of having a function AST -> Doc, have a mapping ASTNode -> Notation, where a Notation is a chunk of a Doc, with explicit Child(int) variants to refer to its children. And there's a Repeat variant for variable-arity nodes that basically does a fold over its children. That's the approach I'm taking here: https://github.com/justinpombrio/partial-pretty-printer/blob/master/src/notation.rs (That repo does not implement your algorithm, but the approach of "split a Doc into a Notation for each node in the AST" is orthogonal to the choice of algorithm.)

That's pretty heavy-weight though. Maybe a lighter approach would be to introduce a new Doc variant, called docLet, that memoizes the measurements of its argument?

wrap xs = wrapRec (reverse xs)
wrapRec [x] = x
wrapRec (x:xs) = docLet "arg" (wrapRec xs) $
    ((docVar "arg")  <> text "," <> x)
  <|> ((docVar "arg") <> text "," $$ x)

sorawee commented 3 years ago

FWIW, memoization is the approach that I am taking at https://github.com/sorawee/pprint-compact/blob/a685ccc315268682256a1b96e3829db7b076ec97/main.rkt#L77. The memoization table lookup could either use a hash table based on structural equality or reference equality. For the latter, the lookup cost is very cheap, but it necessitates that users must share reference instead of constructing new docs all over the place. So for now, I opt for the former.

jyp / prettiest

[question] regarding disjunction #19