Open sorawee opened 3 years ago
Hi Jean-Philippe,
We met briefly at ICFP 2017. I'm also a fan of your approach! I think it's the only feasible way to do vertical concatenation / alignment, and I implemented it (or something like it) once.
I have a guess as to what the exponential blowup is, and what a solution looks like.
My guess is that the choice operator makes it easy to construct Docs of exponential size. For example, say you want to implement word wrap. Like format "aa, b, c, dddd, e, f" with width 5 as
aa, b
c,
dddd,
e, f
So you write a combinator for it (which annoyingly needs to recurse starting at the end of the list):
wrap xs = wrapRec (reverse xs)
wrapRec [x] = x
wrapRec (x:xs) =
(wrapRec xs <> text "," <> x)
<|> (wrapRec xs <> text "," $$ x)
This is a fairly natural thing to write, but it constructs a Doc of exponential size due to the repetition of xs
in the definition of wrapRec
. And there isn't really another way to do word wrap. Thus, exponential blowup.
Is that right?
If so, one solution is to, instead of having a function AST -> Doc
, have a mapping ASTNode -> Notation
, where a Notation
is a chunk of a Doc
, with explicit Child(int)
variants to refer to its children. And there's a Repeat
variant for variable-arity nodes that basically does a fold
over its children. That's the approach I'm taking here:
https://github.com/justinpombrio/partial-pretty-printer/blob/master/src/notation.rs
(That repo does not implement your algorithm, but the approach of "split a Doc into a Notation for each node in the AST" is orthogonal to the choice of algorithm.)
That's pretty heavy-weight though. Maybe a lighter approach would be to introduce a new Doc
variant, called docLet
, that memoizes the measurements of its argument?
wrap xs = wrapRec (reverse xs)
wrapRec [x] = x
wrapRec (x:xs) = docLet "arg" (wrapRec xs) $
((docVar "arg") <> text "," <> x)
<|> ((docVar "arg") <> text "," $$ x)
FWIW, memoization is the approach that I am taking at https://github.com/sorawee/pprint-compact/blob/a685ccc315268682256a1b96e3829db7b076ec97/main.rkt#L77. The memoization table lookup could either use a hash table based on structural equality or reference equality. For the latter, the lookup cost is very cheap, but it necessitates that users must share reference instead of constructing new docs all over the place. So for now, I opt for the former.
I read the paper. It is very clear, and I want to try it out. Unfortunately, this repository, which seems to be the canonical implementation of the algorithm described in the paper, doesn't seem to support the disjunction operator.
From #10, it looks like disjunction is removed because
Could you give me an example where the exponential behavior is triggered? As far as I can tell, it is a polynomial algorithm.
In section 9 of the paper, you said that:
Actually, I will argue that Pareto frontier + valid filtering should always give you the same O(n w^4) algorithm, same as the one in Podkopaev and Boulytchev [2014]. This is because for any (maxWidth, lastWidth), there can be only one height in the Pareto frontier. Another different value of height will either dominate or be dominated. Therefore, there are only O(w^2) configurations to consider, and therefore, in the concatenate operation, you need to consider O(w^2 * w^2) = O(w^4) combinations.
Nit: in section 4.3, the
visible
function, I thinkwhere
is missing in front ofpageWidth = 80
(sorry if this is incorrect, I haven't really used Haskell).Thanks again for the illuminating paper.