typelevel / paiges

an implementation of Wadler's a prettier printer
Apache License 2.0
186 stars 30 forks source link

How to deal with custom flattenings? #22

Closed johnynek closed 7 years ago

johnynek commented 7 years ago

@olafurpg has asked

I'm curious, would it be possible to expose some control over what is returned here? In my case, I'd like to return Doc.empty to get the formatting below when using Doc.grouped (which calls flattenOption)

// Doc.empty
Term.Name("intercalate")
// Doc.space
Term.Name( "intercalate" )

There are actually possible interpretations:

  1. how can I do custom groupings (but within the current model)
  2. can we extend the current model.

For 1. I don't want to introduce any dependency injection or config to Doc. I would prefer people write their own modules:

class MyDocMod(parenSpace: Doc) {
  def mkTerm(nm: String): Doc =
    Doc.text("Term.Name(") + parenSpace :+ nm + parenSpace :+ ")"
}

and then you have your config that plugs in all the inputs to control the way you want to work.

  1. Specifically for your second question, I think you want and emptyOrLine type combinator. I am thinking about how we can test these things. The algorithm assumes some invariants, I am trying to see the minimum it assumes, but one thing is that where we can do two things (Union) the thing on the left makes the longer line. If you are not careful you can introduce branches that are meaningless because the algorithm will always go to the left. For custom flattening, have similar issues. We have to make sure to apply the right flattening at the right place, keeping in mind later someone might flatten your doc and at that point, it might not be semantically sane to plug in any kind of flattening.

I'd like to stray as little as we can from the paper and with clear examples of formats we want but can't currently express.

non commented 7 years ago

@olafurpg do you think something like emptyOrLine would be sufficient to express what you want?

olafurpg commented 7 years ago

@non I am still trying to wrap my head around how this algorithm works, so I'm not sure. Would emptyOrLine allow me to express flattenOption (called via grouped) that uses Doc.empty instead of Doc.space in this line here?

For example, if I replaced the two lines in bracketBy with lineOrEmpty, would it be possible that one of those lineOrEmpty could turn into a line while the other turned into empty? I would like both of them to either become line or empty, but not a mix of the two.

Another potential situation where I think empty might be desirable over space is fill. If fill used empty, would it be possible to express the current space behavior by prepending space to all docs in the ds parameter? That is,

val ds = text("1") :: text("2") :: text("3") :: Nil
val dsSpaced = text("1") :: text(" 2") :: text(" 3") :: Nil
// before
fill(comma, ds).render(10)  // produces "1, 2, 3"
fill(comma, dsSpaced).render(10)  // produces "1,  2,  3"
// after
fill(comma, ds).render(10)  // produces "1,2,3"
fill(comma, dsSpaced).render(10)  // produces "1, 2, 3"

On a sidenote, it would be useful if there was a clearer separation between "primitive" combinators (can't be expressed by 3rd party) vs. "library" combinators (utilities that could be expressed by 3rd party but are included in paiges for convenience).

olafurpg commented 7 years ago

Here is a draft of what I had in mind with "dependency injection" for configuration options:

https://github.com/johnynek/paiges/compare/master...olafurpg:configuration?expand=1

EDIT inline summary of that diff:

case class DocConfig(spaceInFlattenOption: Boolean = true)
class DocOps(config: DocConfig) {
  def flattenOption(doc: Doc): Option[Doc] = {
    // ...
      val next = if (config.spaceInFlattenOption) Doc.space else Doc.empty
    // ...
    loop(doc)
  }
}
object DocOps extends DocOps(DocConfig())
class Doc {
  def flattenOption = DocOps.flattenOption(this)
}

This preserves the original Doc api but opens possibilities for 3rd parties to configure the primitive behavior by passing in a custom DocConfig to DocOps. The configuration options should only expose features which preserve the safe-by-construction invariant for Docs. The DocConfig class may be unnecessary if it ends containing only a single boolean parameter.

johnynek commented 7 years ago

Currently, this will break assumptions of the paper as stated, thought it may be possible to weaken those assumptions.

Two assumptions of the paper:

  1. when there are multiple renderings of a Doc, flattening makes them all the same.
  2. union nodes, which express the ability to have multiple renderings, always have the left side with a longer current line.

The second one is not hard to maintain I don't think, but the first one is. It means you can't flatten differently at different places and expect the Doc to stay sane. Without that rule (or some preservation of it), flattening (and hence grouping) becomes exponential in the number of embedded unions (which can easily grow linearly in Doc size).

So, I'm trying to express the laws stated in the paper and try to get a handle on the implications of loosening them.

One escape hatch may be that newline nodes always carry with them how they are to be flattened (i.e. it may not be a space, there could be other ways). So, it is not that you flatten differently, it is that you create different kinds of newlines. It is a small difference, but I think it may allow us to preserve the invariants expressed in the paper to keep the performance of the algorithm.

So, object Line extends Doc would become case class Line(flattenWith: Doc) extends Doc.

There are a number of algorithms internal to this code (rendering, computing longest lines, comparing Docs). I am nervous that if we make flattening a black box, those algorithms will all become exponential (while currently, they are all linear in rendered width (using only width stack)).

johnynek commented 7 years ago

Actually, @olafurpg it occurs to me maybe the best way forward is to give examples of formats you would like that we don't know how to make with the current combinators. Then we can either see if we can (and they would then serve as useful documentation) or we could have a concrete example of something we need to support.

olafurpg commented 7 years ago

I definitely don't want exponential running time, I've had enough of that in scalafmt :)

One escape hatch may be that newline nodes always carry with them how they are to be flattened

If you can make that work, then that would be a better solution than my proposed DocConfig.

When reading the current implementation, it felt like Doc.space was hardcoded for no reason. If the space is there to stay, it might be worth to add a short comment explaining why it's like that. (btw, the code is a pure pleasure to read, I learned a lot just going through it).

formats you would like that we don't know how to make with the current combinators

I've already been able to get roughly the format I want for argument lists, which from my experience with scalafmt is was the hardest part. Doc.fill + Doc.bracketBy is an amazing combination. It only took ~10 minutes to get it working from knowing almost nothing.

Here's an example output that I got for pathologically nested code: https://gist.github.com/44ee2d0a8c5b8330702cf00d23d0978a The only edit I did was to change the Doc.space to Doc.empty in flattenOption (globally across all Doc, not partially). Performance does not seem to be a problem, paiges burns through that example in ~300ms (informal benchmark).

Actually, thinking about it now, I might be able to get that output with a lineOrEmpty combinator. Give me some more time to experiment and I'll report back with concise examples of what I feel I'm unable to express with the current api.

johnynek commented 7 years ago

@olafurpg you are right that Doc.space == Doc.text(" ") it is just there to avoid any unclarity when reading (is there exactly one space there?) and to avoid allocating on a very common delimiter case.

Your example is pretty huge! Amazing. :)

The one thing I don't think you can currently express is this:

def fooBaz(a: Bar,
           b: Quux)

.nest is relative to the current indentation. To do the above, what you want what is called align in the haskell implementation of this (which goes beyond the paper). I don't think that is hard to add or breaks the model (just have a different way of updating the indentation context).

I'll look at just adding lineOrEmpty or custom flattening (more ambitious) on the plane I'm about to get on.

johnynek commented 7 years ago

I'm closing this for now since we added lineBreak and align which seem like the two main things we wanted but don't have. See #29