haskell / pretty

Haskell Pretty-printer library

Modify structure of an existing Doc #27

Closed l-monnier closed 8 years ago

l-monnier commented 9 years ago

Currently, the Doc constructors are not exported. It’s therefore not possible to reorganise the structure of an existing Doc.

This could sometimes be handy. In my case, I'm pretty-printing an AST as SQL. A query with a join clause is generally written like this:

SELECT *
FROM Table1
INNER JOIN Table2
ON Table1.table2Id = Table2.table2Id

The join starts from "Table1". So the Haskell code will look something like this (not real code, just an example to illustrate what happens with regard to pretty-printing):

parseFrom :: From -> Doc
parseFrom (From join) = "FROM" <+> parseJoin join

parseJoin :: Join -> Doc
parseJoin (Join table1 table2 clause) =
    parseTable table1 $+$ ("INNER JOIN" <+> parseTable table2) $+$ parseClause clause

When included in the full query, this code generates the following:

SELECT *
FROM Table1
     INNER JOIN Table2
     ON Table1.table2Id = Table2.table2Id
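The nesting can be reproduced directly with the pretty library's Text.PrettyPrint combinators. In this minimal sketch, plain `text` values stand in for the hypothetical parseTable/parseClause results, and `joinDoc`/`fromDoc` are illustrative names. Because `<+>` places the whole multi-line join beside "FROM", every line after the first is indented by the width of "FROM " (five columns).

```haskell
import Text.PrettyPrint (Doc, render, text, ($+$), (<+>))

-- Stand-in for parseJoin's result: three lines stacked with $+$.
joinDoc :: Doc
joinDoc =
  text "Table1"
    $+$ text "INNER JOIN Table2"
    $+$ text "ON Table1.table2Id = Table2.table2Id"

-- Stand-in for parseFrom: the whole join block sits beside "FROM",
-- so lines 2 and 3 are indented to the column where "Table1" starts.
fromDoc :: Doc
fromDoc = text "FROM" <+> joinDoc

main :: IO ()
main = putStrLn (render fromDoc)
```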

To get the INNER JOIN aligned as desired, a first approach would be to change parseFrom and parseJoin. However (correct me if I've missed something!), I don't see any solution other than rendering the join directly in the parseFrom function, which would reduce modularity (imagine we need to render a join somewhere other than in a FROM clause).
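One workaround that keeps the join renderer reusable without touching the Doc constructors is sketched here as an assumption, not something proposed in the thread: let the caller pass in the keyword that should sit on the join's first line. The AST types (`Table`, `Clause`, `Join`, `From`) and the `render*` functions are hypothetical stand-ins for the From/Join/parse* code above.

```haskell
import Text.PrettyPrint (Doc, empty, render, text, ($+$), (<+>))

-- Hypothetical AST types, invented only for this sketch.
newtype Table = Table String
data Clause = On String String
data Join = Join Table Table Clause
newtype From = From Join

renderTable :: Table -> Doc
renderTable (Table name) = text name

renderClause :: Clause -> Doc
renderClause (On lhs rhs) = text "ON" <+> text lhs <+> text "=" <+> text rhs

-- The caller supplies a prefix that is glued onto the first line only,
-- so the remaining lines are not nested under it.
renderJoin :: Doc -> Join -> Doc
renderJoin prefix (Join t1 t2 clause) =
  (prefix <+> renderTable t1)
    $+$ (text "INNER JOIN" <+> renderTable t2)
    $+$ renderClause clause

renderFrom :: From -> Doc
renderFrom (From j) = renderJoin (text "FROM") j

-- A join used elsewhere, with no keyword in front:
-- empty is a unit for <+>, so no stray space is added.
renderBareJoin :: Join -> Doc
renderBareJoin = renderJoin empty

example :: Join
example = Join (Table "Table1") (Table "Table2") (On "Table1.table2Id" "Table2.table2Id")

main :: IO ()
main = do
  putStrLn (render (text "SELECT *" $+$ renderFrom (From example)))
  putStrLn (render (renderBareJoin example))
```

The trade-off is that the join renderer's signature changes to suit its callers; the approach pursued in this issue is instead to restructure the finished Doc.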

If the Doc constructors were accessible, one could do something like this:

-- | Return the first document contained in a document.
--   
--   It can be used in combination with 'docTail' to modify the composition
--   of a document.
docHead :: Doc -> Doc
docHead Empty                = Empty
docHead (NilAbove d)         = d
docHead d@(TextBeside _ _ _) = d
docHead (Nest _ d)           = d
docHead (Union _ d)          = d
docHead NoDoc                = NoDoc
docHead (Beside d _ _)       = d
docHead (Above d _ _)        = d 

-- | Return the second document contained in a document if existing.
--   Otherwise, return 'Nothing'.
--   
--   It can be used in combination with 'docHead' to modify the composition
--   of a document.
docTail :: Doc -> Maybe Doc
docTail (Union _ d)        = Just d
docTail (Beside _ _ d)     = Just d
docTail (Above _ _ d)      = Just d
docTail _                  = Nothing

Now, to get the desired result, the parseFrom function becomes:

-- fromMaybe is imported from Data.Maybe
parseFrom :: From -> Doc
parseFrom (From join) =
    "FROM" <+> docHead doc $+$ fromMaybe empty (docTail doc)
  where
    doc = parseJoin join

Of course, rather than providing access to the Doc constructors, an alternative would be to include functions like docHead and docTail in the pretty library itself.
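docHead and docTail are needed only because the line structure is lost once the lines are combined into a single Doc. As a user-level sketch of the same head/tail idea (an assumption, not part of the pretty API), the join renderer could return its lines as a list instead; `joinLines` and `attachTo` are hypothetical names.

```haskell
import Text.PrettyPrint (Doc, render, text, vcat, (<+>))

-- Hypothetical: the join renderer returns its lines unassembled,
-- so the caller can still tell the first line from the rest.
joinLines :: [Doc]
joinLines =
  [ text "Table1"
  , text "INNER JOIN Table2"
  , text "ON Table1.table2Id = Table2.table2Id"
  ]

-- List analogue of docHead/docTail: put the keyword beside the head
-- only, then stack the combined head and the tail vertically.
attachTo :: Doc -> [Doc] -> Doc
attachTo kw []         = kw
attachTo kw (l : rest) = vcat ((kw <+> l) : rest)

main :: IO ()
main = putStrLn (render (attachTo (text "FROM") joinLines))
```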

dterei commented 8 years ago

Sorry for the late reply! Could you explain the alignment problem further? I don't understand where the nesting is coming from in your example.

l-monnier commented 8 years ago

Thank you for your reply. I hope I understood your question correctly... In the output below:

SELECT *
FROM Table1
     INNER JOIN Table2
     ON Table1.table2Id = Table2.table2Id

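the nesting comes from the fact that 'Table1' is "above" 'INNER JOIN Table2', and that whole block is placed beside "FROM", so every line after the first is indented to the column where 'Table1' starts.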

To avoid the nesting, I'd need 'FROM Table1' to be "above" 'INNER JOIN Table2'. However, that doesn't match the rendering strategy, which renders "FROM" and its content separately. So if we follow the steps of these two functions:

parseFrom :: From -> Doc
parseFrom (From join) = "FROM" <+> parseJoin join

parseJoin :: Join -> Doc
parseJoin (Join table1 table2 clause) =
     parseTable table1 $+$ ("INNER JOIN" <+> parseTable table2) $+$ parseClause clause

we get for parseJoin:

"Table1" $+$ "INNER JOIN Table2" $+$ "ON Table1.table2Id = Table2.table2Id"

and then for parseFrom:

"FROM" <+> ("Table1" $+$ "INNER JOIN Table2" $+$ "ON Table1.table2Id = Table2.table2Id")

This means that 'INNER JOIN Table2' will be aligned under "Table1". What I want is:

("FROM" <+> "Table1") $+$ "INNER JOIN Table2" $+$ "ON Table1.table2Id = Table2.table2Id"
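A quick check with Text.PrettyPrint that this expression gives the flush-left layout. The sketch also shows that the parentheses are optional: `<+>` is declared `infixl 6` and `$+$` is `infixl 5`, so `<+>` binds tighter, which is also what the proposed parseFrom relies on in `"FROM" <+> docHead doc $+$ ...`. The names `wanted` and `wantedNoParens` are illustrative.

```haskell
import Text.PrettyPrint (Doc, render, text, ($+$), (<+>))

-- The desired layout: "FROM" is beside "Table1" only,
-- and the other lines are stacked below the combined first line.
wanted :: Doc
wanted =
  (text "FROM" <+> text "Table1")
    $+$ text "INNER JOIN Table2"
    $+$ text "ON Table1.table2Id = Table2.table2Id"

-- Same expression without parentheses: because <+> binds tighter
-- than $+$, this parses exactly like 'wanted'.
wantedNoParens :: Doc
wantedNoParens =
  text "FROM" <+> text "Table1"
    $+$ text "INNER JOIN Table2"
    $+$ text "ON Table1.table2Id = Table2.table2Id"

main :: IO ()
main = putStrLn (render wanted)
```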

Note: in the meantime, I realized that the above is achievable with Wadler's pretty printer. Nevertheless, I still believe the question of whether the Doc type's constructors should be public rather than private is relevant.

dterei commented 8 years ago

If you can achieve this with Wadler's pretty printer, then for the moment I'm going to leave this unresolved. I'm wary of making the constructors public for now: they've been private for a decade and this is the first time the need has come up. I'd prefer to keep things simpler if possible.

I'd be more open to including something like docHead and docTail if you are interested in submitting a patch.

l-monnier commented 8 years ago

It's perfectly understandable. I will submit a patch in a few days. Thank you for your answers.

l-monnier commented 8 years ago

I've implemented the patch, but its behavior is not really satisfying. Here are docHead's properties:

docHead (empty <> d2) == docHead d2
docHead (d1 <> empty) == docHead d1
docHead (d1 <> d2) == d1 -- If d1 and d2 are non-empty.

The reason lies in the implementation of the combinators themselves, where Empty has absolutely no effect, not even in terms of history: combining a document with an empty one simply returns the other document. Therefore, when implementing docHead, it's not possible to know whether a combination with an empty document has previously occurred. To solve this, functions such as beside_ would have to be changed to store empty documents (and then ignore them when rendering).
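To see why these properties arise, here is a toy model. It is not the library's real Doc type; `ToyDoc`, `beside`, and this `docHead` are hypothetical, and the combining function simply drops an empty operand, as described for beside_ above. Once `beside d1 Empty` has returned `d1` itself, docHead cannot tell that a combination ever happened, so it looks inside `d1` instead.

```haskell
-- A toy model, NOT the pretty library's Doc: just enough structure to
-- show what happens when a combinator discards Empty operands.
data ToyDoc = Empty | Text String | Beside ToyDoc ToyDoc
  deriving (Eq, Show)

-- Loosely modelled on beside_: an empty operand is discarded, so no
-- node records that a combination with Empty ever took place.
beside :: ToyDoc -> ToyDoc -> ToyDoc
beside p Empty = p
beside Empty q = q
beside p q     = Beside p q

-- Toy version of the proposed docHead.
docHead :: ToyDoc -> ToyDoc
docHead (Beside d _) = d
docHead d            = d

main :: IO ()
main = do
  let d1 = beside (Text "FROM") (Text "Table1")
  -- d1 is non-empty, yet docHead (d1 <> empty) is not d1:
  -- the Empty vanished, so docHead looked inside d1 instead.
  print (docHead (beside d1 Empty))
  -- With a non-empty right operand, docHead returns d1 as expected.
  print (docHead (beside d1 (Text "x")))
```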

This is well beyond the scope of this patch. On the other hand, providing the functions as they are, with behavior that is rather difficult to predict, is not satisfactory either. Thus, I believe the topic is definitely closed. If you think otherwise, don't hesitate to let me know.