kevinushey / sourcetools

Tools for reading, tokenizing, and parsing R code.
MIT License

Binding comment blocks to parse tree #7

Open jimhester opened 8 years ago

jimhester commented 8 years ago

One significant drawback of the current R parser is that comments are completely discarded from the parse tree. They are included in the token information you get from getParseData() (subject to a length limitation); however, there is no way to map from the tokens into the parse tree or vice versa.
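
To make the gap concrete, here is a minimal sketch using only base R (parse(), utils::getParseData(), deparse()). The variable names are just for illustration. It shows the comment surviving in the token table while the parsed call carries no trace of it.

```r
# Parse a one-line snippet, keeping source references so that
# getParseData() has token information to return.
src <- "x <- 1 # a comment"
exprs <- parse(text = src, keep.source = TRUE)

# The token table still contains the comment as a COMMENT token.
pd <- utils::getParseData(exprs)
comment_text <- pd$text[pd$token == "COMMENT"]

# The parse tree itself is just the call `x <- 1`; the comment is gone.
tree_text <- deparse(exprs[[1]])

print(comment_text)
print(tree_text)
```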

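Treating comments as bare strings directly in the parse tree works as long as they are not in the middle of an expression. For example, the comment below sits inside a function's argument list, where a bare string could not be placed: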

f1 <- function(x = 1 # a comment
  ) {
  x
}

I think the best approach is to attach the comment blocks to their closest expression as an attribute.

Go uses the concept of lead and line comments in its parser (https://go.googlesource.com/go/+/master/src/go/parser/parser.go#301). Lead comments are all the comment lines up to the next non-comment token:

# comment
# more comments
a + 1

Line comments are the comments that run to the end of a line (a + 1 # line comment).

Lead comments can be assigned to the next expression, line comments to the preceding one.
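
A rough sketch of that assignment rule on top of the existing R parser, for top-level expressions only. attach_comments() and the attribute names lead_comment and line_comment are hypothetical, not part of sourcetools or base R. It also simplifies Go's rule by attaching every earlier comment to the next expression regardless of blank lines, and it skips comments that sit inside a multi-line expression.

```r
# Hypothetical helper: attach lead and line comments to top-level expressions.
attach_comments <- function(src) {
  exprs <- parse(text = src, keep.source = TRUE)
  pd <- utils::getParseData(exprs)
  comments <- pd[pd$token == "COMMENT", c("line1", "col1", "text")]

  # Each srcref starts with first_line, first_byte, last_line, last_byte.
  refs <- attr(exprs, "srcref")
  first_line <- vapply(refs, function(r) as.integer(r)[1], integer(1))
  last_line <- vapply(refs, function(r) as.integer(r)[3], integer(1))

  out <- as.list(exprs)
  for (k in seq_len(nrow(comments))) {
    line <- comments$line1[k]

    # Comments in the middle of a multi-line expression are not handled here.
    if (any(first_line <= line & line < last_line)) next

    same_line <- which(last_line == line)
    if (length(same_line) > 0) {
      # Line comment: belongs to the expression that ends on this line.
      i <- max(same_line)
      slot <- "line_comment"
    } else {
      # Lead comment: belongs to the next expression, if there is one.
      following <- which(first_line > line)
      if (length(following) == 0) next
      i <- min(following)
      slot <- "lead_comment"
    }
    attr(out[[i]], slot) <- c(attr(out[[i]], slot), comments$text[k])
  }
  out
}

src <- "# comment\n# more comments\na + 1 # line comment\n"
res <- attach_comments(src)
print(attr(res[[1]], "lead_comment"))
print(attr(res[[1]], "line_comment"))
```

This only works when the target expression is a call; a bare NULL at top level cannot take attributes.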

The only issue with this approach is NULL expressions (as you cannot assign attributes to a NULL object). This can occur in practice:

# package documentation
NULL

It could potentially also occur for default function arguments.

A way to get around that limitation would be to parse NULL as structure(NULL, class='null') in the AST and have a custom eval function which converts these objects to bare NULL before passing to base::eval().
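
A minimal sketch of that workaround, with hypothetical names (null_placeholder(), eval_ast()). One assumption differs from the text above: recent versions of R warn that structure(NULL, ...) is deprecated, so the placeholder here is built on an empty list instead. Only a placeholder at the top of the node is handled; placeholders nested inside a call would need a recursive walk.

```r
# Hypothetical placeholder standing in for NULL in the AST. It is an
# empty list with class "null", so it can carry comment attributes.
null_placeholder <- function(lead_comment = NULL) {
  structure(list(), class = "null", lead_comment = lead_comment)
}

# Hypothetical eval wrapper: convert the placeholder back to a bare NULL,
# and hand everything else to base::eval() unchanged.
eval_ast <- function(node, envir = parent.frame()) {
  if (inherits(node, "null")) {
    return(NULL)
  }
  base::eval(node, envir)
}

node <- null_placeholder("# package documentation")
print(attr(node, "lead_comment"))
print(is.null(eval_ast(node)))
print(eval_ast(quote(1 + 1)))
```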

These implementation details are only a suggestion. I just wanted to bring the issue of comments up so they could be addressed in some fashion.

Being able to manipulate the comments as part of the AST opens up many more possibilities for reformatting / programmatic transformations along the lines of gofmt, particularly its rewrite rules for programmatic refactoring.

kevinushey commented 8 years ago

FWIW, although I plan to make sure that we can generate R parse trees from the sourcetools internal parse tree, I don't think the R parse tree will be the 'canonical' form, so we have a lot more freedom in how the parse tree is structured.

I like the idea of attaching comments to the 'top' of the associated expression though.