Open turbolent opened 4 years ago
I think we may have to capture the comments at a further granular level. Because a comment can exist at any place where whitespace can occur, not only at declarations/statements/expressions level, and we would have to retain those as well.
e.g. let /*Im a comment*/ x: Int
A solution would be to:
Roslyn (C#) compiler follows a similar approach: https://github.com/dotnet/roslyn/blob/main/docs/wiki/Roslyn-Overview.md#syntax-trivia
This way, it also divides the whitespace/comments between two consecutive non-trivia tokens as leading and trailing trivia (rather than having leading trivia only, or trailing trivia only), so that the comments are attached to the correct node/token. And moving that node around (code organizing/formatting), would also move the comment along with it.
e.g: See the bellow comments, which is the 'usual' way followed by most people.
// (1) The comment immediately before the decl is usually about 'x' decl
let x: Int // (2) A comment on the same line also talks about 'x' decl (or about `Int`)
// (3) Similarly, this is a comment about 'y' decl
let y: Int // (4) This is also a comment about 'y' decl
In the above, comments (1) and (2) would be a part of x-var-decl. Moving let x: 5
around would also move the comments (1) and (2) along with it.
Similarly, (3) and (4) would be a part of y-var-decl.
hi @SupunS Great comment, agree with your proposed solution. Have you started work on this issue? I ask this because I had actually worked on this but never got around to completing it. I was planning to put up a PR soon, so would just like to confirm if you are also working on it so that we can avoid duplicate work!
@lkadian I haven't done or started any work on this. Great to hear that you have already been working on it! :clap: Would love to have a PR!
Looping in @MaxStalker.
This issue has been a bit trickier than I thought as it touches a lot of parts of the parsing code, and as a result it affects a lot of the tests. Posting some info on how I am implementing this, so that if anything seems completely off, someone can let me know! Also to mention that it's still in progress
The original issue mentions retaining trivia for declarations, statement and expressions, but as @SupunS pointed out it looks like we may need to capture leading/trailing trivia for each token. Basically, we'll have a full syntax tree where we hold every token, and each token will know about its leading/trailing trivia.
Currently we hold trivia as regular tokens, but we need these to be a separate entity. Then when scanning, we will emit each token accompanied with it's leading and trailing trivia:
type Token struct {
Type TokenType
Value interface{}
ast.Range
LeadingTrivia []Trivia
TrailingTrivia []Trivia
}
Rules for leading/trailing trivia (taken from Swift):
Thanks, @lkadian for taking this forward and for the great progress! Sorry for not being in touch recently - I somehow miss updates from this thread, probably have drowned with other stuff.
Currently we hold trivia as regular tokens, but we need these to be a separate entity. Then when scanning, we will emit each token accompanied with it's leading and trailing trivia:
type Token struct { Type TokenType Value interface{} ast.Range LeadingTrivia []Trivia TrailingTrivia []Trivia }
Rules for leading/trailing trivia (taken from Swift):
- A token owns all of its trailing trivia up to, but not including, the next newline character.
- Looking backward in the text, a token owns all of the leading trivia up to and including the first contiguous sequence of newlines characters
This sounds like a pretty good solution to me! :clap:
- [ ] Restructure parser to work with above changes and support all tokens in the AST (full syntax tree)
- [ ] Fix broken parser tests
- [ ] Add parser tests Currently not all tokens are stored in the AST as they are not needed in later stages. But if we want this sort of "full syntax tree", we will need to keep all tokens for each syntax node.
That's a good point. I think we'll have to retain the full syntax tree, if we need to retain all the comments.
In the future, we could add an option to switch capturing the full syntax-tree on and off, depending on the use-case. For example, we need the full syntax tree and the trivia, when using in the IDE/playground during the development time, but don't really need them when parsing and running the code on-chain.
Maybe we can keep a flag for now, but have it 'on' by default, and later we can make it configurable. I'm not sure how big would that change (adding a flag) be. Another option is to add the flag altogether as a separate change, so that current changes don't get too big.
We can split this into 2 milestones:
Issue To Be Solved
Currently the AST only retains comments for declarations if they are docstrings (
///
line comments, a and/**
block comments).To be able to use the AST for formatting, it must retain also non-docstring comments.
Retain comments for:
Definition of Done