Attach leading and trailing comments to AST nodes

KvanTTT commented 6 years ago

Input

Parse the following code:

// leading
a // trailing

with turned on Comment = true option in ParserOptions:

var program = parser.ParseProgram(source, new ParserOptions(source) { Comment = true; });

After that, serialize the result AST with JsonConvert.

Expected

Leading and trailing should be attached to AST nodes. Check it on Esprima.org.

{
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "Identifier",
                "name": "a",
                "trailingComments": [
                    {
                        "type": "Line",
                        "value": " trailing",
                        "range": [
                            13,
                            24
                        ]
                    }
                ]
            },
            "leadingComments": [
                {
                    "type": "Line",
                    "value": " leading",
                    "range": [
                        0,
                        10
                    ]
                }
            ]
        }
    ],
    "sourceType": "script"
}

Actual

Comments are completely skipped.

{
  "Type": "Program",
  "Body": [
    {
      "Type": "ExpressionStatement",
      "Expression": {
        "Type": "Identifier",
        "Name": "a",
        "Loc": {}
      },
      "Loc": {}
    }
  ],
  "SourceType": "script",
  "Loc": {}
}

adams85 commented 2 years ago

A few thoughts for those who would make an attempt on implementing this: now there's everything in place to port CommentHandler from Esprima JS. But be aware that it'll be capable of an approximation at best. There's no way to precisely represent comments in the current AST model because it's simply not detailed enough: adding LeadingComments/TrailingComments to AST nodes is not enough to describe pieces of syntax like: async /*some comment*/ function f() { ... }. To handle such cases, AST nodes should also include tokens as child nodes. (I suggest checking out how the problem is solved in Roslyn.)

However, it's unlikely that proper comment handling will ever be possible in this lib because its main consumer is Jint, which wouldn't want to pay the price of a more detailed AST model. The best we can have is what CommentHandler does but it will be a half-assed solution. Which, of course, still could be useful in some use cases.

CharlieEriksen commented 2 years ago

I worked on this problem last week and concluded that this is fools' errand most likely, and shouldn't be a first-class citizen in the public API.

Consider a system where you want to tag javascript code, and have these markings show up in the AST. Here's the code with markings:

true; 
/* mark */ 
var /* mark */ x = /* mark */ "test"; /* mark */

The solution I came up with was:

public class TestMarkerVisitor : AstVisitor
{
    public static object Comment = "Comment";
    private readonly List<SyntaxComment> _comments;

    private TestMarkerVisitor(IReadOnlyList<SyntaxComment> comments)
    {
        _comments = comments;
    }

    public static void Parse(Node node, IReadOnlyList<SyntaxComment> comments)
    {
        if (comments.Count == 0) return;
        var visitor = new TestMarkerVisitor(comments);
        visitor.Visit(node);
    }

    private Node? previousNode = null;

    public override object? Visit(Node node)
    {
        // Check if we went past the start
        if (previousNode != null)
        {
           var commentsMatching = _comments.Where(currentComment => 
               currentComment.Location.Start.Compare(previousNode.Location.End) >= 0 // Previous node ends before the comment
            && currentComment.Location.End.Compare(node.Location.Start) <= 0        // Next node starts after the end of the comment
               ).ToList();
           if (commentsMatching.Count > 0)
           {
               node.SetAdditionalData(Comment, commentsMatching);
               _comments.RemoveAll(x=> commentsMatching.Contains(x));
           }

        }
        previousNode = node;
        return base.Visit(node);
    }

}

But I really don't think this is a great solution either. And esprima doesn't produce the correct result IMO: Example.

It makes the following mistakes, IMO:

mark1 is attached as trailing to the true literal. That makes no sense. In this case, that makes it more of a statement in the tree to me.
mark1 is also attached as leading to the VariableDeclaration. This is more correct in my opinion.
mark2 is attached as leading to the VariableDeclaration. It should be leading to the Identifier

I kinda think it's a can of worms and super use-case-dependent.

adams85 commented 2 years ago

You're right and this is what I was talking about in my previous comment. To be able to implement this correctly, we'd also need keywords and punctuators included in AST nodes. Which will probably not happen because of Jint. Maybe in a fork if someone desperately need this.

scgm0 commented 1 year ago

@adams85 I'm trying to implement "The toString() of 'class' and 'function' return SourceText" in jint to comply with ECMAScript specification. But currently the source code returned by Esprima .NET from the AST does not contain comments, and I was wondering if I could update this feature?

adams85 commented 1 year ago

Esprima doesn't really return original source code but rather synthesizes JS code from a given AST. Comments are not included because the current AST is simply not detailed enough to store all the information necessary for properly representing comments. (See also this discussion).

However, for what you want to achieve, you may not need a full fidelity AST at all. Esprima provides the location of each node in the parsed script (see Node.Range). So, using this information, you can extract the original source text of a function or class, given that you keep the parsed script around.

scgm0 commented 1 year ago

Esprima doesn't really return original source code but rather synthesizes JS code from a given AST. Comments are not included because the current AST is simply not detailed enough to store all the information necessary for properly representing comments. (See also this discussion).

However, for what you want to achieve, you may not need a full fidelity AST at all. Esprima provides the location of each node in the parsed script (see Node.Range). So, using this information, you can extract the original source text of a function or class, given that you keep the parsed script around.

In the test of test262, it does not return the real source code, what is needed is to synthesize JS code containing comments from AST.

adams85 commented 1 year ago

what is needed is to synthesize JS code containing comments from AST.

Once again, this is impossible with Esprima's current, ESTree-based AST.

However, you don't need to synthesize anything to implement toString for functions or classes. Please study the following code:

var input =
@"console.log(function f() { return /* comment 
*/ 'hello world' }.toString())";

var ast = new JavaScriptParser().ParseScript(input);
var func = ast.DescendantNodes().OfType<FunctionExpression>().First();

/*** This is how you can extract the original source text of an AST node ***/
var funcSourceText = input.Substring(func.Range.Start, func.Range.Length);

Console.WriteLine(funcSourceText);

sebastienros / esprima-dotnet