Open KvanTTT opened 6 years ago
A few thoughts for those who would make an attempt at implementing this: now there's everything in place to port CommentHandler from Esprima JS. But be aware that it'll be capable of an approximation at best. There's no way to precisely represent comments in the current AST model because it's simply not detailed enough: adding LeadingComments/TrailingComments to AST nodes is not enough to describe pieces of syntax like: async /*some comment*/ function f() { ... }. To handle such cases, AST nodes should also include tokens as child nodes. (I suggest checking out how the problem is solved in Roslyn.)
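To make the limitation concrete, here is a minimal, self-contained sketch. It does not use Esprima's types: TextRange, SketchNode, and CommentPlacement are hypothetical stand-ins invented for this illustration, and the node ranges are assumed to mirror what an ESTree-style parser would report for the snippet above. It finds the innermost node containing the comment and shows that the `function` keyword sits between the comment and the first child node, a position that no leading/trailing slot on any node can express.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const string source = "async /*some comment*/ function f() { }";

// Offsets are derived from the text itself; ranges are [Start, End).
int commentStart = source.IndexOf("/*", StringComparison.Ordinal);
int commentEnd = source.IndexOf("*/", StringComparison.Ordinal) + 2;
int nameStart = source.IndexOf("f(", StringComparison.Ordinal);
int bodyStart = source.IndexOf('{');

var comment = new TextRange(commentStart, commentEnd);

// Assumed shape: a function declaration whose only children are its name and body.
// The `async` and `function` keywords have no nodes of their own.
var func = new SketchNode("FunctionDeclaration", new TextRange(0, source.Length), new List<SketchNode>
{
    new SketchNode("Identifier", new TextRange(nameStart, nameStart + 1), new List<SketchNode>()),
    new SketchNode("BlockStatement", new TextRange(bodyStart, source.Length), new List<SketchNode>()),
});

var owner = CommentPlacement.FindInnermost(func, comment);
var firstChild = owner.Children.OrderBy(c => c.Range.Start).First();
string gap = source.Substring(comment.End, firstChild.Range.Start - comment.End);

Console.WriteLine($"Innermost containing node: {owner.Type}");
Console.WriteLine($"Text between the comment and {firstChild.Type}: '{gap}'");

// Hypothetical, simplified stand-ins for AST nodes (not Esprima's API).
record TextRange(int Start, int End);
record SketchNode(string Type, TextRange Range, List<SketchNode> Children);

static class CommentPlacement
{
    // Returns the deepest node whose range fully contains the comment.
    public static SketchNode FindInnermost(SketchNode node, TextRange comment)
    {
        foreach (var child in node.Children)
        {
            if (child.Range.Start <= comment.Start && comment.End <= child.Range.End)
                return FindInnermost(child, comment);
        }
        return node;
    }
}
```

The comment ends up owned by FunctionDeclaration, and the gap before the Identifier is ' function '. Recording the comment as "leading" on the Identifier would lose the fact that it precedes the keyword, which is why token-level child nodes would be needed for an exact representation.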
However, it's unlikely that proper comment handling will ever be possible in this lib because its main consumer is Jint, which wouldn't want to pay the price of a more detailed AST model. The best we can have is what CommentHandler does, but it will be a half-assed solution. Which, of course, could still be useful in some use cases.
I worked on this problem last week and concluded that it is most likely a fool's errand, and that it shouldn't be a first-class citizen in the public API.
Consider a system where you want to tag JavaScript code and have these markings show up in the AST. Here's the code with markings:
true;
/* mark */
var /* mark */ x = /* mark */ "test"; /* mark */
The solution I came up with was:
using System.Collections.Generic;
using System.Linq;
using Esprima;
using Esprima.Ast;
using Esprima.Utils;

public class TestMarkerVisitor : AstVisitor
{
    // Key under which matching comments are stored in a node's additional data.
    public static readonly object Comment = "Comment";

    private readonly List<SyntaxComment> _comments;
    private Node? previousNode = null;

    private TestMarkerVisitor(IReadOnlyList<SyntaxComment> comments)
    {
        // Copy into a mutable list so attached comments can be removed below.
        _comments = comments.ToList();
    }

    public static void Parse(Node node, IReadOnlyList<SyntaxComment> comments)
    {
        if (comments.Count == 0) return;
        var visitor = new TestMarkerVisitor(comments);
        visitor.Visit(node);
    }

    public override object? Visit(Node node)
    {
        // Attach every comment that lies between the previously visited node and this one.
        if (previousNode != null)
        {
            var commentsMatching = _comments.Where(currentComment =>
                currentComment.Location.Start.Compare(previousNode.Location.End) >= 0 // Previous node ends before the comment
                && currentComment.Location.End.Compare(node.Location.Start) <= 0 // Next node starts after the end of the comment
            ).ToList();
            if (commentsMatching.Count > 0)
            {
                node.SetAdditionalData(Comment, commentsMatching);
                _comments.RemoveAll(x => commentsMatching.Contains(x));
            }
        }
        previousNode = node;
        return base.Visit(node);
    }
}
But I really don't think this is a great solution either. And Esprima doesn't produce the correct result IMO: Example.
It makes the following mistakes, IMO:
[…] true literal. That makes no sense. In this case, that makes it more of a statement in the tree to me.
[…] VariableDeclaration. This is more correct in my opinion.
[…] VariableDeclaration. It should be leading to the Identifier.
I kinda think it's a can of worms and super use-case-dependent.
You're right, and this is what I was talking about in my previous comment. To be able to implement this correctly, we'd also need keywords and punctuators included in AST nodes. That will probably not happen because of Jint. Maybe in a fork, if someone desperately needs this.
@adams85 I'm trying to implement "The toString() of 'class' and 'function' return SourceText" in Jint to comply with the ECMAScript specification. But currently the source code that Esprima .NET returns from the AST does not contain comments, and I was wondering if I could update this feature?
Esprima doesn't really return original source code but rather synthesizes JS code from a given AST. Comments are not included because the current AST is simply not detailed enough to store all the information necessary for properly representing comments. (See also this discussion).
However, for what you want to achieve, you may not need a full-fidelity AST at all. Esprima provides the location of each node in the parsed script (see Node.Range). So, using this information, you can extract the original source text of a function or class, given that you keep the parsed script around.
In the test262 tests, it does not return the real source code; what is needed is to synthesize JS code containing comments from the AST.
As for synthesizing JS code that contains comments from the AST: once again, this is impossible with Esprima's current, ESTree-based AST.
However, you don't need to synthesize anything to implement toString for functions or classes. Please study the following code:
using System;
using System.Linq;
using Esprima;
using Esprima.Ast;

var input =
@"console.log(function f() { return /* comment
*/ 'hello world' }.toString())";
var ast = new JavaScriptParser().ParseScript(input);
var func = ast.DescendantNodes().OfType<FunctionExpression>().First();
/*** This is how you can extract the original source text of an AST node ***/
var funcSourceText = input.Substring(func.Range.Start, func.Range.Length);
Console.WriteLine(funcSourceText);
Input
Parse the following code:
with the Comment = true option turned on in ParserOptions. After that, serialize the resulting AST with JsonConvert.
Expected
Leading and trailing comments should be attached to AST nodes. Check it on Esprima.org.
Actual
Comments are completely skipped.