shapesecurity / shift-parser-js

ECMAScript parser that produces a Shift format AST
http://shift-ast.org/parser.html
Apache License 2.0

Generate AST with ranges and locations #354

Closed simondel closed 7 years ago

simondel commented 7 years ago

If I generate an AST of

if( 1 < 2) { 
  console.log('Hello'); 
}

It becomes

{"type":"Script","directives":[],"statements":[{"type":"IfStatement","test":{"type":"BinaryExpression","left":{"type":"LiteralNumericExpression","value":1},"operator":"<","right":{"type":"LiteralNumericExpression","value":2}},"consequent":{"type":"BlockStatement","block":{"type":"Block","statements":[{"type":"ExpressionStatement","expression":{"type":"CallExpression","callee":{"type":"StaticMemberExpression","object":{"type":"IdentifierExpression","name":"console"},"property":"log"},"arguments":[{"type":"LiteralStringExpression","value":"Hello"}]}}]}},"alternate":null}]}

If I put that in the online code generator it generates:

if(1<2){console.log("Hello")}

The AST does not contain any information about line breaks or indentation, which makes reporting on changes to the code very difficult for me. The generated code is also reformatted, which means that line 1, column 5 may point to a different token, or may not exist at all, in the generated JS code.

I don't want the code to change if I generate an AST and then generate code without making any change. How can I include ranges and locations in the parser so the generated code is the same as the original code?

An example of the features I want can be found in esprima if you enable the Index-based range and Line and column-based location options.

bakkot commented 7 years ago

> The AST does not contain any information about the linebreaks and indenting.

Yes. This is an AST format. The "abstract" part means it discards information like whitespace, so that if (0); and if /*foo*/ (0);, which are the same program, have the same AST.

> How can I include ranges and locations in the parser

You may be looking for parseScriptWithLocation / parseModuleWithLocation, as described in the readme. That said, the code generators available in shift-codegen cannot consume this information - but we'd certainly consider a PR to add one which could!
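As an illustration of the suggestion above, here is a minimal sketch of reading a node's position back out of `parseScriptWithLocation`. It assumes shift-parser is installed and that the call returns `{ tree, locations, comments }`, with `locations` mapping each node to `{ start, end }` objects carrying `line`, `column` and `offset`, as the readme describes. `originalTextOf` and `describeCondition` are hypothetical helper names, not part of the library.

```javascript
// Return the exact original text a node was parsed from, or null when the
// node has no recorded location. `locations` is assumed to be the map
// returned by parseScriptWithLocation / parseModuleWithLocation.
function originalTextOf(node, source, locations) {
  const loc = locations.get(node);
  return loc ? source.slice(loc.start.offset, loc.end.offset) : null;
}

// Hypothetical helper: report where the condition of a leading `if`
// statement sits in the original source.
// Assumption: `npm install shift-parser` has been run.
function describeCondition(source) {
  const { parseScriptWithLocation } = require('shift-parser');
  const { tree, locations } = parseScriptWithLocation(source);
  const test = tree.statements[0].test; // the `1 < 2` node in the issue's example
  const { start } = locations.get(test);
  return {
    line: start.line,
    column: start.column,
    text: originalTextOf(test, source, locations),
  };
}
```

Because the text is sliced from the original string by offset, whitespace and comments inside the node's range come back untouched, which the code generators cannot offer.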

simondel commented 7 years ago

Thanks, I guess I overlooked that! The most important part for me is knowing where the node is located so we can substitute it (using string replacement) with some generated code.
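A rough sketch of the string-replacement idea, assuming each replacement is described by character offsets into the original source (for example the `start.offset` and `end.offset` of a node's location). `applyReplacements` is a hypothetical helper, not something shift-parser provides.

```javascript
// Replace several ranges of `source` at once. Each edit is
// { start, end, text }, where start/end are character offsets into the
// ORIGINAL string (end is exclusive).
function applyReplacements(source, edits) {
  // Work from the end of the file backwards so that earlier offsets
  // are still valid after each replacement.
  const sorted = [...edits].sort((a, b) => b.start - a.start);
  let result = source;
  let nextStart = source.length; // start of the edit applied just before
  for (const { start, end, text } of sorted) {
    if (start < 0 || start > end || end > nextStart) {
      throw new RangeError(`invalid or overlapping range ${start}-${end}`);
    }
    result = result.slice(0, start) + text + result.slice(end);
    nextStart = start;
  }
  return result;
}
```

Everything outside the replaced ranges, including line breaks and indentation, is left exactly as written. Overlapping ranges are rejected rather than merged, since a replacement nested inside another has no well-defined result.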

bakkot commented 7 years ago

> so we can substitute it (using string replacement) with some generated code

While that would work, if you were careful, the ideal way of replacing parts of a program is to extend CloneReducer (import { CloneReducer } from "shift-reducer") to replace whatever node you want replaced with a node representing the code you want it replaced with. That's much safer.

Of course, the fact that the code generators don't consume location information means that you wouldn't be able to get the original input with formatting back from the resulting AST. It sounds like that means it wouldn't work for your application. Still, thought I'd mention it.
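The CloneReducer approach might look roughly like the sketch below. It assumes shift-parser, shift-reducer and shift-codegen are installed, that `CloneReducer` has a `reduceBinaryExpression(node, state)` method receiving the original node and its already-reduced children, and that `reduce` and `codegen` are the default exports of their packages. `makeOperatorSwapper` and `swapOperator` are hypothetical names chosen for illustration.

```javascript
// Hypothetical factory: builds a reducer class that copies the tree and
// rewrites one binary operator along the way. The base class is passed in
// so the factory itself has no hard dependency on shift-reducer.
function makeOperatorSwapper(CloneReducer, from, to) {
  return class OperatorSwapper extends CloneReducer {
    reduceBinaryExpression(node, state) {
      // Let the base class build the fresh copy, then adjust that copy.
      const clone = super.reduceBinaryExpression(node, state);
      if (node.operator === from) clone.operator = to;
      return clone;
    }
  };
}

// Hypothetical end-to-end helper: parse, rewrite, regenerate.
// Assumption: the three shift packages are installed via npm.
function swapOperator(source, from, to) {
  const { parseScript } = require('shift-parser');
  const shiftReducer = require('shift-reducer');
  const codegen = require('shift-codegen').default;
  const Swapper = makeOperatorSwapper(shiftReducer.CloneReducer, from, to);
  const tree = shiftReducer.default(new Swapper(), parseScript(source));
  return codegen(tree); // original formatting is not preserved here
}
```

The rewrite happens on the tree rather than on text, so it cannot accidentally cut a token in half, but the output goes through the code generator and loses the original layout, as noted above.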

simondel commented 7 years ago

Thanks for the tip! I don't think we'll use it. I'd prefer to generate a single node of an AST so we can report it like this: https://stryker-mutator.github.io/stryker-html-reporter/isolated-runner/IsolatedTestRunnerAdapterWorker.js.html (click one of the red numbers in the code, or also enable Show killed mutants).

michaelficarra commented 7 years ago

Related enhancement request: https://github.com/shapesecurity/shift-java/issues/108