lydell / eslump

Fuzz testing JavaScript parsers and suchlike programs.
MIT License

Expose codeGen function #4

Closed RReverser closed 6 years ago

RReverser commented 6 years ago

Closes #3

RReverser commented 6 years ago

Hmm, this doesn't work as expected because the token stream seems to insert extra semicolons, which actually changes the AST. Why does it do that instead of relying on the random EmptyStatements generated by the fuzzer?

lydell commented 6 years ago

The intention is not to insert extra semicolons, but to randomly end statements with semicolons or not. The problem with omitting semicolons is the complicated ASI rules. I guess this is the culprit:

https://github.com/lydell/eslump/blob/4282b09ba8a42c73e0804c4542b6dcb1d7c744f6/codegen.js#L55-L58

It probably adds an extra semicolon sometimes. (Now that I think about it, I'm not sure the ASI stuff even works with the newer class fields stuff or something.)

If you can find some hack to work around that it would be appreciated. Otherwise there's no need to expose codeGen at all, right?
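To illustrate why the ASI rules mentioned above make it risky to drop a semicolon blindly, here is a minimal sketch in plain JavaScript. It does not use eslump or Shift; `countCalls` is a hypothetical helper written only for this example. It builds the same tiny program with and without a semicolon before a line that starts with `(`, and shows that the two versions behave differently.

```javascript
// Builds a tiny program in which `sep` separates `let x = f` from `(0)`,
// runs it, and reports how many times `f` was called.
// `countCalls` is a hypothetical helper for this illustration only.
function countCalls(sep) {
  const source =
    'let calls = 0;\n' +
    'const f = () => { calls += 1; return 1; };\n' +
    'let x = f' + sep + '\n' +
    '(0);\n' +
    'return calls;';
  return new Function(source)();
}

console.log(countCalls(';')); // 0: `let x = f;` and `(0);` are separate statements
console.log(countCalls(''));  // 1: `f\n(0)` parses as the call `f(0)`
```

No semicolon is inserted automatically in the second case because the next line continues the expression validly, so a printer that omits semicolons at random has to check what the following token is.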

RReverser commented 6 years ago

> The intention is not to insert extra semicolons, but to randomly end statements with semicolons or not.

I guess it should be possible to move that directly to CustomFormattedCodeGen and override semiOp to either emit a semicolon or a newline. Something similar could probably be done for comments and spaces.

> Otherwise there's no need to expose codeGen at all, right?

Well yeah, if the functionality of CustomTokenStream is merged directly into a formatter, we're back to the idea of just exposing that class.

RReverser commented 6 years ago

A quick experiment with overriding semiOp appears to do what we need for ASI:

const formattedCodeGen = (() => {
    const { default: codeGen, FormattedCodeGen } = require('shift-codegen');
    const formattedCodeGen = new FormattedCodeGen();
    // Override semiOp on this one instance only.
    formattedCodeGen.semiOp = function () {
        if (Math.random() > 0.3) {
            // About 70% of the time, defer to the default behavior.
            return FormattedCodeGen.prototype.semiOp.apply(this, arguments);
        } else {
            // Otherwise emit an empty token in its place.
            return this.t("");
        }
    };
    return ast => codeGen(ast, formattedCodeGen);
})();

const { parseScript } = require('shift-parser');

let ast = parseScript('var x = 1; y = 2; z = 3; t = 4; { p = 1; g = 2 } t = 10');
console.log(formattedCodeGen(ast));

Example output:

var x = 1
y = 2;
z = 3
t = 4
{
  p = 1;
  g = 2;
}
t = 10

but obviously it requires more thorough testing to make sure it doesn't break anything else.

lydell commented 6 years ago

I am now secretly hoping that you'll get so tired of all my hacks that you go and write the fuzzer of our dreams :)

RReverser commented 6 years ago

That was the initial plan, but eslump with reparsing works well enough for me not to bother about this that much :)

lydell commented 6 years ago

I think the better thing to do here is to implement the missing piece of the Shift puzzle: "shift-codegen-fuzzed" or something, which would print an AST with random whitespace, comments, optional semicolons, optional colons and redundant parentheses.
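As a rough sketch of just the "random whitespace and comments" part of that idea, the snippet below joins a list of already-printed tokens with random filler. It is not based on shift-codegen's API, and no such package is assumed to exist: `makeRng`, `joinFuzzed` and the filler list are hypothetical names chosen for illustration. The sketch assumes that every filler contains at least one space and no newline, so adjacent tokens cannot merge and ASI is never triggered.

```javascript
// Small seedable PRNG (mulberry32) so that a given output can be
// reproduced from its seed. Returns numbers in [0, 1).
function makeRng(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Joins tokens with a randomly chosen filler between each pair.
// Every filler contains a space and no newline (assumption stated above).
function joinFuzzed(tokens, random) {
  const fillers = [' ', '  ', '\t ', ' /* fuzz */ ', ' /**/ '];
  const pick = () => fillers[Math.floor(random() * fillers.length)];
  return tokens
    .map((token, i) => (i === 0 ? token : pick() + token))
    .join('');
}

const tokens = ['x', '=', '(', 'a', '+', 'b', ')', '*', '2', ';'];
const fuzzed = joinFuzzed(tokens, makeRng(1));
console.log(fuzzed);

// The fuzzed text still evaluates like the unfuzzed `x = (a + b) * 2;`.
const run = new Function('a', 'b', 'let x; ' + fuzzed + ' return x;');
console.log(run(1, 2)); // 6
```

Optional semicolons and redundant parentheses depend on the surrounding syntax, so they would have to be decided while printing the AST rather than while joining tokens; this sketch leaves them out.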