vorpaljs / bash-parser

Parses bash into an AST
https://vorpaljs.github.io/bash-parser-playground/
MIT License
200 stars 34 forks source link

Error in double-quoting.js when handling quoted `)` and `$` characters #58

Open cetfor opened 6 years ago

cetfor commented 6 years ago

Hey, awesome job on bash-parser. I'm finding this extremely useful in my own work.

I've run into an issue when parsing ) and $ characters in quoted strings. Let me give an example to help explain.

Consider the following line:

command="$(echo "$input" | sed -e "s/^[ \t]*\([^ \t]*\)[ \t]*.*$/\1/g")"

This line is a part of a larger, more complex bash script I'm analyzing. There are actually two (2) issues here.

  1. At column 64, the $ causes an error.
  2. At column 55, the ) causes an error.

Removing the $ leads the parser to believe the ) at column 55 is the closing parentheses rather than a string literal.

Both issues boil down to line 50 in double-quoting.js, where reducers is undefined.

I understand this is a strange case since the issue here exists in parsing a string pattern. I'm honestly not too sure how to go about addressing this yet. If I come up with anything I'll make sure to comment here. Also, if you have any ideas please share them, I'd be happy to work on a fix as soon as I think of the best way to handle these cases.

cetfor commented 6 years ago

Hey @parro-it, I'm currently working on a fix for this issue and hope to get a PR in today. FYI, I've resolved the "build failures" due to test-loc.js which is just a silly double-quoting issue that throws a ton of errors - nothing critical. I hope we can ping back and forth on the PR and get something implemented to fix this. Thanks!

cetfor commented 6 years ago

As far as the issue goes that causes Error: TypeError: Cannot read property 'doubleQuoting' of undefined it appears line 38 of expansion-start.js, the method previousReducer can take a third argument of reducers, but it's not set. See expansion-parameters.js line 31 for that definition.

Changing: return state.previousReducer(state, [char].concat(source));

to: return state.previousReducer(state, [char].concat(source), reducers);

Resolves the issue Error: TypeError: Cannot read property 'doubleQuoting' of undefined - all unit tests run through npm test continue to pass with this change.

cetfor commented 6 years ago

With the above change in place, and using the following script:

const ast = parse('command="$(echo "$input" | sed -e "s/^[ \t]*\([^ \t]*\)[ \t]*.*$/\1/g")"');
console.log(JSON.stringify(ast, null, 2));

I get the following error:

/Users/john/Documents/cetfor/bash-parser/src/index.js:52
                throw new Error(err.stack || err.message);
                ^

Error: TypeError: Cannot read property 'char' of undefined
    at expansion.map.xp (/Users/john/Documents/cetfor/bash-parser/src/utils/tokens.js:80:21)
    at Array.map (<anonymous>)
    at tokenOrEmpty (/Users/john/Documents/cetfor/bash-parser/src/utils/tokens.js:76:45)
    at start (/Users/john/Documents/cetfor/bash-parser/src/modes/posix/tokenizer/reducers/start.js:51:18)
    at tokenizer (/Users/john/Documents/cetfor/bash-parser/src/modes/posix/tokenizer/index.js:193:13)
    at tokenizer.next (<anonymous>)
    at Object.next (/Users/john/Documents/cetfor/bash-parser/node_modules/iterable-lookahead/index.js:51:24)
    at Object.next (/Users/john/Documents/cetfor/bash-parser/node_modules/map-iterable/index.js:33:30)
    at filterIterator (/Users/john/Documents/cetfor/bash-parser/node_modules/filter-iterator/index.js:6:12)
    at filterIterator.next (<anonymous>)
    at parse (/Users/john/Documents/cetfor/bash-parser/src/index.js:52:9)
    at Object.<anonymous> (/Users/john/Documents/cetfor/bash-parser/ex.js:9:13)
    at Module._compile (module.js:652:30)
    at Object.Module._extensions..js (module.js:663:10)
    at Module.load (module.js:565:32)
    at tryModuleLoad (module.js:505:12)
    at Function.Module._load (module.js:497:3)
    at Function.Module.runMain (module.js:693:10)
    at startup (bootstrap_node.js:191:16)
    at bootstrap_node.js:612:3

The issue here exists in tokens.js line 80 in the tokenOrEmpty function. In this case, xp.loc has no end property!

I added the following dirty patch to see if I can continue debugging the issue:

// console.log('aaa', {token: state.loc, xp: xp.loc});
// return Object.assign({}, xp, {loc: {
//  start: xp.loc.start.char - state.loc.start.char,
//  end: xp.loc.end.char - state.loc.start.char
// }});
if(xp.loc.hasOwnProperty('end')) {
    return Object.assign({}, xp, {loc: {
        start: xp.loc.start.char - state.loc.start.char,
        end: xp.loc.end.char - state.loc.start.char
    }});
} else {
    return Object.assign({}, xp, {loc: {
        start: xp.loc.start.char - state.loc.start.char
    }});
}

This isn't ideal - just a way to contiue moving towards the issue. When I re-run the example with this "patch" I get the following error:

/Users/john/Documents/cetfor/bash-parser/src/index.js:50
                        throw err;
                        ^

SyntaxError: Unclosed "
    at tk (/Users/john/Documents/cetfor/bash-parser/src/modes/posix/rules/syntaxerror-oncontinue.js:7:10)
    at Object.next (/Users/john/Documents/cetfor/bash-parser/node_modules/map-iterable/index.js:35:18)
    at Object.next (/Users/john/Documents/cetfor/bash-parser/node_modules/map-iterable/index.js:33:30)
    at Object.lex (/Users/john/Documents/cetfor/bash-parser/src/shell-lexer.js:6:31)
    at lex (/Users/john/Documents/cetfor/bash-parser/src/modes/posix/built-grammar.js:317:27)
    at Parser.parse (/Users/john/Documents/cetfor/bash-parser/src/modes/posix/built-grammar.js:330:26)
    at parse (/Users/john/Documents/cetfor/bash-parser/src/index.js:34:22)
    at setCommandExpansion (/Users/john/Documents/cetfor/bash-parser/src/modes/posix/rules/command-expansion.js:17:21)
    at tokensUtils.setExpansions.token.expansion.map.xp (/Users/john/Documents/cetfor/bash-parser/src/modes/posix/rules/command-expansion.js:38:12)
    at Array.map (<anonymous>)

I uncommented the console.log({value, xp}) statement on line 55 of command-expansion.js https://github.com/vorpaljs/bash-parser/blob/6fa4ca52fbb92f9eb6b94e2e8c16768210b7af3d/src/modes/posix/rules/command-expansion.js#L55 to see what the command was being seen as, and I get this:

{ loc: { start: 9, end: 50 },
  command: 'echo "$input" | sed -e "s/^[ \t]*([^ \t]*',
  type: 'command_expansion' }
{ loc: { start: 1, end: 6 },
  parameter: 'input',
  type: 'parameter_expansion' }

So for some reason, the command is being truncated where the \) is. I would expect this command to be echo "$input" | sed -e "s/^[ \t]*\([^ \t]*\)[ \t]*.*$/\1/g". Perhaps state.escaping is not being set when this occurs?

I'm starting to think all of this happens within the lexerPhases. Any insight would be greatly appreciated.

daniel-sc commented 4 years ago

Hi there, stumbled upon this problem as well (https://github.com/daniel-sc/bash-shell-to-bat-converter/issues/13) any updates/progress here?