idank / bashlex

Python parser for bash
GNU General Public License v3.0
561 stars 94 forks

Parsing fails for if [[ -f "../build/tmp/dklm/klm_exports.h" ]] #43

Open sarvi opened 5 years ago

sarvi commented 5 years ago

Parsing fails for if [[ -f "../build/tmp/dklm/klm_exports.h" ]]

samlikins commented 1 year ago

In a Python interactive session with bashlex version 0.18:

Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bashlex
>>> bashlex.parse('if [[ -f "../build/tmp/dklm/klm_exports.h" ]]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 610, in parse
    parts = [p.parse()]
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 691, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/yacc.py", line 537, in parse
    tok = self.errorfunc(errtoken)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 548, in p_error
    raise errors.ParsingError('unexpected token %r' % p.value,
bashlex.errors.ParsingError: unexpected token '-f' (position 6)

innovate-invent commented 12 months ago

It looks like there is a bug in the expected parser states when handling COND_CMD tokens. After COND_START, the only valid next token is COND_CMD, but the tokenizer returns a WORD instead.

Even expressions like [[ $foo == $bar ]] are failing.
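To make the failure concrete: bash's parse.y has the rule cond_command: COND_START COND_CMD COND_END, and bashlex's grammar appears to mirror it, so the parser expects the entire body of [[ ... ]] to arrive as a single COND_CMD token. The toy check below is plain Python, not bashlex code (ToyParsingError and check_cond_command are hypothetical names). It shows why a WORD arriving right after COND_START produces an "unexpected token '-f'" style error like the one in the traceback.

```python
class ToyParsingError(Exception):
    """Hypothetical error type used only in this sketch."""


# Toy model of the rule in bash's parse.y:
#   cond_command: COND_START COND_CMD COND_END
COND_RULE = ("COND_START", "COND_CMD", "COND_END")


def check_cond_command(tokens):
    """Return True if tokens match COND_START COND_CMD COND_END.

    tokens is a list of (type, value) pairs. Raises ToyParsingError at the
    first token whose type is not the one the rule expects.
    """
    for pos, expected in enumerate(COND_RULE):
        if pos >= len(tokens):
            raise ToyParsingError("unexpected end of input, expected %s" % expected)
        ttype, value = tokens[pos]
        if ttype != expected:
            raise ToyParsingError(
                "unexpected token %r (got %s, expected %s)" % (value, ttype, expected))
    if len(tokens) > len(COND_RULE):
        raise ToyParsingError("unexpected token %r after COND_END" % (tokens[len(COND_RULE)][1],))
    return True


PATH = '"../build/tmp/dklm/klm_exports.h"'

# Expected stream: the whole test expression arrives as one COND_CMD token.
good = [("COND_START", "[["), ("COND_CMD", "-f " + PATH), ("COND_END", "]]")]

# Stream described in this issue: plain WORD tokens follow COND_START.
bad = [("COND_START", "[["), ("WORD", "-f"), ("WORD", PATH), ("COND_END", "]]")]

print(check_cond_command(good))
try:
    check_cond_command(bad)
except ToyParsingError as err:
    print(err)
```

The same reasoning covers [[ $foo == $bar ]]: whatever the first word inside the brackets is, it reaches the grammar as a WORD rather than a COND_CMD.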

The parser flags CONDCMD and CONDEXPR are both defined, but it isn't clear what each one is for or how they differ.

https://github.com/idank/bashlex/blob/master/bashlex/tokenizer.py#L563-L564

https://github.com/idank/bashlex/blob/master/bashlex/tokenizer.py#L1159-L1160
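For comparison, here is a toy sketch of how, as far as I can tell from bash's C parser, the two flags are meant to interact. Reading [[ sets PST_CONDCMD. On the next call, read_token() sees PST_CONDCMD without PST_CONDEXPR, sets PST_CONDEXPR, parses the whole expression up to ]] with a separate recursive-descent routine (parse_cond_command), returns it as one COND_CMD token, then yields COND_END and clears both flags. The Python names (toy_tokenize, CONDCMD, CONDEXPR) and the whitespace splitting via shlex are simplifications for illustration, not bashlex's tokenizer.

```python
import shlex

# Hypothetical flag names modelled on bash's PST_CONDCMD and PST_CONDEXPR.
CONDCMD = 0x1   # just read "[[", the expression has not been parsed yet
CONDEXPR = 0x2  # currently consuming the body of [[ ... ]]


def toy_tokenize(source):
    """Split source into (type, value) tokens, collapsing [[ ... ]] bodies.

    Simplifications: words are split on whitespace with shlex (quotes are
    kept), reserved words such as "if" are not recognized, and the COND_CMD
    value is just the list of body words rather than a parsed tree.
    """
    words = shlex.split(source, posix=False)
    tokens = []
    state = 0
    i = 0
    while i < len(words):
        if (state & (CONDCMD | CONDEXPR)) == CONDCMD:
            # Roughly what bash's read_token() does: set CONDEXPR, consume
            # the whole expression up to "]]", return it as ONE COND_CMD.
            state |= CONDEXPR
            body = []
            while i < len(words) and words[i] != "]]":
                body.append(words[i])
                i += 1
            if i == len(words):
                raise SyntaxError("missing ']]'")
            tokens.append(("COND_CMD", body))
            tokens.append(("COND_END", "]]"))
            state &= ~(CONDCMD | CONDEXPR)  # leave conditional mode
            i += 1  # skip the "]]" just emitted as COND_END
            continue
        if words[i] == "[[":
            state |= CONDCMD
            tokens.append(("COND_START", "[["))
        else:
            tokens.append(("WORD", words[i]))
        i += 1
    if state & CONDCMD:
        raise SyntaxError("unexpected end of input after '[['")
    return tokens


print(toy_tokenize('if [[ -f "../build/tmp/dklm/klm_exports.h" ]]'))
print(toy_tokenize("[[ $foo == $bar ]]"))
```

The design point is that the LALR grammar never sees -f or == as separate tokens; the body of [[ ... ]] is handled by its own small parser. A tokenizer that falls through and returns WORD at that point would produce exactly the kind of "unexpected token" error reported above.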

idank commented 12 months ago

The only way to understand it is to read bash's C code side by side with the Python port.