idank / bashlex

Python parser for bash
GNU General Public License v3.0

Unexpected token \n when parsing bash *script* #23

Open konfilios opened 7 years ago

konfilios commented 7 years ago

I am trying to parse the following bash script (simplified example) and print the produced AST as JSON using bashlex 0.12:

function a {
    a;
}

# Comment

But it fails:

  File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 614, in parse
    part = _parser(s[index:], strictmode=strictmode).parse()
  File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 682, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 277, in parse
    return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
  File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 1079, in parseopt_notrack
    tok = self.errorfunc(errtoken)
  File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 539, in p_error
    p.lexer.source, p.lexpos)
bashlex.errors.ParsingError: unexpected token '\n' (position 10)

A trivial workaround is to wrap the code in any other construct, the simplest being a set of curly braces. Then everything works just fine:

{
function a {
    a;
}

# Comment
}
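The brace workaround above can be applied programmatically before handing a script to the parser. This is a minimal sketch, not part of bashlex: `wrap_in_braces` and `parse_wrapped` are hypothetical helper names introduced here for illustration, and only `bashlex.parse` comes from the library. Whether the wrapped form parses depends on the bashlex version in use; the report above is for 0.12.

```python
def wrap_in_braces(script: str) -> str:
    """Wrap a script in a brace group, as in the manual workaround.

    Hypothetical helper: trailing newlines are dropped so that exactly
    one newline separates the last line of the script from the closing brace.
    """
    body = script.rstrip("\n")
    return "{\n" + body + "\n}"


def parse_wrapped(script: str):
    """Parse the brace-wrapped script with bashlex (third-party package)."""
    import bashlex  # imported lazily so wrap_in_braces works without it

    return bashlex.parse(wrap_in_braces(script))
```

Because `"{\n"` is prepended, every position in the resulting AST is shifted by two characters relative to the original script, and the top-level node describes the brace group rather than the script's own commands.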

Of course I can live with the workaround but I think it would be great if you took a look at it.

Thanks a lot for the great job you've done!

idank commented 7 years ago

Yeah, it's silly that the library doesn't allow newlines. There's an old bug somewhere around here that has a fix for it; I just never got around to merging it in properly.

envp commented 5 years ago

@idank Is there a fix for this somewhere on the horizon? I'd like to use this library for some transpilers.

idank commented 5 years ago

There's some prior work at https://github.com/idank/bashlex/pull/8 that doesn't cover all cases. It might work for your use case.

samlikins commented 1 year ago

In a Python interactive session with the following setup:

$  python
Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bashlex

I attempted to parse the original example:

>>> bashlex.parse('function a {\
...     a;\
... }\
... \
... # Comment')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 610, in parse
    parts = [p.parse()]
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 691, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/yacc.py", line 537, in parse
    tok = self.errorfunc(errtoken)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 544, in p_error
    raise errors.ParsingError('unexpected EOF',
bashlex.errors.ParsingError: unexpected EOF (position 28)

The error doesn't match the original one, but it is an error nonetheless. The workaround provided still parses without raising:

>>> bashlex.parse('{\
... function a {\
...     a;\
... }\
... \
... # Comment\
... }')
[ListNode(parts=[CommandNode(parts=[WordNode(parts=[] pos=(0, 9) word='{function'), WordNode(parts=[] pos=(10, 11) word='a'), WordNode(parts=[] pos=(12, 13) word='{'), WordNode(parts=[] pos=(17, 18) word='a')] pos=(0, 18)), OperatorNode(op=';' pos=(18, 19)), CommandNode(parts=[WordNode(parts=[] pos=(19, 21) word='}#'), WordNode(parts=[] pos=(22, 30) word='Comment}')] pos=(19, 30))] pos=(0, 30))]
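One thing worth checking in these sessions: in a regular (non-raw) Python string literal, a backslash at the end of a line is a line continuation, so the newline is not part of the string. The `word='{function'` node in the output above and the reported EOF position of 28 are both consistent with an input that contains no newlines at all. The sketch below only inspects the strings and does not call bashlex; the variable names are illustrative, and whether this fully explains the errors in this comment is an assumption, not something confirmed in the thread.

```python
# Backslash-newline inside a regular string literal joins the lines:
# the resulting string contains no newline characters.
continued = 'function a {\
    a;\
}\
\
# Comment'

# The same script with real newlines, written two equivalent ways.
explicit = "function a {\n    a;\n}\n\n# Comment"
triple_quoted = """function a {
    a;
}

# Comment"""

print(repr(continued))
print(len(continued), len(explicit))
print(explicit == triple_quoted)
```

To test how the parser handles a multi-line script, pass a string built like `explicit` or `triple_quoted` instead.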

The error seems to be caused by the comment. Without it, the function parses:

>>> bashlex.parse('function a {\
...     a;\
... }\
... ')
[FunctionNode(body=CompoundNode(list=[ReservedwordNode(pos=(11, 12) word='{'), ListNode(parts=[CommandNode(parts=[WordNode(parts=[] pos=(16, 17) word='a')] pos=(16, 17)), OperatorNode(op=';' pos=(17, 18))] pos=(16, 18)), ReservedwordNode(pos=(18, 19) word='}')] pos=(11, 19) redirects=[]) name=WordNode(parts=[] pos=(9, 10) word='a') parts=[ReservedwordNode(pos=(0, 8) word='function'), WordNode(parts=[] pos=(9, 10) word='a'), CompoundNode(list=[ReservedwordNode(pos=(11, 12) word='{'), ListNode(parts=[CommandNode(parts=[WordNode(parts=[] pos=(16, 17) word='a')] pos=(16, 17)), OperatorNode(op=';' pos=(17, 18))] pos=(16, 18)), ReservedwordNode(pos=(18, 19) word='}')] pos=(11, 19) redirects=[])] pos=(0, 19))]

With the comment in place, the error persists even if further newlines are added after it:

>>> bashlex.parse('function a {\
...     a;\
... }\
... \
... # Comment\
... \
... ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 610, in parse
    parts = [p.parse()]
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 691, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/yacc.py", line 537, in parse
    tok = self.errorfunc(errtoken)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 544, in p_error
    raise errors.ParsingError('unexpected EOF',
bashlex.errors.ParsingError: unexpected EOF (position 28)

Adding a statement after the comment doesn't seem to resolve this error:

>>> bashlex.parse('function a {\
...     a;\
... }\
... \
... # Comment\
... \
... a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 610, in parse
    parts = [p.parse()]
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 691, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/yacc.py", line 537, in parse
    tok = self.errorfunc(errtoken)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 544, in p_error
    raise errors.ParsingError('unexpected EOF',
bashlex.errors.ParsingError: unexpected EOF (position 28)

Moving the comment to the beginning doesn't prevent an error; it just produces a different one:

>>> bashlex.parse('# Comment\
... function a {\
...     a;\
... }')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 620, in parse
    ef.visit(parts[-1])
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/ast.py", line 38, in visit
    k = n.kind
AttributeError: 'str' object has no attribute 'kind'. Did you mean: 'find'?

Removing the comment altogether seems to be the only way to avoid the error without the workaround.