idank / bashlex

Python parser for bash
GNU General Public License v3.0

Parsing scripts with arrays #84

Open samlikins opened 1 year ago

samlikins commented 1 year ago

Attempting to parse a script that contains an array declaration fails at the opening parenthesis of the array literal, i.e. the ( character.

The following bashlex information was provided by pip:

$ pip show bashlex
Name: bashlex
Version: 0.18
Summary: Python parser for bash
Home-page: https://github.com/idank/bashlex.git
Author: Idan Kamara
Author-email: i@idank.me
License: GPLv3+
Location: /home/user/.local/lib/python3.10/site-packages
Requires:
Required-by:

In a Python interactive session with the following setup:

Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bashlex

Running the bashlex.parse function with the string declare -a CMDS=() produces the following output:

>>> bashlex.parse('declare -a CMDS=()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 610, in parse
    parts = [p.parse()]
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 691, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/yacc.py", line 537, in parse
    tok = self.errorfunc(errtoken)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 548, in p_error
    raise errors.ParsingError('unexpected token %r' % p.value,
bashlex.errors.ParsingError: unexpected token '(' (position 16)

Removing the parentheses makes the parse succeed:

>>> bashlex.parse('declare -a CMDS')
[CommandNode(parts=[WordNode(parts=[] pos=(0, 7) word='declare'), WordNode(parts=[] pos=(8, 10) word='-a'), WordNode(parts=[] pos=(11, 15) word='CMDS')] pos=(0, 15))]

The failure is independent of the declare builtin:

>>> bashlex.parse('CMDS=()')
bashlex.errors.ParsingError: unexpected token '(' (position 5)

The error occurs when appending to the array as well:

>>> bashlex.parse('CMDS+=("init")')
bashlex.errors.ParsingError: unexpected token '(' (position 6)

Parsing parentheses is not by itself the issue:

>>> bashlex.parse('(env)')
[CompoundNode(list=[ReservedwordNode(pos=(0, 1) word='('), CommandNode(parts=[WordNode(parts=[] pos=(1, 4) word='env')] pos=(1, 4)), ReservedwordNode(pos=(4, 5) word=')')] pos=(0, 5) redirects=[])]

Subscripted array references and single-element assignments seem to be parsed as plain WordNodes:

>>> bashlex.parse('ARRAY[1]=init')
[CommandNode(parts=[WordNode(parts=[] pos=(0, 13) word='ARRAY[1]=init')] pos=(0, 13))]
>>> bashlex.parse('echo ${ARRAY[*]}')
[CommandNode(parts=[WordNode(parts=[] pos=(0, 4) word='echo'), WordNode(parts=[ParameterNode(pos=(5, 16) value='ARRAY[*]')] pos=(5, 16) word='${ARRAY[*]}')] pos=(0, 16))]
>>> bashlex.parse('unset ARRAY[1]')
[CommandNode(parts=[WordNode(parts=[] pos=(0, 5) word='unset'), WordNode(parts=[] pos=(6, 14) word='ARRAY[1]')] pos=(0, 14))]

The parser just seems to have trouble with compound array assignments, i.e. NAME=(...) and NAME+=(...).
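
For anyone who wants to re-check these cases against another bashlex version, here is a minimal sketch of a reproduction helper. The names check_inputs and CASES are made up for this example and are not part of bashlex. The only bashlex call used is bashlex.parse, as in the session above. The helper catches any exception rather than only bashlex.errors.ParsingError, on the assumption that other versions might fail differently.

```python
# Hypothetical reproduction helper; check_inputs and CASES are
# illustrative names, not part of the bashlex API.

# The inputs exercised in this report.
CASES = [
    'declare -a CMDS=()',
    'declare -a CMDS',
    'CMDS=()',
    'CMDS+=("init")',
    '(env)',
    'ARRAY[1]=init',
    'echo ${ARRAY[*]}',
    'unset ARRAY[1]',
]


def check_inputs(parse, inputs):
    """Run parse() on each input and map it to None (success)
    or to the error message of the exception it raised."""
    results = {}
    for src in inputs:
        try:
            parse(src)
            results[src] = None
        except Exception as exc:  # in 0.18 this is bashlex.errors.ParsingError
            results[src] = str(exc)
    return results


try:
    import bashlex
except ImportError:
    bashlex = None  # the helper itself does not need bashlex

if bashlex is not None:
    for src, err in check_inputs(bashlex.parse, CASES).items():
        status = 'ok' if err is None else 'FAIL: ' + err
        print('%-22r %s' % (src, status))
```

With bashlex 0.18 installed, the three inputs containing =( should be the ones reported as failing, matching the tracebacks above.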

samlikins commented 1 year ago

It seems a similar issue was previously reported in #24.

AndrewVallette commented 1 year ago

Yes, I have encountered the same issue. It's a shame since almost all of the bash files I work with use arrays.