mvdan / sh

A shell parser, formatter, and interpreter with bash support; includes shfmt
https://pkg.go.dev/mvdan.cc/sh/v3
BSD 3-Clause "New" or "Revised" License
7.14k stars 338 forks source link

syntax: ambiguity when parsing globs in indexed array elements #558

Open averms opened 4 years ago

averms commented 4 years ago

When trying to format the following file:

#!/bin/bash
files=([^_]*.txt)

shfmt fails with test.sh:2:9: ^ must follow an expression. I'm not sure what that means. The manual describes the syntax for pattern matching (globbing) here.

mvdan commented 4 years ago

Thanks for the report! This is probably an edge case in the lexer that we missed.

mvdan commented 4 years ago

Ah, the parser is getting confused because it sees name=([ and it thinks it's parsing name=([index]=value).

Unfortunately, arrays is one of the big areas where Bash syntax is ambiguous, and it can't be parsed statically without backtracking. See the mention of array indexes in https://github.com/mvdan/sh#caveats, for example.

I'm not sure what to do here. I definitely don't want to add backtracking. Parsing arrays is just a nightmare in general.

sotho commented 4 years ago

I have a similar issue and simplified it to this:

P=(*.tx[t])

gives an array of files (*.txt), but shfmt says:

"[x]" must be followed by =

Here, this isn't an index, but a file pattern.

averms commented 3 years ago

Maybe the "solution" sections from https://www.oilshell.org/blog/2016/10/20.html can give a solution? It says

Fortunately, there's a simple solution: always quote string literals used as array keys. The parser will then know that unquoted strings are variables. That is:

array[mystring]=x    # BAD: is it a string or a variable?
array['mystring']=x  # GOOD: it's a string

We can extend that to defining an array: if the characters inside the brackets aren't quoted they are a glob, otherwise they are treated as an index. Does this work?

mvdan commented 3 years ago

https://github.com/mvdan/sh#caveats already makes the same suggestion that oilshell makes, and I think it's generally sensible to write new shell code in a way that isn't ambiguous.

When it comes to files=([^_]*.txt), I'm not sure there's a way to truly make it non-ambiguous. A glob expression can contain quote characters just fine. I think this is one of the cases that will only be fixed by the parser supporting ambiguous input. Other people have brought up similar issues due to the lack of handling ambiguity, so I'm currently leaning towards supporting it. I'll let you know if/when that issue with a proposed implementation is created.

mvdan commented 3 years ago

Other people have brought up similar issues due to the lack of handling ambiguity, so I'm currently leaning towards supporting it. I'll let you know if/when that issue with a proposed implementation is created.

I collected my thoughts into a new issue here: https://github.com/mvdan/sh/issues/686

We can leave this issue open as it's the problem statement, not the proposed solution. But it can't be fixed until the new issue is implemented, I think.

gaelicWizard commented 2 years ago

I'm running in to what I think is the same issue:

lib/helpers.bash: lib/helpers.bash:615:34: [ must follow a name

EDIT: the note on your GitHub sponsorship page says suggestions are welcome, I'd like to contribute towards this refactor for using [ in arrays! (I'm trying to use file globbing to populate an array.) ♥

mvdan commented 2 years ago

@gaelicWizard thanks for the nudge and for the support :) You actually made me notice that we had a silly regression when printing errors in shfmt. I've sent https://github.com/mvdan/sh/pull/786 if you would like to give a review.

I can't seem to find what file is giving you that error. Could you please post a perma-link to one such file, or paste the bit of shell here?

gaelicWizard commented 2 years ago

@mvdan, here's the line it's throwing errors on:

plugins=("${BASH_IT}/enabled"/[[:digit:]][[:digit:]][[:digit:]]"${BASH_IT_LOAD_PRIORITY_SEPARATOR}${file_entity}.${suffix}.bash" "${BASH_IT}/$subdirectory/enabled/"{[[:digit:]][[:digit:]][[:digit:]]"${BASH_IT_LOAD_PRIORITY_SEPARATOR}${file_entity}.${suffix}.bash","${file_entity}.${suffix}.bash"})

Alsö throws errors on this variation:

plugins=("${BASH_IT}/enabled"/[0-9][0-9][0-9]"${BASH_IT_LOAD_PRIORITY_SEPARATOR}${file_entity}.${suffix}.bash" "${BASH_IT}/$subdirectory/enabled/"{[0-9][0-9][0-9]"${BASH_IT_LOAD_PRIORITY_SEPARATOR}${file_entity}.${suffix}.bash","${file_entity}.${suffix}.bash"})

Currently, I'm just using * in bash-it/bash-it#1934 to work around it.

Thanks!!