Tokenize input prior to parsing

Arcterus commented 6 years ago

Because we parse the input directly, there are a few cases where malformed input is accepted: for example, forx iny; do echo $x; done will function the same as for x in y; do echo $x; done. Tokenizing the input before parsing should fix this issue.
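To illustrate why a separate tokenization pass helps, here is a minimal sketch in Rust. It is not mesabox's actual lexer: the Token type and the tokenize and classify functions are hypothetical names made up for this example, and it ignores quoting, operators other than ;, and the POSIX rule that reserved words are only recognized in certain positions.

```rust
/// Hypothetical token type for this sketch (not mesabox's real one).
#[derive(Debug, PartialEq)]
enum Token {
    For,
    In,
    Do,
    Done,
    Semi,
    Word(String),
}

/// A word is a keyword only if the *whole* word matches,
/// so "forx" stays an ordinary word.
fn classify(word: &str) -> Token {
    match word {
        "for" => Token::For,
        "in" => Token::In,
        "do" => Token::Do,
        "done" => Token::Done,
        other => Token::Word(other.to_string()),
    }
}

/// Splits the input on whitespace and ';' before any parsing happens.
fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut word = String::new();
    for c in input.chars() {
        if c.is_whitespace() || c == ';' {
            // A delimiter ends the current word, if there is one.
            if !word.is_empty() {
                tokens.push(classify(&word));
                word.clear();
            }
            if c == ';' {
                tokens.push(Token::Semi);
            }
        } else {
            word.push(c);
        }
    }
    if !word.is_empty() {
        tokens.push(classify(&word));
    }
    tokens
}
```

With this, tokenize("forx iny") yields two plain Word tokens, so a parser expecting Token::For would reject the input, while tokenize("for x in y") yields For, Word, In, Word as intended.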

Arcterus commented 6 years ago

An issue that has come up with this is a conflict between subshells, command substitution, and case statements.

Case statements allow stuff like:

case x in
    (y) echo hi;;
    x) echo hello;;
esac

This makes it difficult to match parentheses. We also cannot just tokenize the parentheses in all cases without making tokenization far more basic (as certain inputs are only valid in certain contexts regarding command substitution and quoting).
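As a rough illustration of the matching problem, here is a naive parenthesis matcher in Rust. The function name parens_balanced is hypothetical and the check deliberately ignores quoting and context; it only shows that a context-free count of ( and ) rejects a valid case statement.

```rust
/// Naive check: every ')' must close an earlier '(' and nothing may stay open.
/// This knows nothing about case patterns, quoting, or command substitution.
fn parens_balanced(input: &str) -> bool {
    let mut depth: i32 = 0;
    for c in input.chars() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth < 0 {
                    // A ')' with no matching '(' seen so far.
                    return false;
                }
            }
            _ => {}
        }
    }
    depth == 0
}
```

A subshell such as (echo hi) passes and so does the (y) pattern form, but case x in x) echo hello;; esac fails even though it is valid shell, because the ) that ends the pattern has no opening partner.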

It would be easy to fix this if we just ban case statements with open parentheses, but of course we would then no longer be POSIX-compliant (additionally I am pretty sure that is the most common form). The other solutions I can think of at the moment require too much lookahead to be practical.

Arcterus commented 6 years ago

The solution I have gone with for now is to run parts of the lexer during parsing rather than as a separate pass beforehand. This will probably require some care to make it fast, but it solves the issues with heredocs and ambiguous parentheses.
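One way this kind of parser-driven lexing could look is sketched below in Rust. This is only an assumption about the general shape, not the code that went into mesabox: Context, Token, Lexer, and next_token are hypothetical names, and the sketch handles only words and parentheses (no quoting, heredocs, or command substitution).

```rust
/// What the parser is currently expecting (hypothetical for this sketch).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Context {
    Command,
    CasePattern,
}

#[derive(Debug, PartialEq)]
enum Token {
    SubshellOpen,
    SubshellClose,
    PatternOpen,  // optional '(' before a case pattern
    PatternClose, // ')' that ends a case pattern
    Word(String),
}

struct Lexer<'a> {
    rest: &'a str,
}

impl<'a> Lexer<'a> {
    fn new(input: &'a str) -> Self {
        Lexer { rest: input }
    }

    /// Produces one token on demand; the parser passes in its current
    /// context so that '(' and ')' can be interpreted correctly.
    fn next_token(&mut self, ctx: Context) -> Option<Token> {
        self.rest = self.rest.trim_start();
        let c = self.rest.chars().next()?;
        if c == '(' || c == ')' {
            self.rest = &self.rest[1..];
            return Some(match (c, ctx) {
                ('(', Context::Command) => Token::SubshellOpen,
                ('(', Context::CasePattern) => Token::PatternOpen,
                (_, Context::Command) => Token::SubshellClose,
                (_, Context::CasePattern) => Token::PatternClose,
            });
        }
        // Otherwise read a word up to the next whitespace or parenthesis.
        let end = self
            .rest
            .find(|ch: char| ch.is_whitespace() || ch == '(' || ch == ')')
            .unwrap_or(self.rest.len());
        let (word, rest) = self.rest.split_at(end);
        self.rest = rest;
        Some(Token::Word(word.to_string()))
    }
}
```

Because the parser knows whether it is inside a case statement, it can request tokens with Context::CasePattern while reading a pattern and switch to Context::Command for the body, so x) and (y) are both accepted without unbounded lookahead. The cost is that tokens are produced one at a time, which is why care is needed to keep this fast.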

mesalock-linux / mesabox

Tokenize input prior to parsing #12