Closed GoogleCodeExporter closed 9 years ago
How does one distinguish a comment # from a non-comment # in bash? Searching
for "bash lexical grammar" on the web doesn't turn up much, so I had to wing it
when concocting the bash lexer.
Original comment by mikesamuel@gmail.com
on 30 Mar 2012 at 6:36
The answer you seek is in the man page. On my Mac (OS X Lion, version 3.2.8) it
says in the COMMENTS section:
a word beginning with # causes that word and all remaining characters on that line to be ignored.
I think that should be simple (check the first character of each token,
left to right).
Original comment by Intransi...@gmail.com
on 31 Mar 2012 at 1:45
I don't understand that.
> A word is a sequence of characters considered as a single unit by GRUB. Words
are separated by metacharacters, which are the following plus space, tab, and
newline:
> { } | & $ ; < >
So "#x" in "${#x[@]}" would seem to be a word since it follows two
meta-characters '$' and '{'.
Maybe
> The ‘$’ character introduces variable expansion. The variable name to be
expanded may be enclosed in braces,
somehow establishes that variable names are not words, or '{' after a '$' is
not a metacharacter, but without a grammar, I can't tell.
https://gist.github.com/powerofazure/libbash/blob/901d9b9ce32fa493efa06f6eaa76bbe6ad2e831f/bashast/bashast.g
only treats '#' as a comment when it is preceded by a space:
COMMENT
: (BLANK|EOL) '#' ~('\n'|'\r')* {$channel=HIDDEN;}
;
and it has this little gem
//Because the comment token doesn't handle the first comment in a file if it's
on the first line, have a parser rule for it
flcomment
: BLANK? '#' commentpart*;
commentpart
: nqstr|BLANK|LBRACE|RBRACE|SEMIC|DOUBLE_SEMIC|TICK|LPAREN|RPAREN|LLPAREN|RRPAREN|PIPE|COMMA|SQUOTE|QUOTE|LT|GT;
which is probably a workaround due to an ANTLR limitation.
Perhaps the best rule is that a comment has to be at start of input or preceded
by a whitespace character.
Original comment by mikesamuel@gmail.com
on 1 Apr 2012 at 5:57
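As an illustration only (this sketch is not from the thread, the bash source, or the lexer under discussion), the two readings above can be compared in a few lines of Python. The names words_by_metachar and comment_start are hypothetical, and the sketch assumes quoting and escaping can be ignored, which a real lexer cannot do. Splitting on the quoted metacharacters turns "#x[@]" into a word of its own, while the "start of input or preceded by whitespace" rule leaves "${#x[@]}" alone.

```python
import re

# Metacharacters as quoted in the thread, plus space, tab, and newline.
METACHARS = "{}|&$;<>"
_SPLIT = re.compile("[" + re.escape(METACHARS) + r" \t\n]+")


def words_by_metachar(text):
    """Naively split text into words at every metacharacter or blank."""
    return [w for w in _SPLIT.split(text) if w]


def comment_start(text):
    """Return the index of the first '#' that is at the start of input or
    directly preceded by whitespace, or None if there is no such '#'.

    Assumption: quotes and backslash escapes are ignored here.
    """
    for i, ch in enumerate(text):
        if ch == "#" and (i == 0 or text[i - 1].isspace()):
            return i
    return None


if __name__ == "__main__":
    print(words_by_metachar("echo ${#x[@]}"))
    print(comment_start("echo ${#x[@]}"))
    print(comment_start("echo ${#x[@]} # length"))
```

Under the metacharacter reading, "#x[@]" begins a word and would wrongly start a comment; under the whitespace rule only the trailing "# length" does.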
{ } are words themselves:
"
Note that unlike
the metacharacters ( and ), { and } are reserved words and must occur
where a reserved word is permitted to be recognized.
"
All that means is that they play by different rules.
Regarding parameter expansion:
${parameter}
The value of parameter is substituted. The braces are required when
parameter is a positional parameter with more than one digit, or when
parameter is followed by a character which is not to be interpreted as
part of its name.
This suggests that it might be necessary to put corner case logic there for the
first character in the construct ${...}
But since it is explicitly stated:
${#name[subscript]} expands to the length of ${name[subscript]}.
It may make the most sense to special-case it by hand.
I think your last statement, "a comment has to be at start of input or preceded
by a whitespace character", is the simplest implementation (technically ${
#name} should fail).
Original comment by Intransi...@gmail.com
on 2 Apr 2012 at 7:11
Thanks. I think that works for the use of # by the C preprocessor.
http://gcc.gnu.org/onlinedocs/gcc-2.95.3/cpp_1.html#SEC3 suggests that this
definition is compatible with a default mode that tries to use a single token
definition to recognize both bash/python style comments and C preprocessor
directives.
> Preprocessing directives are lines in your program that start with `#'. ...
Whitespace is also allowed before and after the `#'.
Combined with
> A preprocessing directive cannot be more than one line in normal
circumstances. It may be split cosmetically with Backslash-Newline, but that
has no effect on its meaning. ...
this suggests that a default mode that treats
(^^|\s+)#(?:[^\n\r\\]|\\(?:[^\r]|\r\n?))*\\?
as a comment or pre-processing directive would work well for a variety of
languages.
Original comment by mikesamuel@gmail.com
on 3 Apr 2012 at 9:51
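As an illustration only (not the code that landed in the fix), the pattern above can be tried out in Python. Two assumptions are made here: the leading "^^" is read as a plain start-of-input anchor "^", and a capture group is added so the leading whitespace is left out of the reported span. The names HASH_LINE and hash_spans are hypothetical.

```python
import re

# Python rendering of the pattern from the thread.
# Assumption: "^^" in the original is treated as a single "^" (start of input).
# The body accepts any run of non-newline, non-backslash characters, lets a
# backslash escape the following character (including a newline, so a
# backslash-newline continues the span), and allows one trailing backslash.
HASH_LINE = re.compile(r"(?:^|\s+)(#(?:[^\n\r\\]|\\(?:[^\r]|\r\n?))*\\?)")


def hash_spans(text):
    """Return every span the pattern would treat as a comment or directive."""
    return [m.group(1) for m in HASH_LINE.finditer(text)]


if __name__ == "__main__":
    print(hash_spans("x = 1  # note\n"))
    print(hash_spans("#define A \\\n  1\nint x;"))
    print(hash_spans("echo ${#x[@]}"))
```

The same pattern picks up a trailing shell or Python comment and a C directive continued with backslash-newline, while leaving "${#x[@]}" untouched because its '#' is neither at the start of input nor preceded by whitespace.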
Fixed in revision 223
Original comment by mikesamuel@gmail.com
on 6 Jul 2012 at 9:29
Original issue reported on code.google.com by
Intransi...@gmail.com
on 2 Aug 2011 at 4:48