zzyengineer / google-code-prettify

Automatically exported from code.google.com/p/google-code-prettify
Apache License 2.0

bash incorrect syntax highlighting with # #165

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

Try prettifying the expression "${#x[@]}".

What is the expected output?  What do you see instead?

The # should not be interpreted as the start of a comment.  Currently everything 
after the hash shows up as a comment: ${#x[@]}

What version are you using?  On what browser?
Tried in chrome, FF, IE

Please provide any additional information below.
I suspect the BASH parser needs to account for non-comment uses of the hash.

Original issue reported on code.google.com by Intransi...@gmail.com on 2 Aug 2011 at 4:48

GoogleCodeExporter commented 9 years ago
How does one distinguish a comment # from a non-comment # in bash?  Searching 
for "bash lexical grammar" on the web doesn't turn up much, so I had to wing it 
when concocting the bash lexer.

Original comment by mikesamuel@gmail.com on 30 Mar 2012 at 6:36

GoogleCodeExporter commented 9 years ago
The answer you seek is in the man-page.  On my mac (osx lion, version 3.2.8) it 
says in the COMMENTS section:

    a word beginning with # causes that word and all remaining characters on that line to be ignored.

I think that should be simple (check the first character of each token 
left-to-right)

Original comment by Intransi...@gmail.com on 31 Mar 2012 at 1:45

GoogleCodeExporter commented 9 years ago
I don't understand that.  

> A word is a sequence of characters considered as a single unit by GRUB. Words 
> are separated by metacharacters, which are the following plus space, tab, and 
> newline:
> { } | & $ ; < >

So "#x" in "${#x[@]}" would seem to be a word since it follows two 
meta-characters '$' and '{'.  

Maybe

> The ‘$’ character introduces variable expansion. The variable name to be 
> expanded may be enclosed in braces, 

somehow establishes that variable names are not words, or '{' after a '$' is 
not a metacharacter, but without a grammar, I can't tell.

https://gist.github.com/powerofazure/libbash/blob/901d9b9ce32fa493efa06f6eaa76bbe6ad2e831f/bashast/bashast.g 
only treats '#' as a comment when it is preceded by a space:

COMMENT
    :  (BLANK|EOL) '#' ~('\n'|'\r')* {$channel=HIDDEN;}
    ;

and it has this little gem

//Because the comment token doesn't handle the first comment in a file if it's on the first line, have a parser rule for it
flcomment
    :   BLANK? '#' commentpart*;
commentpart
    :   nqstr|BLANK|LBRACE|RBRACE|SEMIC|DOUBLE_SEMIC|TICK|LPAREN|RPAREN|LLPAREN|RRPAREN|PIPE|COMMA|SQUOTE|QUOTE|LT|GT;

which is probably a workaround due to an ANTLR limitation.

Perhaps the best rule is that a comment has to be at start of input or preceded 
by a whitespace character.
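
To make that rule concrete, here is a minimal Python sketch. It is not the prettify lexer, and `COMMENT_START` and `split_comment` are hypothetical names made up for illustration. It assumes the rule exactly as stated (a '#' opens a comment only at the start of input or right after whitespace) and deliberately ignores quoting, so a '#' after a space inside a string literal would still be treated as a comment.

```python
import re
from typing import Tuple

# Assumed rule from the discussion: '#' starts a comment only at the start
# of the input or immediately after a whitespace character.
COMMENT_START = re.compile(r"(?:^|(?<=\s))#")


def split_comment(line: str) -> Tuple[str, str]:
    """Split a single line into (code, comment).

    The comment part is '' when no comment start is found.
    Quoted strings are not tracked in this sketch.
    """
    match = COMMENT_START.search(line)
    if match is None:
        return line, ""
    return line[:match.start()], line[match.start():]


if __name__ == "__main__":
    for sample in ['echo "${#x[@]}"', "ls -l # list files", "# header"]:
        print(split_comment(sample))
```

With this rule the '#' in "${#x[@]}" is left alone because it follows '{', while `ls -l # list files` still splits at the hash.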

Original comment by mikesamuel@gmail.com on 1 Apr 2012 at 5:57

GoogleCodeExporter commented 9 years ago
{ } are words themselves:

"

              Note that unlike
              the metacharacters ( and ), { and } are reserved words  and  must  occur
              where  a reserved word is permitted to be recognized.

"

All that means is that they play by different rules

Regarding parameter expansion:

       ${parameter}
              The  value  of  parameter  is substituted.  The braces are required when
              parameter is a positional parameter with more than one  digit,  or  when
              parameter  is  followed by a character which is not to be interpreted as
              part of its name.

This suggests that it might be necessary to put corner case logic there for the 
first character in the construct ${...}

But since it is explicitly stated:

        ${#name[subscript]}  expands to the length of ${name[subscript]}.

It may make the most sense to special-case it by hand.

I think your last statement, "a comment has to be at start of input or preceded 
by a whitespace character", is the simplest implementation (technically, 
${ #name} should fail).

Original comment by Intransi...@gmail.com on 2 Apr 2012 at 7:11

GoogleCodeExporter commented 9 years ago
Thanks.  I think that works for the use of # by the C preprocessor.  
http://gcc.gnu.org/onlinedocs/gcc-2.95.3/cpp_1.html#SEC3 suggests that this 
definition is compatible with a default mode that tries to use a single token 
definition to recognize both bash/python style comments and C preprocessor 
directives.

> Preprocessing directives are lines in your program that start with `#'. ... 
> Whitespace is also allowed before and after the `#'.

Combined with 

> A preprocessing directive cannot be more than one line in normal 
> circumstances. It may be split cosmetically with Backslash-Newline, but that 
> has no effect on its meaning. ...

this suggests that a default mode that treats

    (^^|\s+)#(?:[^\n\r\\]|\\(?:[^\r]|\r\n?))*\\?

as a comment or pre-processing directive would work well for a variety of 
languages.
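
As a rough check of that pattern, here is a Python sketch; it is not the code that went into prettify. Two things are assumptions on my part: the leading `^^` is read as a plain start-of-input anchor `^`, and a capture group is added so the leading whitespace is not reported as part of the comment. `HASH_COMMENT` and `find_hash_comments` are hypothetical names.

```python
import re
from typing import List

# Adapted from the pattern above. Assumptions: '^^' is treated as '^'
# (start of input), and group 1 is added to exclude the leading whitespace.
HASH_COMMENT = re.compile(
    r"(?:^|\s+)(#(?:[^\n\r\\]|\\(?:[^\r]|\r\n?))*\\?)"
)


def find_hash_comments(source: str) -> List[str]:
    """Return every '#' comment or preprocessor-style directive in source."""
    return [m.group(1) for m in HASH_COMMENT.finditer(source)]


if __name__ == "__main__":
    bash = 'echo "${#x[@]}" # count'
    c_macro = "#define MAX(a, b) \\\n    ((a) > (b) ? (a) : (b))\nint x;\n"
    print(find_hash_comments(bash))
    print(find_hash_comments(c_macro))
```

The bash sample yields only the trailing `# count`, and the C sample yields the whole `#define`, including the line continued with Backslash-Newline. Like the pattern itself, this does not look inside string literals.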

Original comment by mikesamuel@gmail.com on 3 Apr 2012 at 9:51

GoogleCodeExporter commented 9 years ago
Fixed in revision 223

Original comment by mikesamuel@gmail.com on 6 Jul 2012 at 9:29