shevek / jcpp

The C Preprocessor as a Java library
http://www.anarres.org/projects/jcpp/
Apache License 2.0

Bad Token's line and column when code line is broken with backslash #14

Open grzegorz8 opened 10 years ago

grzegorz8 commented 10 years ago

When we define a multi-line macro, such as:

6: #define THREE 1 \
7:    + \
8:    2

One would expect token.getLine() for the Token representing the number "2" to return 8, but the entire #define is treated as a one-line preprocessor directive, so the result is 6.

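The token list below is for a function-like macro A(a, b) rather than THREE, but it shows the same effect: every token, including the final NL, is reported on line 6: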

[HASH@6,0]:"#"
[IDENTIFIER@6,1]:"define"
[IDENTIFIER@6,8]:"A"
[(@6,9]:"("
[IDENTIFIER@6,10]:"a"
[,@6,11]:","
[IDENTIFIER@6,13]:"b"
[)@6,14]:")"
[IDENTIFIER@6,16]:"a"
[WHITESPACE@6,17]:"    "
[+@6,21]:"+"
[WHITESPACE@6,22]:"    "
[IDENTIFIER@6,26]:"b"
[NL@6,27]:"\n"

I'm aware that the token list in Macro is not public, but the line and column numbers should still be correct.

shevek commented 10 years ago

Mm, this is presumably due to a weirdness in the cpp spec, where backslash-newline is elided and reinserted after the line. We use JoinReader to elide the backslash-newline sequences. To fix this, we will likely have to merge JoinReader into LexerSource.

Hrrrrnnnng. OK, I accept this as a good bug, but I'll have to think about how to fix it!
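
For illustration only, here is a minimal sketch of what tracking positions while eliding backslash-newline could look like. It is not jcpp code: SpliceMap and Located are hypothetical names, it assumes "\n" line endings only, and it assumes 0-based columns as in the token dump above. The point is that line and column are counted on the physical input while the backslash-newline pairs are being dropped, rather than on the already-joined text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; SpliceMap and Located are illustrative names, not jcpp classes.
final class SpliceMap {
    /** One character of the joined text plus the physical position it came from. */
    static final class Located {
        final char ch;
        final int line;    // physical line, counted from firstLine
        final int column;  // 0-based physical column

        Located(char ch, int line, int column) {
            this.ch = ch;
            this.line = line;
            this.column = column;
        }
    }

    /**
     * Drops every backslash-newline pair from text and records, for each
     * surviving character, where it sat in the original input.
     */
    static List<Located> splice(String text, int firstLine) {
        List<Located> out = new ArrayList<>();
        int line = firstLine;
        int column = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                i++;          // skip the newline together with the backslash
                line++;       // but still count the physical line break
                column = 0;
                continue;
            }
            out.add(new Located(c, line, column));
            if (c == '\n') {
                line++;
                column = 0;
            } else {
                column++;
            }
        }
        return out;
    }

    /** Rebuilds the joined text that a lexer would see. */
    static String joined(List<Located> chars) {
        StringBuilder sb = new StringBuilder();
        for (Located l : chars) {
            sb.append(l.ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = "#define THREE 1 \\\n   + \\\n   2\n";
        for (Located l : splice(src, 6)) {
            if (l.ch == '2') {
                System.out.println("'2' at " + l.line + "," + l.column); // prints: '2' at 8,3
            }
        }
    }
}
```

The lexer still sees a single logical line, but a token built from these characters can take its position from the first character it consumes.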

grzegorz8 commented 10 years ago

What's more, if a string literal is broken by backslashes into a multi-line token, the locations of the tokens that follow it are wrong. Example:

4: char *string = "a \
5:     b \
6:     c";

The expected location of the semicolon is (6, 8), but the actual location is (4, 31).

shevek commented 10 years ago

You're quite right. I need to merge JoinReader into LexerSource, but how one knows whether to unget a \ is a little beyond me at this time in the morning. Suggestions taken, else I'll get there soon enough. :-) I appreciate the test cases.
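
One possible way around the unget question, sketched here under assumptions and not taken from jcpp: snapshot the reader state before every read and have unread restore that snapshot. The reader then never has to decide whether to push a backslash back, because the elided backslash-newline is simply skipped again on the next read. SplicingReader and its methods are hypothetical names; the sketch supports one level of pushback, "\n" line endings only, and 0-based columns.

```java
// Hypothetical sketch; SplicingReader is an illustrative name, not a jcpp class.
final class SplicingReader {
    private final String text;
    private int index = 0;   // next raw character to examine
    private int line;        // physical line of that raw character
    private int column = 0;  // 0-based physical column of that raw character

    // State as it was before the most recent read(); unread() restores it.
    private int prevIndex;
    private int prevLine;
    private int prevColumn;

    // Physical position of the character most recently returned by read().
    private int charLine;
    private int charColumn;

    SplicingReader(String text, int firstLine) {
        this.text = text;
        this.line = firstLine;
        this.prevLine = firstLine;
        this.charLine = firstLine;
    }

    /** Returns the next logical character, or -1 at end of input. */
    int read() {
        prevIndex = index;
        prevLine = line;
        prevColumn = column;
        // Skip any run of backslash-newline pairs, counting the physical lines.
        while (index + 1 < text.length()
                && text.charAt(index) == '\\'
                && text.charAt(index + 1) == '\n') {
            index += 2;
            line++;
            column = 0;
        }
        if (index >= text.length()) {
            return -1;
        }
        char c = text.charAt(index++);
        charLine = line;
        charColumn = column;
        if (c == '\n') {
            line++;
            column = 0;
        } else {
            column++;
        }
        return c;
    }

    /** Undoes the most recent read(), including any splice it skipped. */
    void unread() {
        index = prevIndex;
        line = prevLine;
        column = prevColumn;
    }

    int getLine() { return charLine; }

    int getColumn() { return charColumn; }

    public static void main(String[] args) {
        SplicingReader r = new SplicingReader("ab\\\ncd", 1);
        r.read();                 // 'a'
        r.read();                 // 'b'
        int c = r.read();         // 'c', reached by skipping the backslash-newline
        r.unread();               // rewinds to just before the backslash
        int again = r.read();     // 'c' once more, with the same position
        System.out.println((char) c + " " + (char) again
                + " at " + r.getLine() + "," + r.getColumn()); // prints: c c at 2,0
    }
}
```

The trade-off in this sketch is that a pushed-back character is re-scanned instead of being stored, which keeps the position bookkeeping in one place at the cost of limiting pushback to a single character.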