wgtdkp / wgtcc

A small C11 compiler
MIT License

Token spacing in preprocessor #24

Open huangguiyang opened 7 years ago

huangguiyang commented 7 years ago

Macro expansion is a tricky operation, fraught with nasty corner cases. I've tried several compilers (gcc, clang, lcc, tcc, 9cc, wgtcc, 8cc) on the code snippet below. Unfortunately, only gcc, clang and lcc got it right.

#define PLUS +
#define EMPTY
#define f(x) =x=
+PLUS -EMPTY- PLUS+ f(=)

The right output is

 + + - - + + = = =

not

++ -- ++ ===

wgtdkp commented 7 years ago

I hate space, stringize, glue and backslash!

wgtdkp commented 7 years ago

I suspect both are correct; it is just that the preprocessor's dump function can't generate nicely readable code.

huangguiyang commented 7 years ago

The preprocessor must handle macro expansion carefully in order to get the right column numbers etc. It's tricky. That explains why early C compilers didn't include column numbers in diagnostic messages.

huangguiyang commented 7 years ago

I think they are not identical token streams. To the lexer, + + means two separate + tokens, while ++ means the increment operator.
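To illustrate the difference, here is a minimal sketch in C. The function names are made up for this example. When the source is compiled directly (rather than re-lexed from dumped -E text), macro expansion yields separate tokens, so `+PLUS x` should behave as two unary pluses and `-EMPTY- x` as two unary minuses, and neither should modify `x`. If the tokens were glued into `++x` or `--x`, the results would be off by one.

```c
#include <assert.h>

#define PLUS +
#define EMPTY

/* Expands to "+ + x": two unary plus operators, x is returned unchanged.
   A glued "++x" would return x + 1 instead. */
int plus_twice(int x)
{
    return +PLUS x;
}

/* Expands to "- - x": double negation, x is returned unchanged.
   A glued "--x" would return x - 1 instead. */
int minus_twice(int x)
{
    return -EMPTY- x;
}

/* Hypothetical self-check: aborts if the tokens were pasted together. */
void check_token_spacing(void)
{
    assert(plus_twice(5) == 5);
    assert(minus_twice(5) == 5);
}
```

Calling `check_token_spacing()` from `main` is enough to run the check. It only exercises the in-memory token stream; it says nothing about how the -E dump is spelled.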

wgtdkp commented 7 years ago

The right way to check whether the compiler handles columns (or spaces) correctly is to try the snippet below:

#define PLUS +
#define EMPTY
#define f(x) =x=
#define STRINGIZE(x) #x
#define TEST(x) STRINGIZE(x)
const char* str = TEST(+PLUS -EMPTY- PLUS+ f(=));

str should be initialized with the string literal "++ -- ++ ===", which is what both wgtcc and gcc generate.

huangguiyang commented 7 years ago

It's just one of numerous test cases.

wgtdkp commented 7 years ago

The -E option is just for dumping the preprocessed code for the programmer. wgtcc's dump function is so simple that compiling the dumped code may not give the same result as compiling the original .c file. But that can be fixed by simply inserting a space between every two tokens.
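A rough sketch of that fix, in C. This is not wgtcc's actual dump code: `dump_tokens` and its signature are assumptions made for illustration, and tokens are modeled as plain strings. It writes every token with a single space in between, so the dumped text can never be re-lexed into a different token stream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: joins n token spellings into out, separated by
   one space each. Returns the length written, or (size_t)-1 if the
   buffer (cap bytes, including the terminating NUL) is too small. */
size_t dump_tokens(const char *const *toks, size_t n, char *out, size_t cap)
{
    size_t len = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        size_t tok_len = strlen(toks[i]);
        size_t need = tok_len + (i > 0 ? 1 : 0); /* separator + token */

        if (len + need + 1 > cap)
            return (size_t)-1;

        if (i > 0)
            out[len++] = ' '; /* always separate adjacent tokens */
        memcpy(out + len, toks[i], tok_len);
        len += tok_len;
        out[len] = '\0';
    }
    return len;
}
```

Fed the nine tokens from the original snippet, this produces `+ + - - + + = = =`. The trade-off is that the dump loses the original spacing everywhere; emitting a space only where two adjacent tokens would otherwise be lexed differently gives prettier output but needs more logic.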

huangguiyang commented 7 years ago

Actually, the output of the -E option may be consumed by other compilers. That's why the output must contain line number information. In the early days, the preprocessor ran as a separate pass. Of course, if you don't want your preprocessor to be a standalone one, that's fine.

wgtdkp commented 7 years ago

So it is the dump function that should be fixed. I'd rather go die.

huangguiyang commented 7 years ago

The key: where is the f**king document that describes these corner cases completely? :-(

wgtdkp commented 7 years ago

Lets suicide together :)