Open willwray opened 1 year ago

Running pcpp test.h on a reduced reproducer (here under the debugger) gives:

debugpy/launcher 37201 -- -m pcmd test.h
PyInt_FromLong not found.
PyInt_FromLong not found.
PyInt_FromLong not found.
PyInt_FromLong not found.
PyInt_FromLong not found.
PyInt_FromLong not found.
PyInt_FromLong not found.
PyInt_FromLong not found.
test.h:3 error: Could not evaluate expression due to SyntaxError("around token 'x' type CPP_ID") (passed to evaluator: '0x')
PyInt_FromLong not found.

It looks like leading decimal digits are eagerly stripped when parsed for the expression.
That's invalid input, and it did give a fairly good hint as to what's invalid about it.
Oops, I was overzealous in reducing the reproducer to less-than-minimal... Here's a reproducer that actually preprocesses:
#define CAT_(A,B)A##B
#define CAT(A,B)CAT_(A,B)
#define Ox 0x
#if CAT(Ox,0)
#endif
It appears that the pasted result (passed to evaluator: '0x0') is somehow lexed as CPP_INTEGER followed by CPP_ID, where it should remain a single preprocessing token.
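For what it's worth, here's a minimal sketch to observe that, assuming pcpp's Preprocessor.tokenize utility (inherited from ply's cpp module) and its CPP_INTEGER/CPP_ID token types:

from pcpp import Preprocessor

p = Preprocessor()
for tok in p.tokenize("0x"):
    print(tok.type, repr(tok.value))

# If the bug is present this prints two tokens:
#   CPP_INTEGER '0'
#   CPP_ID 'x'
# where a pp-number lexer would keep the single token '0x'.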
FYI, the error was hit using pcpp for codegen with this preprocessing library, https://github.com/willwray/IREPEAT, while processing 'vertical' repetitions; here's one of the many problematic lines: https://github.com/willwray/IREPEAT/blob/master/VREPEATx10.hpp#L11
(it works with gcc, clang, and the new conforming msvc preprocessor)
Also FYI, I'm looking at using pcpp to create an amalgamated header (convenient for use on Compiler Explorer via a single #include <url>). I'm also evaluating whether it can create nicer codegen than the native preprocessors; it seems to emit more empty lines than gcc and clang, but far fewer than msvc.
The 'PyInt_FromLong not found.' spam seems to be coming from the debugger - a red herring.
pcpp lacks a pp-number token (C++ [lex.ppnumber]; the grammar is the same in C11 and C99), so the tokenization wrongly chooses CPP_INTEGER:
>>> import re
>>> ppint = r'(((((0x)|(0X))[0-9a-fA-F]+)|(\d+))([uU][lL]|[lL][uU]|[uU]|[lL])?)'
>>> match = re.search(ppint, "0x")
>>> match.group()
'0'
when it should choose pp-number as the max-munch:
> ppnum = r".?[0-9]([A-Za-z_][\w_]*|[eEpP][-+]|'[a-zA-Z0-9_])*"
> match = re.search(ppnum,"0x")
> match.group()
: '0x'
In phase 3, the input is decomposed into preprocessing tokens; then phase 4 executes # directives, recursing back through phases 1-3 for #included files. Only in phase 7 are preprocessing tokens converted into tokens for translation.
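Here is a hypothetical two-stage sketch of that split (the names and the regex are illustrative, not pcpp's): phase-3 lexing accepts any pp-number without judging validity, and only the later conversion rejects '0x':

import re

# approximately pp-number; the exponent-sign branch is tried first
PPNUM = re.compile(r"\.?[0-9](?:[eEpP][-+]|[0-9A-Za-z_]|\.)*")

def phase3_lex(text):
    # phase 3: '0x' is a perfectly good pp-number here
    return PPNUM.match(text).group()

def phase7_convert(ppnum):
    # phase 7 (or #if evaluation): only now must it be a valid integer
    return int(ppnum, 0)

print(phase3_lex("0x0"))      # '0x0' - one preprocessing token
print(phase7_convert("0x0"))  # 0     - a valid integral constant
print(phase3_lex("0x"))       # '0x'  - still one preprocessing token
# phase7_convert("0x") raises ValueError - the right place to diagnose it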
pcpp only has one set of tokens. (I'm trying to hack in a CPP_NUMBER token, no luck yet.)
Help! Can't work out how to hack it. Do the lextab.py and parsetab.py tables have to be regenerated? If so, how? There's a comment on the in_production variable:

in_production = 1 # Set to 0 if editing pcpp implementation!

Even when it is set to zero my edits are still ignored - PLY introspects the new CPP_NUMBER token, but then it seems to get lost at some point (maybe because the table files are used).
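For what it's worth, here is a minimal standalone sketch of PLY's table caching (the token name and rule are illustrative, not pcpp's). With optimize=1, ply.lex reuses a cached lextab.py, so stale tables can shadow token edits; deleting the generated files or passing optimize=0 forces a rebuild:

import os
import ply.lex as lex

tokens = ('CPP_NUMBER',)                       # hypothetical one-token lexer
t_CPP_NUMBER = r"\.?\d(?:\w|[eEpP][-+]|\.)*"   # pp-number-style rule
t_ignore = ' \t'

def t_error(t):
    t.lexer.skip(1)

# remove stale cached tables so the edited token set is introspected afresh
for cached in ('lextab.py', 'parsetab.py', 'parser.out'):
    if os.path.exists(cached):
        os.remove(cached)

lexer = lex.lex(optimize=0)   # optimize=0: never load a cached lextab
lexer.input('0x')
print([(tok.type, tok.value) for tok in lexer])   # [('CPP_NUMBER', '0x')]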
Related issue #71 also notes the incorrect parse as glued CPP_INTEGER and CPP_ID.
This could be a straightforward fix (still can't work out how to test it).
The current gcc lex.cc only processes CPP_NUMBER. This 2001 bugfix commit to the C preprocessor's c-lex.c, "(c_lex): Remove CPP_INT, CPP_FLOAT cases. Don't use CPP_INT, CPP_FLOAT; CPP_NUMBER is enough", shows that pp-number is sufficient for preprocessor lexing.
Then, for evaluator.py's processing of #if conditionals: only "After all macro expansion and evaluation of ... .", when "the expression is evaluated as an integral constant expression", does a pp-number need to be interpreted as a CPP_INTEGER. The current evaluator should correctly interpret any CPP_INTEGER. In other words, CPP_INTEGER should be needed only for the evaluator (and, in C++, where a CPP_INTEGER ## CPP_ID combo forms a UDL, a user-defined literal).
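A hedged sketch of that division of labour (a hypothetical helper, not pcpp's evaluator.py API): lex everything as pp-number, and convert to an integer only when an #if expression is evaluated, which is exactly where '0x' should be diagnosed:

import re

SUFFIX = re.compile(r"(?:[uU][lL]{0,2}|[lL]{0,2}[uU]?)$")

def eval_ppnumber(ppnum):
    # strip any integer suffix, then require a real integer literal
    body = SUFFIX.sub("", ppnum)
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", body):
        return int(body, 16)
    if re.fullmatch(r"0[0-7]*", body):
        return int(body, 8)
    if re.fullmatch(r"[1-9][0-9]*", body):
        return int(body, 10)
    raise ValueError("not an integral constant: %r" % ppnum)

print(eval_ppnumber("0x0"))   # 0, so '#if CAT(Ox,0)' is simply false
# eval_ppnumber("0x") raises ValueError, matching the diagnostic above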
Possible issues:
pp-number is a broad superset that can also lex text that is not a valid literal.
gcc's cpp_avoid_paste shows where care is needed to "avoid an accidental token paste" when adjacent tokens are written back out.

You may find the ply parser docs at https://www.dabeaz.com/ply/ of use on how it works and generates the precalculated table files.
Related issue in Boost.Wave :wave: a BOOST_PP_CAT(1e, -1) pp-token bug, fixed early in 2006.
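As a quick check (using my extension of the ppnum regex above, with the exponent-sign branch tried first; illustrative only), a pp-number lexer keeps that Boost.Wave case whole too:

import re

PPNUM = re.compile(r"\.?[0-9](?:[eEpP][-+]|'[0-9A-Za-z_]|[0-9A-Za-z_]|\.)*")
for s in ("0x", "0x0", "1e-1", "1'000'000"):
    print(s, "->", PPNUM.match(s).group())
# 0x -> 0x, 0x0 -> 0x0, 1e-1 -> 1e-1, 1'000'000 -> 1'000'000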