Closed Sei-Lisa closed 3 years ago
Here's another problem of using Python expressions for evaluating:
#if -3 < -2 < -1
"bad"
#else
"good"
#endif
In mcpp and gcc, this outputs "good" (because -3 < -2
gives 1, and 1 < -1
is false); in pcpp it outputs "bad" (because Python's operator chaining feature causes that to be interpreted as (-3 < -2) and (-2 < -1)
which is true).
So you're donating a proper expression evaluator which is standards conforming? If so, amazing, thank you. But I'll need to kick reviewing and integrating it down the line a bit as it's a non trivial amount of work to do, and my plate is currently very full.
Is there any issue blocking you from further progress, or are you currently good and can wait a bit?
Thank you! I'm good, it's usable as is so I can wait. I'd just like to understand the rationale behind tests.passthru.test10
, which is the only one in your test suite that this evaluator fails, so I know whether to fix it or not. If it's just a side effect of using eval, and the test needed to be written like that in order to pass, then all is good, and the only necessary change is will be modifying the test.
I forgot. There's a point that is still half-baked in the new evaluator, namely the treatment of Unicode char constants. I guess that the expected thing to do is to treat the input as UTF-8 bytes; that wouldn't be a problem for Python 2 but keeping dual compatibility might be.
Also, I haven't tried to keep compatibility with Python 3, mainly because I can't test. Not sure what version you're targeting, but with 3.5.3 PCPP isn't working for me.
I'd just like to understand the rationale behind tests.passthru.test10, which is the only one in your test suite that this evaluator fails, so I know whether to fix it or not.
We strip the L for the Python hack evaluator. With a real evaluator, it is no longer necessary, nor desirable.
Thanks for the heads up regarding UTF-8 constants and Python 3 compatibility. Indeed some work remains on porting over this donated evaluator.
Two new edits to the evaluator:
https://github.com/Sei-Lisa/LSL-PyOptimizer/commit/a1b5f1bb458a8c9cb46ece02cdd33e1c85dc19f6 increases the black-boxing of the Evaluator class, to return a normal Python integer and not a wrapped integer. This is minor.
https://github.com/Sei-Lisa/LSL-PyOptimizer/commit/4a9cc9e20f61071d7e5358f5468020819e4f3b3b adds support for comment passthrough, which I forgot about.
Both are in current master, so the link to the file in the OP still has the changes.
New bugfix: https://github.com/Sei-Lisa/LSL-PyOptimizer/commit/138d042b2e52f896c5d79a18f20afc188fce937e
With that, the evaluator passes the Wave expressions suite (t_4_*.cpp
) and the expressions-related tests in the mcpp general functionality suite as modified by Wave (t_5_*.cpp
).
Very cool. Looking forward to looking into it soon.
Finally a yacc based expression evaluator is implemented. Only took me nearly two years to get it done.
Thanks a lot for the latest fixes; the project is now suitable to replace the external preprocessor in my optimizer. The only problem for my usage was the use of
eval
for evaluating the expressions, which I have solved by overriding the evaluator and creating my own. This is necessary because I provide an online version of my LSL optimizer, and even though PCPP doesn't allow arbitrary expressions to get through, it is still easy to DoS the server by filling the memory of the process or making it waste CPU (try e.g.#if ((1,1)*40000)*40000
or#if 4**4**4**4
).Before I wrote it, I first considered using the AST module as a substitute of eval() but that seems to also have issues (per the comment by Antti Haapala). Worth a look, in any case, about the dangers of
eval
andast
.The implementation passes all tests in your suite except for one,
tests.passthru.test10
. The cause is obvious, but the rationale for expecting that result is unclear to me, so I haven't attempted to fix it yet. Should the suffix of numbers always be removed in passthrough mode? If so, why?I've had to work around another problem, which is the incomplete tokenisation performed by PCPP of multi-symbol sequences, i.e. those grouped under the punctuator nonterminal in the standard, including digraphs. I would have reported it as an issue, as it is affecting other areas (such as token pasting, as mentioned in #19), but not being sure whether you desire to fix that, I hesitated. If you're willing to fix it, the evaluator code can be simpler. One major consequence of the incomplete tokenisation, compatibility-wise, is that the digraphs
%:%:
and%:
(equivalent to##
and#
respectively) are currently unsupported. I don't think my users will miss them, though, as the target language is not even C or C++.In the README you mention that donations of a yacc-based evaluator are welcome. Well, mine is not yacc-based, but recursive-descendent; still, in case you want to incorporate it to PCPP, feel free to do so. The file is here: https://github.com/Sei-Lisa/LSL-PyOptimizer/blob/master/cpreproc.py; the relevant (non-)license starts in line 55. The fragments that are under the GPL are the ones specific to the optimizer: all methods in the
Preproc
class exceptevalexpr
.