ned14 / pcpp

A C99 preprocessor written in pure Python
Other
215 stars 39 forks source link

Replace eval by a more secure alternative #20

Closed Sei-Lisa closed 3 years ago

Sei-Lisa commented 5 years ago

Thanks a lot for the latest fixes; the project is now suitable to replace the external preprocessor in my optimizer. The only problem for my usage was the use of eval for evaluating the expressions, which I have solved by overriding the evaluator and creating my own. This is necessary because I provide an online version of my LSL optimizer, and even though PCPP doesn't allow arbitrary expressions to get through, it is still easy to DoS the server by filling the memory of the process or making it waste CPU (try e.g. #if ((1,1)*40000)*40000 or #if 4**4**4**4).

Before I wrote it, I first considered using the AST module as a substitute of eval() but that seems to also have issues (per the comment by Antti Haapala). Worth a look, in any case, about the dangers of eval and ast.

The implementation passes all tests in your suite except for one, tests.passthru.test10. The cause is obvious, but the rationale for expecting that result is unclear to me, so I haven't attempted to fix it yet. Should the suffix of numbers always be removed in passthrough mode? If so, why?

I've had to work around another problem, which is the incomplete tokenisation performed by PCPP of multi-symbol sequences, i.e. those grouped under the punctuator nonterminal in the standard, including digraphs. I would have reported it as an issue, as it is affecting other areas (such as token pasting, as mentioned in #19), but not being sure whether you desire to fix that, I hesitated. If you're willing to fix it, the evaluator code can be simpler. One major consequence of the incomplete tokenisation, compatibility-wise, is that the digraphs %:%: and %: (equivalent to ## and # respectively) are currently unsupported. I don't think my users will miss them, though, as the target language is not even C or C++.

In the README you mention that donations of a yacc-based evaluator are welcome. Well, mine is not yacc-based, but recursive-descendent; still, in case you want to incorporate it to PCPP, feel free to do so. The file is here: https://github.com/Sei-Lisa/LSL-PyOptimizer/blob/master/cpreproc.py; the relevant (non-)license starts in line 55. The fragments that are under the GPL are the ones specific to the optimizer: all methods in the Preproc class except evalexpr.

Sei-Lisa commented 5 years ago

Here's another problem of using Python expressions for evaluating:

#if -3 < -2 < -1
  "bad"
#else
  "good"
#endif

In mcpp and gcc, this outputs "good" (because -3 < -2 gives 1, and 1 < -1 is false); in pcpp it outputs "bad" (because Python's operator chaining feature causes that to be interpreted as (-3 < -2) and (-2 < -1) which is true).

ned14 commented 5 years ago

So you're donating a proper expression evaluator which is standards conforming? If so, amazing, thank you. But I'll need to kick reviewing and integrating it down the line a bit as it's a non trivial amount of work to do, and my plate is currently very full.

Is there any issue blocking you from further progress, or are you currently good and can wait a bit?

Sei-Lisa commented 5 years ago

Thank you! I'm good, it's usable as is so I can wait. I'd just like to understand the rationale behind tests.passthru.test10, which is the only one in your test suite that this evaluator fails, so I know whether to fix it or not. If it's just a side effect of using eval, and the test needed to be written like that in order to pass, then all is good, and the only necessary change is will be modifying the test.

Sei-Lisa commented 5 years ago

I forgot. There's a point that is still half-baked in the new evaluator, namely the treatment of Unicode char constants. I guess that the expected thing to do is to treat the input as UTF-8 bytes; that wouldn't be a problem for Python 2 but keeping dual compatibility might be.

Also, I haven't tried to keep compatibility with Python 3, mainly because I can't test. Not sure what version you're targeting, but with 3.5.3 PCPP isn't working for me.

ned14 commented 5 years ago

I'd just like to understand the rationale behind tests.passthru.test10, which is the only one in your test suite that this evaluator fails, so I know whether to fix it or not.

We strip the L for the Python hack evaluator. With a real evaluator, it is no longer necessary, nor desirable.

Thanks for the heads up regarding UTF-8 constants and Python 3 compatibility. Indeed some work remains on porting over this donated evaluator.

Sei-Lisa commented 5 years ago

Two new edits to the evaluator:

https://github.com/Sei-Lisa/LSL-PyOptimizer/commit/a1b5f1bb458a8c9cb46ece02cdd33e1c85dc19f6 increases the black-boxing of the Evaluator class, to return a normal Python integer and not a wrapped integer. This is minor.

https://github.com/Sei-Lisa/LSL-PyOptimizer/commit/4a9cc9e20f61071d7e5358f5468020819e4f3b3b adds support for comment passthrough, which I forgot about.

Both are in current master, so the link to the file in the OP still has the changes.

Sei-Lisa commented 5 years ago

New bugfix: https://github.com/Sei-Lisa/LSL-PyOptimizer/commit/138d042b2e52f896c5d79a18f20afc188fce937e

With that, the evaluator passes the Wave expressions suite (t_4_*.cpp) and the expressions-related tests in the mcpp general functionality suite as modified by Wave (t_5_*.cpp).

ned14 commented 5 years ago

Very cool. Looking forward to looking into it soon.

ned14 commented 3 years ago

Finally a yacc based expression evaluator is implemented. Only took me nearly two years to get it done.