ned14 / pcpp

A C99 preprocessor written in pure Python

how to make pcpp ignore # followed by whitespace? #27

Closed Phillip-M-Feldman closed 5 years ago

Phillip-M-Feldman commented 5 years ago

I'd like to modify pcpp to make it ignore # followed by whitespace, e.g., so that pcpp will ignore something like # if or # error. I believe that this will require overriding parsegen, but that function is rather sparsely documented and I'm having difficulty reverse engineering it. I suspect that I need to modify the code immediately following the comment "# Skip over whitespace". Any suggestions re. how to make this change will be appreciated.

Phillip-M-Feldman commented 5 years ago

I implemented a solution in my local copy of the code. The change involves a small number of lines beginning at line 1120 in preprocessor.py. The updated code follows:

        if tok.value == '#':
            try:
                # The following two lines prevent a pound sign immediately followed by
                # white space from being treated as a preprocessor directive.
                # Here i is the index of the '#' token in x, so x[i+1] is the token after it:
                if len(x) >= i+2 and x[i+1].type == 'CPP_WS':
                    raise OutputDirective()

                # If we got here, the pound sign is to be treated as a
                # preprocessor directive.
                output_and_expand_line = False

This change allows the preprocessor to distinguish between Python comments and preprocessor directives, both of which begin with a pound sign. (I've verified that I'm able to run the preprocessor on Python code without getting spurious errors, and that the output is correct). I'd like to suggest that a command-line option be added to pcpp to enable this behavior. This should be a trivial change, and would significantly extend the applicability of pcpp.
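
For readers who want to try the check outside of pcpp, here is a minimal standalone sketch. The `Token` tuple, the regex tokenizer, and the function name `hash_then_whitespace` are hypothetical stand-ins, not pcpp's lexer or API; only the `CPP_WS` type name and the bounds check are taken from the snippet above, read as testing the token right after the `#`.

```python
import re
from collections import namedtuple

# Stand-in token type (assumption); pcpp's real lexer produces richer tokens.
Token = namedtuple("Token", ["type", "value"])

# Deliberately tiny tokenizer: whitespace runs, a lone '#', or anything else.
_TOKEN_RE = re.compile(r"(?P<CPP_WS>[ \t]+)|(?P<CPP_POUND>#)|(?P<OTHER>[^ \t#]+)")


def tokenize(line):
    """Split one line into Token(type, value) tuples."""
    return [Token(m.lastgroup, m.group()) for m in _TOKEN_RE.finditer(line)]


def hash_then_whitespace(tokens):
    """True if the first non-whitespace token is '#' and whitespace follows it."""
    # Skip over leading whitespace to find the first significant token.
    i = 0
    while i < len(tokens) and tokens[i].type == "CPP_WS":
        i += 1
    if i >= len(tokens) or tokens[i].value != "#":
        return False
    # Bounds check plus type test on the token right after the '#'.
    return len(tokens) >= i + 2 and tokens[i + 1].type == "CPP_WS"


for sample in ("# if FOO", "#if FOO", "    # a comment", "x = 1"):
    print(repr(sample), hash_then_whitespace(tokenize(sample)))
```

Lines for which the function returns `True` are the ones the patch would pass through untouched instead of executing as directives.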

ned14 commented 5 years ago

What is wrong with the intercept at https://ned14.github.io/pcpp/#pcpp.Preprocessor.on_directive_handle?

ned14 commented 5 years ago

Regarding the specific case of # followed by whitespace followed by a directive, I appreciate that, strictly speaking, you can't have spaces before C preprocessor directives according to the standard. However, it's a very, very common extension to allow people to indent their preprocessor directives.

What's your use case?

Phillip-M-Feldman commented 5 years ago

My use case is preprocessing of Python code. Because Python uses the pound sign to indicate a comment, while the preprocessor uses the pound sign to indicate a directive, there is an ambiguity that must somehow be resolved. As you noted, in C preprocessor directives, there technically should be no whitespace between the pound sign and the rest of the directive. Although the Python standard does not require whitespace after the pound sign, most Python programmers routinely add it. So, this seems like a reasonable basis for distinguishing Python comments from preprocessor directives. The current behavior should of course remain the default. I'm just proposing the addition of a command-line option that would enable the other behavior. (BTW: The code that I provided has been thoroughly tested).
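
To make the heuristic concrete, the sketch below applies it line by line to a small Python source. The function name `split_comments_and_directives` and the `SAMPLE` text are made up for illustration and are not part of pcpp. Treating a bare `#` as a comment is an extra assumption that the thread does not address.

```python
# Hypothetical sample: Python source mixed with C-style preprocessor directives.
SAMPLE = """\
#define DEBUG 1
# Ordinary Python comment
#ifdef DEBUG
print("debug")  # trailing comment
#endif
#comment without a space
"""


def split_comments_and_directives(source):
    """Classify lines that start with '#' using the whitespace heuristic."""
    comments, directives = [], []
    for line in source.splitlines():
        stripped = line.lstrip()
        if not stripped.startswith("#"):
            continue  # ordinary code, including lines with trailing comments
        rest = stripped[1:]
        # '#' followed by whitespace (or nothing at all) is read as a comment.
        if rest == "" or rest[0].isspace():
            comments.append(line)
        else:
            directives.append(line)
    return comments, directives


comments, directives = split_comments_and_directives(SAMPLE)
print("comments:  ", comments)
print("directives:", directives)
```

The last sample line shows the limit of the heuristic: a comment written without a space after the pound sign is still classified as a directive, and the same would apply to a `#!` shebang line.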

ned14 commented 5 years ago

I am wondering why you don't hook https://ned14.github.io/pcpp/#pcpp.Preprocessor.on_directive_handle and https://ned14.github.io/pcpp/#pcpp.Preprocessor.on_directive_unknown and return None, so that the comment or preprocessor directive is preserved exactly, and executed only if it is recognised.

In other words, isn't this implemented already?

Phillip-M-Feldman commented 5 years ago

Niall: on_directive_handle almost does the job, but isn't called for # error and thus wouldn't be able to intercept that.

Phillip-M-Feldman commented 5 years ago

I still think that adding a command-line flag to enable the snippet of code that I submitted is a reasonable solution, enabling this functionality for those users who need it, with no impact to those users who don't.

ned14 commented 5 years ago

I'll come back to this issue in May. Sorry for the delay.

ned14 commented 5 years ago

Ok, I've improved the handling of directives so you can do what you want. Demo can be found at https://github.com/ned14/pcpp/blob/master/tests/issue0027.py

Phillip-M-Feldman commented 5 years ago

This is fantastic! Thanks!