A C99 preprocessor written in pure Python

.. role:: c(code)
   :language: c

.. |travis| image:: https://github.com/ned14/pcpp/workflows/CI/badge.svg?branch=master
   :align: middle
   :target: https://github.com/ned14/pcpp/actions

(C) 2018-2021 Niall Douglas http://www.nedproductions.biz/ and (C) 2007-2020 David Beazley http://www.dabeaz.com/

PyPI: https://pypi.python.org/pypi/pcpp GitHub: https://github.com/ned14/pcpp API reference docs: https://ned14.github.io/pcpp/

Master branch CI, all tests passing for Python v2, v3 and PyPy v2, v3: |travis|

A pure universal Python C (pre-)preprocessor implementation, very useful for pre-preprocessing header-only C++ libraries into single-file includes and other such build or packaging stage malarky. The implementation can be used as a Python module (see the `API reference <https://ned14.github.io/pcpp/>`_) or as a command line tool ``pcpp``, which can stand in for a conventional C preprocessor (i.e. it'll accept similar arguments). Works great under PyPy, where you can expect a speedup factor of between 0.84x (i.e. slightly slower) and 2.62x (average = 2.2x, median = 2.31x).

Your includes can be benchmarked for heft in order to improve your build times! See the ``--time`` and ``--filetimes`` options, and you can see graphs from ``pcpp`` for the C++ STLs at https://github.com/ned14/stl-header-heft.

A unique facility of this C preprocessor is partial preprocessing, so you can programmatically control how much preprocessing is done by ``pcpp`` and how much is done by the C or C++ compiler's preprocessor. The ultimate control is by subclassing the ``Preprocessor`` class in Python, from which you can do anything you like; however, for your convenience, the ``pcpp`` command line tool comes with the following canned partial preprocessing algorithms:

passthru-defines
    Pass through but still execute ``#define`` and ``#undef`` directives if not always removed by preprocessor logic. This ensures that including the output sets exactly the same macros as if you included the original, plus include guards work.

passthru-unfound-includes
    If an :c:`#include` is not found, pass it through unmodified. This is very useful for passing through includes of system headers.

passthru-unknown-exprs
    This is one of the most powerful pass through algorithms. If an expression passed to :c:`#if` (or its brethren) contains an unknown macro, expand the expression with known macros and pass it through unexecuted, and then pass through the remaining block. Each :c:`#elif` is evaluated in turn, and if it does not contain unknown macros, it is executed immediately. Finally, any :c:`#else` clause is always passed through unexecuted. Note that include guards normally defeat this algorithm, so those are specially detected and ignored.

passthru-comments
    A major use case for ``pcpp`` is as a preprocessor for the `doxygen <http://www.stack.nl/~dimitri/doxygen/>`_ reference documentation tool, whose preprocessor is unable to handle preprocessing of any complexity. ``pcpp`` can partially execute the preprocessing which doxygen is incapable of, thus generating output which produces good results with doxygen. Hence the ability to pass through comments containing doxygen markup is very useful.

passthru-magic-macros
    Don't expand ``__DATE__``, ``__TIME__``, ``__FILE__``, ``__LINE__`` nor ``__COUNTER__``.

passthru-includes
    Don't expand into the output those ``#include`` directives whose arguments match the supplied regular expression; however, still execute those includes. This lets you generate output with macros from nested includes expanded, while the ``#include`` directives matching the regular expression are passed through into the output.
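
For context on ``passthru-magic-macros``: in a conventional C99 preprocessor these macros expand to values fixed at the moment and place of preprocessing, so if ``pcpp`` expanded them, its output would record when and where ``pcpp`` ran rather than where the final compiler runs. The sketch below is compiled by a plain C compiler, not by ``pcpp``, and only checks the standard forms of those values. ``__COUNTER__`` is left out because it is a compiler extension rather than C99, and ``check_magic_macros()`` is a hypothetical helper name.

.. code-block:: c

    #include <assert.h>
    #include <string.h>

    /* Hypothetical helper: checks the standard shape of each magic macro.
       The actual values are frozen when this file is preprocessed. */
    static void check_magic_macros(void) {
        /* __DATE__ has the form "Mmm dd yyyy": 11 characters. */
        assert(strlen(__DATE__) == 11);
        /* __TIME__ has the form "hh:mm:ss": 8 characters. */
        assert(strlen(__TIME__) == 8);
        assert(__TIME__[2] == ':' && __TIME__[5] == ':');
        /* __LINE__ is the current line number, __FILE__ the current file. */
        assert(__LINE__ > 0);
        assert(strlen(__FILE__) > 0);
    }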

Standards (non-)compliance

``pcpp`` passes a very slightly modified edition of the `mcpp <http://mcpp.sourceforge.net/>`_ unit test suite. The only modifications were to disable the digraph and trigraph tests. It also passes the list of "preprocessor torture" expansion fragments in the C11 standard, correctly expanding some very complex recursive macro expansions where expansions cause new macro expansions to be formed. In this, it handily beats the MSVC preprocessor and ought to handle most C99 preprocessor metaprogramming. If you compare its output side-by-side to that of GCC or clang's preprocessor, results are extremely close indeed, with blank line collapsing being the only difference.
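
As an illustration of the kind of fragment meant, here is a subset of the redefinition and rescanning example (EXAMPLE 3 in section 6.10.3.5) of the C11 standard, checked against whatever C compiler builds it; this is not ``pcpp``'s own test harness. The expected results in the comments are the ones the standard gives. The comparison ignores whitespace because compilers may space a stringized expansion differently, and ``STR``, ``STR_``, ``strip_space()`` and ``check_c11_example()`` are hypothetical helper names.

.. code-block:: c

    #include <assert.h>
    #include <ctype.h>
    #include <string.h>

    /* Subset of C11 6.10.3.5 EXAMPLE 3: x is redefined before use, and
       f and z each mention their own name in their replacement lists,
       which must not be re-expanded during rescanning. */
    #define x 3
    #define f(a) f(x * (a))
    #undef x
    #define x 2
    #define z z[0]

    /* Hypothetical two-level stringize: STR's argument is fully
       macro-expanded before STR_ applies the # operator to the result. */
    #define STR_(s) #s
    #define STR(s) STR_(s)

    static const char expansion1[] = STR(f(y+1));
    static const char expansion2[] = STR(f(f(z)));

    /* Copies `in` to `out` with all whitespace removed. */
    static void strip_space(const char *in, char *out) {
        assert(in != NULL && out != NULL);
        for (; *in != '\0'; ++in)
            if (!isspace((unsigned char)*in))
                *out++ = *in;
        *out = '\0';
    }

    static void check_c11_example(void) {
        char buf[64];
        strip_space(expansion1, buf);
        /* The standard's result: f(2 * (y+1)) */
        assert(strcmp(buf, "f(2*(y+1))") == 0);
        strip_space(expansion2, buf);
        /* The standard's result: f(2 * (f(2 * (z[0])))) */
        assert(strcmp(buf, "f(2*(f(2*(z[0]))))") == 0);
    }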

As of v1.30 (Oct 2021), a proper yacc-based expression evaluator for :c:`#if` expressions is used, which is standards-conforming and fixes a large number of problems found in the previous expression evaluator based on Python's ``eval()``.
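
Two examples of where C's rules for :c:`#if` arithmetic differ from Python's, which a conforming evaluator has to get right: unsigned arithmetic wraps around, and integer division truncates toward zero. This minimal sketch is checked by a normal C compiler; it illustrates the C99 rules rather than ``pcpp``'s internals, and the ``PP_*`` macro names and ``check_if_arithmetic()`` are hypothetical.

.. code-block:: c

    #include <assert.h>

    /* Unsigned arithmetic in #if is done in uintmax_t and wraps, so
       0u - 1 is UINTMAX_MAX, which is greater than 0. Plain unbounded
       integer arithmetic (as in Python) would give -1 > 0, i.e. false. */
    #if 0u - 1 > 0
    #define PP_UNSIGNED_WRAPS 1
    #else
    #define PP_UNSIGNED_WRAPS 0
    #endif

    /* C99 integer division truncates toward zero, so -7 / 2 == -3.
       In Python, -7 // 2 is -4 and -7 / 2 is -3.5. */
    #if -7 / 2 == -3
    #define PP_DIVISION_TRUNCATES 1
    #else
    #define PP_DIVISION_TRUNCATES 0
    #endif

    static void check_if_arithmetic(void) {
        assert(PP_UNSIGNED_WRAPS == 1);
        assert(PP_DIVISION_TRUNCATES == 1);
    }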

A full, detailed list of known non-conformance with the C99 standard is below. Pull requests with bug fixes and new unit tests for the fix are welcome.

If you are on Python 2, files are parsed as strings, and Unicode is not supported. On Python 3, input and output files can have your choice of encoding, and you can hook file opening to inspect the encoding using chardet.

Note that most of this preprocessor was originally written by David Beazley to show off his excellent Python Lex-Yacc library PLY (http://www.dabeaz.com/ply/), and it is hidden in there without being at all obvious, judging by the number of Stack Overflow questions asking for a pure Python C preprocessor implementation. This implementation fixes a lot of conformance bugs (the original was never intended to rigidly adhere to the C standard) and adds a test suite based on the C11 preprocessor torture samples plus the mcpp preprocessor test suite. Still, this project would not be possible without David's work, so please take off your hat and give a bow towards him.

Command line tool pcpp:

The help from the command line tool pcpp::

    usage: pcpp [-h] [-o [path]] [-D macro[=val]] [-U macro] [-N macro] [-I path]
                [--passthru-defines] [--passthru-unfound-includes]
                [--passthru-unknown-exprs] [--passthru-comments]
                [--passthru-magic-macros] [--passthru-includes <regex>]
                [--disable-auto-pragma-once] [--line-directive [form]] [--debug]
                [--time] [--filetimes [path]] [--compress]
                [--assume-input-encoding <encoding>]
                [--output-encoding <encoding>] [--write-bom] [--version]
                [input [input ...]]

    A pure universal Python C (pre-)preprocessor implementation very useful for
    pre-preprocessing header only C++ libraries into single file includes and
    other such build or packaging stage malarky.

    positional arguments:
      input                 Files to preprocess (use '-' for stdin)

    optional arguments:
      -h, --help            show this help message and exit
      -o [path]             Output to a file instead of stdout
      -D macro[=val]        Predefine name as a macro [with value]
      -U macro              Pre-undefine name as a macro
      -N macro              Never define name as a macro, even if defined during
                            the preprocessing.
      -I path               Path to search for unfound #include's
      --passthru-defines    Pass through but still execute #defines and #undefs if
                            not always removed by preprocessor logic
      --passthru-unfound-includes
                            Pass through #includes not found without execution
      --passthru-unknown-exprs
                            Unknown macros in expressions cause preprocessor logic
                            to be passed through instead of executed by treating
                            unknown macros as 0L
      --passthru-comments   Pass through comments unmodified
      --passthru-magic-macros
                            Pass through double underscore magic macros unmodified
      --passthru-includes <regex>
                            Regular expression for which #includes to not expand.
                            #includes, if found, are always executed
      --disable-auto-pragma-once
                            Disable the heuristics which auto apply #pragma once
                            to #include files wholly wrapped in an obvious include
                            guard macro
      --line-directive [form]
                            Form of line directive to use, defaults to #line,
                            specify nothing to disable output of line directives
      --debug               Generate a pcpp_debug.log file logging execution
      --time                Print the time it took to #include each file
      --filetimes [path]    Write CSV file with time spent inside each included
                            file, inclusive and exclusive
      --compress            Make output as small as possible
      --assume-input-encoding <encoding>
                            The text encoding to assume inputs are in
      --output-encoding <encoding>
                            The text encoding to use when writing files
      --write-bom           Prefix any output with a Unicode BOM
      --version             show program's version number and exit

    Note that so pcpp can stand in for other preprocessor tooling, it ignores any
    arguments it does not understand.
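
``--disable-auto-pragma-once`` refers to the classic include guard idiom, where a whole header is wrapped in ``#ifndef``/``#define``/``#endif`` of a single guard macro. A file wrapped this way behaves like ``#pragma once`` as long as nothing later undefines the guard macro, which is presumably what makes the heuristic safe to apply. The sketch below simulates including the same guarded header twice by pasting its body twice; ``EXAMPLE_H``, ``example_answer()`` and ``check_guard()`` are hypothetical names.

.. code-block:: c

    #include <assert.h>

    /* First "inclusion" of a hypothetical header: EXAMPLE_H is not yet
       defined, so the body is compiled and the guard gets defined. */
    #ifndef EXAMPLE_H
    #define EXAMPLE_H
    static int example_answer(void) { return 42; }
    #endif

    /* Second "inclusion": EXAMPLE_H is now defined, so the body is skipped
       and example_answer() is not redefined (which would not compile). */
    #ifndef EXAMPLE_H
    #define EXAMPLE_H
    static int example_answer(void) { return 42; }
    #endif

    static void check_guard(void) {
        assert(example_answer() == 42);
    }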

Quick demo of pass through mode

Let us look at an example for pass through mode. Here is the original:

.. code-block:: c

    #if !defined(__cpp_constexpr)
    #if __cplusplus >= 201402L
    #define __cpp_constexpr 201304  // relaxed constexpr
    #else
    #define __cpp_constexpr 190000
    #endif
    #endif
    #ifndef BOOSTLITE_CONSTEXPR
    #if __cpp_constexpr >= 201304
    #define BOOSTLITE_CONSTEXPR constexpr
    #endif
    #endif
    #ifndef BOOSTLITE_CONSTEXPR
    #define BOOSTLITE_CONSTEXPR
    #endif

``pcpp test.h --passthru-defines --passthru-unknown-exprs`` will output:

.. code-block:: c

    #if !defined(__cpp_constexpr)
    #if __cplusplus >= 201402
    #define __cpp_constexpr 201304
    #else
    #define __cpp_constexpr 190000
    #endif
    #endif
    #ifndef BOOSTLITE_CONSTEXPR
    #if __cpp_constexpr >= 201304
    #define BOOSTLITE_CONSTEXPR constexpr
    #endif
    #endif
    #ifndef BOOSTLITE_CONSTEXPR
    #define BOOSTLITE_CONSTEXPR
    #endif

This is because ``__cpp_constexpr`` was not defined, so because of the ``--passthru-unknown-exprs`` flag
we pass through everything inside that if block **unexecuted**, i.e. defines and undefs are NOT executed by
``pcpp``. Let's define ``__cpp_constexpr``:

``pcpp test.h --passthru-defines --passthru-unknown-exprs -D __cpp_constexpr``

.. code-block:: c

    #line 8 "test.h"
    #ifndef BOOSTLITE_CONSTEXPR

    #endif
    #ifndef BOOSTLITE_CONSTEXPR
    #define BOOSTLITE_CONSTEXPR
    #endif

So, big difference now. We execute the entire first if block as ``__cpp_constexpr`` is now defined, thus
leaving whitespace. Let's try setting ``__cpp_constexpr`` a bit higher:

``pcpp test.h --passthru-defines --passthru-unknown-exprs -D __cpp_constexpr=201304``

.. code-block:: c

    #line 8 "test.h"
    #ifndef BOOSTLITE_CONSTEXPR

    #define BOOSTLITE_CONSTEXPR constexpr

    #endif

As you can see, the lines related to the known ``__cpp_constexpr`` are executed and removed, passing through
any if blocks with unknown macros in the expression.
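
The outputs above start with ``#line 8 "test.h"`` (the form is set by ``--line-directive``). Such a directive tells the compiler that the next source line is line 8 of ``test.h``, so ``__LINE__``, ``__FILE__`` and diagnostics keep pointing at the original header even though its first seven lines were executed away. A minimal C sketch of that effect, compiled by a normal C compiler; ``line_here()``, ``file_here()`` and ``check_line_mapping()`` are hypothetical names.

.. code-block:: c

    #include <assert.h>
    #include <string.h>

    /* After the directive, the next line counts as line 8 of "test.h". */
    #line 8 "test.h"
    static int line_here(void) { return __LINE__; }
    static const char *file_here(void) { return __FILE__; }

    /* Self-check of the mapping established by the #line directive. */
    static void check_line_mapping(void) {
        assert(line_here() == 8);
        assert(strcmp(file_here(), "test.h") == 0);
    }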

What if you want a macro to be known but undefined? The ``-U`` (undefine) flag has an obvious meaning in pass
through mode: it makes a macro no longer unknown, but known to be undefined.

``pcpp test.h --passthru-defines --passthru-unknown-exprs -U __cpp_constexpr``

.. code-block:: c

    #if __cplusplus >= 201402
    #define __cpp_constexpr 201304
    #else
    #define __cpp_constexpr 190000
    #endif

    #ifndef BOOSTLITE_CONSTEXPR

    #endif
    #ifndef BOOSTLITE_CONSTEXPR
    #define BOOSTLITE_CONSTEXPR
    #endif

Here ``__cpp_constexpr`` is known to be undefined, so the first clause executes, but ``__cplusplus`` is
unknown, so that entire block is passed through unexecuted. In the next test, comparing ``__cpp_constexpr``
to 201304, it is still known to be undefined, so the expression tested is 0 >= 201304, which is false;
hence the following stanza is removed entirely.
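
A conventional compiler always behaves like this ``-U`` case: a conforming preprocessor replaces any identifier still left in an :c:`#if` expression after macro expansion with 0. The sketch below assumes it is compiled as C, where ``__cpp_constexpr`` is not predefined (C++ compilers may predefine it), and the ``CONSTEXPR_BRANCH_TAKEN``, ``CPP_CONSTEXPR_UNDEFINED`` and ``check_undefined_is_zero()`` names are hypothetical.

.. code-block:: c

    #include <assert.h>

    /* Assumes C: __cpp_constexpr is undefined, so it is replaced by 0 and
       this tests 0 >= 201304, which is false. */
    #if __cpp_constexpr >= 201304
    #define CONSTEXPR_BRANCH_TAKEN 1
    #else
    #define CONSTEXPR_BRANCH_TAKEN 0
    #endif

    /* defined() is how to tell "undefined" apart from "defined as 0". */
    #if !defined(__cpp_constexpr)
    #define CPP_CONSTEXPR_UNDEFINED 1
    #else
    #define CPP_CONSTEXPR_UNDEFINED 0
    #endif

    static void check_undefined_is_zero(void) {
        assert(CONSTEXPR_BRANCH_TAKEN == 0);
        assert(CPP_CONSTEXPR_UNDEFINED == 1);
    }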

Helping ``pcpp`` using source code annotation

You can achieve a great deal using -D (define), -U (undefine) and -N (never define) on the command line, but for more complex preprocessing it gets hard to pass through the correct logic without some source code annotation.

``pcpp`` lets you annotate which parts of an if block being passed through (due to use of unknown macros) should also be executed in addition to being passed through. For this, use ``__PCPP_ALWAYS_FALSE__`` or ``__PCPP_ALWAYS_TRUE__``, which tell ``pcpp`` to temporarily start executing the passed-through preprocessor commands, e.g.:

.. code-block:: c

    #if !defined(__cpp_constexpr)
    #if __cplusplus >= 201402L
    #define __cpp_constexpr 201304
    #elif !__PCPP_ALWAYS_FALSE__     // pcpp please execute this next block
    #define __cpp_constexpr 190000
    #endif
    #endif
    #ifndef BOOSTLITE_CONSTEXPR
    #if __cpp_constexpr >= 201304
    #define BOOSTLITE_CONSTEXPR constexpr
    #endif
    #endif
    #ifndef BOOSTLITE_CONSTEXPR
    #define BOOSTLITE_CONSTEXPR
    #endif

Note that ``__PCPP_ALWAYS_FALSE__`` will always be false in any other preprocessor (as an undefined identifier, it evaluates to 0), and it is also false in ``pcpp``. However, it causes ``pcpp`` to execute the define of ``__cpp_constexpr`` to 190000:

``pcpp test.h --passthru-defines --passthru-unknown-exprs``

.. code-block:: c

    #if !defined(__cpp_constexpr)
    #if __cplusplus >= 201402
    #define __cpp_constexpr 201304
    #elif 1
    #define __cpp_constexpr 190000
    #endif
    #endif
    #ifndef BOOSTLITE_CONSTEXPR

    #endif
    #ifndef BOOSTLITE_CONSTEXPR
    #define BOOSTLITE_CONSTEXPR
    #endif

This is one way of marking up ``#else`` clauses so they always execute in a normal preprocessor and are also passed through with execution by ``pcpp``. You can, of course, also place ``|| __PCPP_ALWAYS_FALSE__`` in any ``#if`` stanza to cause it to be passed through with execution, without otherwise affecting the preprocessing logic.
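
To see that the annotations are inert outside ``pcpp``, the annotated header from earlier can be compiled by a normal C compiler. This sketch assumes C, where neither ``__cplusplus`` nor ``__PCPP_ALWAYS_FALSE__`` is defined, so both evaluate to 0: ``#elif !__PCPP_ALWAYS_FALSE__`` then acts as a plain ``#else``, and ``|| __PCPP_ALWAYS_FALSE__`` leaves a condition unchanged. ``HYPOTHETICAL_OPTION``, ``OPTION_BRANCH``, ``STR``/``STR_`` and ``check_annotations_are_inert()`` are hypothetical names.

.. code-block:: c

    #include <assert.h>
    #include <string.h>

    /* The annotated header, unchanged. */
    #if !defined(__cpp_constexpr)
    #if __cplusplus >= 201402L
    #define __cpp_constexpr 201304
    #elif !__PCPP_ALWAYS_FALSE__     // pcpp please execute this next block
    #define __cpp_constexpr 190000
    #endif
    #endif
    #ifndef BOOSTLITE_CONSTEXPR
    #if __cpp_constexpr >= 201304
    #define BOOSTLITE_CONSTEXPR constexpr
    #endif
    #endif
    #ifndef BOOSTLITE_CONSTEXPR
    #define BOOSTLITE_CONSTEXPR
    #endif

    /* "|| __PCPP_ALWAYS_FALSE__" does not turn a false condition true. */
    #define HYPOTHETICAL_OPTION 0
    #if HYPOTHETICAL_OPTION || __PCPP_ALWAYS_FALSE__
    #define OPTION_BRANCH 1
    #else
    #define OPTION_BRANCH 0
    #endif

    /* Two-level stringize, used to show a macro expanding to nothing. */
    #define STR_(s) #s
    #define STR(s) STR_(s)

    static void check_annotations_are_inert(void) {
        assert(__cpp_constexpr == 190000);  /* the #elif branch ran */
        assert(strcmp(STR(BOOSTLITE_CONSTEXPR), "") == 0);  /* empty */
        assert(OPTION_BRANCH == 0);
    }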

What's implemented by the Preprocessor class:

Additionally implemented by pcpp command line tool:

Not implemented yet (donations of code welcome):

Known bugs (ordered from worst to least worst):

None presently known.

Customising your own preprocessor:

See the API reference docs at https://ned14.github.io/pcpp/

You can find an example of overriding the ``on_*()`` processing hooks at https://github.com/ned14/pcpp/blob/master/pcpp/pcmd.py

History:

v1.30 (29th October 2021):

v1.22 (19th October 2020):

v1.21 (30th September 2019):

v1.20 (7th January 2019):

v1.1 (19th June 2018):

v1.01 (21st Feb 2018):

v1.00 (13th Mar 2017):

First release