Some include files are skipped after latest commit

trelau commented 3 years ago

After this commit ac7a2082f45d0455f3c480e2e0b6cb484c614614 I now see header files that are skipped in the output. I can't quite pin down the issue or reproduce it on a small test case, but thought worth posting in case something jumps out at anyone.

The screenshot below shows what is currently being output on the left compared to what was output before on the right:

For context, all I am trying to do is preprocess macros in a folder full of header files, and for each file output a new header file. That is, I'd prefer to leave everything untouched for a given file, just expand the macros. I'm writing this in Python where I create a custom preprocessor class, then for each header file I parse the file and write out a new one. I use re.compile('.') as the passthrough includes option because I want every header to just pass through and not be processed into the single file. This seemed to work before, but now is skipping some includes.

Perhaps a different issue (or user error) but I'll note it while I'm here: When processing a number of header files iteratively, the include guards get defined and then eventually the content in header files starts getting ignored because the include guard was defined by being included in previous files. So, in the on_potential_include_guard function I have to undefine the include guard header if this is the first file being processed. I added a custom variable in the preprocessor class to be set to True as the script iterates through each header file, then after undefining the include guard to make sure it processes the content in the header, it gets set to False for the current iteration.
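To make the include guard effect concrete, here is a minimal toy model in plain Python. It is not pcpp's implementation: the `process` function, the regular expressions, and the `B_HXX` header are all made up for illustration, and the model only assumes that one macro table is shared across every file handled by the same instance.

```python
import re

# Hypothetical, simplified directive patterns (not pcpp's own parsing).
GUARD_RE = re.compile(r'^\s*#\s*ifndef\s+(\w+)')
DEFINE_RE = re.compile(r'^\s*#\s*define\s+(\w+)')


def process(text, defined):
    """Return the non-directive lines of a guarded header.

    `defined` is the macro table shared between calls. If the header's
    guard is already in it, the whole header is skipped.
    """
    lines = text.splitlines()
    guard = GUARD_RE.match(lines[0])
    if guard and guard.group(1) in defined:
        return []
    out = []
    for line in lines:
        define = DEFINE_RE.match(line)
        if define:
            defined.add(define.group(1))  # the guard is remembered here
        elif not line.lstrip().startswith('#'):
            out.append(line)
    return out


header = "#ifndef B_HXX\n#define B_HXX\nclass B {};\n#endif"

shared = set()
first = process(header, shared)    # content is emitted, guard gets defined
second = process(header, shared)   # guard already defined, content skipped
shared.discard('B_HXX')            # analogous to undefining the guard
third = process(header, shared)    # content is emitted again
fresh = process(header, set())     # a fresh macro table also avoids the skip
```

In this model, either undefining the guard before a top-level file or starting from a fresh macro table brings the content back.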

For some more context, I'm experimenting using this project to process header files to eventually generate Python bindings for this project https://github.com/trelau/pyOCCT. I'm currently using clang and it seems like overkill, so I started trying https://github.com/robotpy/cxxheaderparser but it doesn't handle macros, which led me to this project to preprocess the header files to unpack the macros so the files can be parsed by cxxheaderparser.

ned14 commented 3 years ago

I assume you meant re.compile('.*') here right?
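For what it's worth, a quick check with the standard `re` module suggests the two patterns behave the same when the option is tested with `match()`, which anchors at the start of the string but does not need to consume all of it. That pcpp tests `passthru_includes` with `match()` is an assumption here, and the include strings below are made up for illustration.

```python
import re

# Hypothetical include arguments, as they might appear after `#include`.
includes = ['<Standard.hxx>', '"gp_Pnt.hxx"', '<vector>']

dot = re.compile('.')
dot_star = re.compile('.*')

# match() only anchors at the start, so '.' succeeds on any non-empty string.
dot_results = [bool(dot.match(s)) for s in includes]
dot_star_results = [bool(dot_star.match(s)) for s in includes]

# The two patterns differ on the empty string and under fullmatch().
empty_differs = dot.match('') is None and dot_star.match('') is not None
fullmatch_differs = (dot.fullmatch('<vector>') is None
                     and dot_star.fullmatch('<vector>') is not None)
```

So '.*' states the intent more clearly, but under that assumption either pattern would pass every non-empty include through.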

Separately though, if you could supply a self-contained repro, that would be very useful. In the unit test I added for the previous commit, I tested including multiple files, and it passed them through just fine.

If that's too much work right now, it would help if you could show me your Python code for customising the Preprocessor class.

trelau commented 3 years ago

Ah, yes, edited. I'll try to get you a reproducible example. Until then, in case it helps, here is the code for the custom preprocessor. Most of it is just copied and pasted from your own preprocessor.

import os
import re
import sys

from pcpp import Preprocessor, OutputDirective, Action

class CustomPreprocessor(Preprocessor):

    def __init__(self):
        super().__init__()

        self.passthru_unfound_includes = True
        self.undefines = []
        self.passthru_undefined_exprs = False
        self.nevers = []
        self.passthru_defines = False
        self.passthru_comments = True

        self.bypass_ifpassthru = False
        self.potential_include_guard = None

        # Custom variable to flag when processing a new header file in the case that
        # multiple header files are being processed with a single preprocessor instance
        self.first_file = True

        # Some extra debugging variables to see what gets processed
        self.handled_directives = set()
        self.unknown_directives = dict()
        self.expanded_macros = set()
        self.unknown_macros = set()

    def on_include_not_found(self, is_malformed, is_system_include, curdir, includepath):
        if self.passthru_unfound_includes:
            raise OutputDirective(Action.IgnoreAndPassThrough)
        return super(CustomPreprocessor, self).on_include_not_found(is_malformed, is_system_include,
                                                                    curdir, includepath)

    def on_unknown_macro_in_defined_expr(self, tok):
        self.unknown_macros.add(tok.value)

        if self.undefines:
            if tok.value in self.undefines:
                return False
        if self.passthru_undefined_exprs:
            return None  # Pass through as expanded as possible
        return super(CustomPreprocessor, self).on_unknown_macro_in_defined_expr(tok)

    def on_unknown_macro_in_expr(self, ident):
        self.unknown_macros.add(ident)
        if self.undefines:
            if ident in self.undefines:
                return super(CustomPreprocessor, self).on_unknown_macro_in_expr(ident)
        if self.passthru_undefined_exprs:
            return None  # Pass through as expanded as possible
        return super(CustomPreprocessor, self).on_unknown_macro_in_expr(ident)

    def on_unknown_macro_function_in_expr(self, ident):
        self.unknown_macros.add(ident)

        if self.undefines:
            if ident in self.undefines:
                return super(CustomPreprocessor, self).on_unknown_macro_function_in_expr(ident)
        if self.passthru_undefined_exprs:
            return None  # Pass through as expanded as possible
        return super(CustomPreprocessor, self).on_unknown_macro_function_in_expr(ident)

    def on_directive_handle(self, directive, toks, ifpassthru, precedingtoks):
        if ifpassthru:
            if directive.value == 'if' or directive.value == 'elif' or directive.value == 'else' or directive.value == 'endif':
                self.bypass_ifpassthru = len([tok for tok in toks if
                                              tok.value == '__PCPP_ALWAYS_FALSE__' or tok.value == '__PCPP_ALWAYS_TRUE__']) > 0
            if not self.bypass_ifpassthru and (
                    directive.value == 'define' or directive.value == 'undef'):
                if toks[0].value != self.potential_include_guard:
                    raise OutputDirective(
                        Action.IgnoreAndPassThrough)  # Don't execute anything with effects when inside an #if expr with undefined macro
        if (directive.value == 'define' or directive.value == 'undef') and self.nevers:
            if toks[0].value in self.nevers:
                raise OutputDirective(Action.IgnoreAndPassThrough)
        if self.passthru_defines:
            super(CustomPreprocessor, self).on_directive_handle(directive, toks, ifpassthru,
                                                                precedingtoks)
            return None  # Pass through where possible
        return super(CustomPreprocessor, self).on_directive_handle(directive, toks, ifpassthru,
                                                                   precedingtoks)

    def on_directive_unknown(self, directive, toks, ifpassthru, precedingtoks):
        if ifpassthru:
            return None  # Pass through
        return super(CustomPreprocessor, self).on_directive_unknown(directive, toks, ifpassthru,
                                                                    precedingtoks)

    def on_potential_include_guard(self, macro):
        self.potential_include_guard = macro
        # If preprocessing a new file and an include guard is found, undefine it so the file
        # is processed. Sometimes they are skipped if their include guard was already encountered
        # and defined.
        if macro:
            if self.first_file:
                self.first_file = False
                self.undef(macro)
                print('Undefining {}'.format(macro))
        return super(CustomPreprocessor, self).on_potential_include_guard(macro)

    def on_comment(self, tok):
        if self.passthru_comments:
            return True  # Pass through
        return super(CustomPreprocessor, self).on_comment(tok)

Here is how it's set up:

def get_processor():
    # Define the preprocessor
    p = CustomPreprocessor()
    p.add_path('C:/Miniconda/envs/occt750/Library/include/opencascade')
    p.line_directive = None
    p.passthru_includes = re.compile('.*')

    # OCCT definitions
    p.define('UNICODE 1')
    p.define('_UNICODE 1')
    p.define('_CRT_SECURE_NO_WARNINGS 1')
    p.define('_CRT_NONSTDC_NO_DEPRECATE 1')
    p.define('HAVE_VTK 1')
    p.define('HAVE_FREEIMAGE 1')
    p.define('HAVE_TBB 1')
    p.define('__TBB_NO_IMPLICIT_LINKAGE 1')
    p.define('__TBBMALLOC_NO_IMPLICIT_LINKAGE 1')
    p.define('HAVE_RAPIDJSON 1')
    p.define('VTK_OPENGL2_BACKEND 1')
    p.define('No_Exception 1')
    p.define('VTK_USE_64BIT_IDS 1')

    # Platform and compiler definitions
    p.define('_WIN32 1')
    p.define('_WIN64 1')
    p.define('__WIN32__ 1')
    p.define('_MSC_VER 1900')
    p.define('__cplusplus 201402L')

    return p

And this is the function for looping over the headers and processing them:

def preprocess():
    p = get_processor()

    # Process each .hxx file
    occt_header_dir = 'C:/Miniconda/envs/occt750/Library/include/opencascade'
    output_dir = './_preprocessed'
    # if os.path.isdir(output_dir):
    #     os.remove(output_dir)
    # os.mkdir(output_dir)

    print('Preprocessing header files -----------------------------------------')
    files = os.listdir(occt_header_dir)
    nfiles = len(files)
    print('Found {} files to process.'.format(nfiles))
    i = 0
    for fname in files:
        if fname.endswith('.hxx'):
            p.first_file = True
            # Update stdout
            i += 1
            percent = round((i / nfiles) * 100)
            sys.stdout.write('\r{}% {}'.format(percent, fname))
            sys.stdout.flush()

            # Context managers close the files even if parsing or writing fails
            with open(os.path.join(occt_header_dir, fname), 'r') as fin:
                p.parse(fin)

            with open(os.path.join(output_dir, fname), 'w') as fout:
                p.write(fout)

            break  # NOTE: stops after the first .hxx file; remove to process every header

    print('\nPreprocessing complete ---------------------------------------------')

    # Write unknowns
    fout = open('unknown_macros.txt', 'w')
    for x in p.unknown_macros:
        fout.write('{}\n'.format(x))
    fout.close()

Just an initial proof-of-concept, so a lot of hacky stuff in there.
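One possible way to tidy the loop above is sketched below. It assumes only that the processor object exposes `parse(file)` and `write(file)` as used in the code above; the `preprocess_dir` name and its parameters are hypothetical. It builds a new processor for each header, so no macro state (including include guards) carries over between files, at the cost of re-running the setup each time.

```python
import os


def preprocess_dir(make_processor, src_dir, out_dir, suffix='.hxx'):
    """Preprocess every header in src_dir into out_dir.

    make_processor is a zero-argument callable returning an object with
    parse(file) and write(file) methods, e.g. get_processor from above.
    Returns the list of file names that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for fname in sorted(os.listdir(src_dir)):
        if not fname.endswith(suffix):
            continue
        # A new processor per file: nothing defined earlier leaks in.
        p = make_processor()
        with open(os.path.join(src_dir, fname), 'r') as fin:
            p.parse(fin)
        with open(os.path.join(out_dir, fname), 'w') as fout:
            p.write(fout)
        written.append(fname)
    return written
```

Usage would look like `preprocess_dir(get_processor, occt_header_dir, './_preprocessed')`. Whether a fresh instance per file is fast enough for several thousand headers has not been measured here.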

trelau commented 3 years ago

I should add, there are something like 8k+ header files to be processed. After it gets through about 50 of them, header files seem to be skipped entirely in output files.

ned14 commented 3 years ago

Give that commit there a try.

trelau commented 3 years ago

@ned14 that seems to have done the trick. I still use the code above (the CustomPreprocessor), and I still have to undefine include guards as I loop through each file, but I can do that by overriding the method, so it's all good. Thanks!

ned14 commented 3 years ago

Glad to have fixed the problem. Thanks for the BR!

ned14 / pcpp

Some include files are skipped after latest commit #59