standardese / cppast

Library to parse and work with the C++ AST
Other
1.69k stars 163 forks source link

ba6fd12 breaks preprocessor macros (tests failing) #153

Closed waitingtocompile closed 1 year ago

waitingtocompile commented 1 year ago

Clang 15 has brough two important changes that break things inside cppast.

The more serious of the two is the changes to preprocessing - macro definitions no longer seem to be detected correctly. The preprocessor line number test is also failing, although I am having a harder time determining why.

Less serious is that clang 15 has given concepts their own cursor, which is currently unhandled. This should be a fairly easy fix, simply plugging the existing concept parser into that cursor type. I'd be happy to put in a PR with a fix for this, though I'd appreciate guidance on how to ensure backwards compatability with clang versions that don't define that cursor type.

waitingtocompile commented 1 year ago

Concerning update - after rolling back my version of libclang to 14.0.6, the macro definitions still appear to be non-functioning. Running the source sample for the "cpp_macro_defintion" test case through cppast_example_ast_printer only prints

[simple file parser] [info] parsing file 'D:/Projects/cppast/build/example/Debug/cpp_macro_definition.cpp'
'D:/Projects/cppast/build/example/Debug/cpp_macro_definition.cpp' - file
        'iostream' - include directive
waitingtocompile commented 1 year ago

Okay, I've walked back through commits in main, ba6fd12dce395638a04113c2771eed84fc0a8250 (detatched-head at 282f05ed383cda42e844d8b4b510cf7e80e25721 passes tests) appears to be responsible for the breakage in macros. If I have time I'll dig into it and see if I can find a way to restore macro functionality while preserving the circular inclusion handling.

I'm also splitting the new concept cursor into it's own issue (#155) and re-naming this one to reflect the actual cause of the problem

foonathan commented 1 year ago

Hm, all tests on main pass on my machine (clang 14.0.6). What exactly is failing?

waitingtocompile commented 1 year ago

Here's the output of the tests as run on windows (building with MSVC), with clang 14.0.6

Inspecting in a debugger, no preprocessor macros seem to actually end up in cppast's syntax tree to be visited in these test cases, hence the issues.


cppast_test.exe is a Catch v2.13.9 host application.
Run with -? for options

-------------------------------------------------------------------------------
cpp_macro_definition
-------------------------------------------------------------------------------
D:\Projects\cppast-testground\cppast-1\test\cpp_preprocessor.cpp(10)
...............................................................................

D:\Projects\cppast-testground\cppast-1\test\cpp_preprocessor.cpp(84): FAILED:
  REQUIRE( count == 6u )
with expansion:
  0 == 6

-------------------------------------------------------------------------------
cpp_include_directive
-------------------------------------------------------------------------------
D:\Projects\cppast-testground\cppast-1\test\cpp_preprocessor.cpp(122)
...............................................................................

D:\Projects\cppast-testground\cppast-1\test\cpp_preprocessor.cpp(164): FAILED:
  REQUIRE( count == 2u )
with expansion:
  1 == 2

[simple file parser] [info] parsing file 'a.cpp'
[simple file parser] [info] parsing file 'b.cpp'
[simple file parser] [info] parsing file 'c.cpp'
-------------------------------------------------------------------------------
preprocessor line numbers
-------------------------------------------------------------------------------
D:\Projects\cppast-testground\cppast-1\test\preprocessor.cpp(108)
...............................................................................

D:\Projects\cppast-testground\cppast-1\test\preprocessor.cpp(165): FAILED:
  REQUIRE( (file->unmatched_comments().size() == 6u + 1u) )
with expansion:
  false

===============================================================================
test cases:   44 |   41 passed | 3 failed
assertions: 2334 | 2331 passed | 3 failed
foonathan commented 1 year ago

Okay, I don't have that failure.

This might have something to do with Windows style line endings...?

waitingtocompile commented 1 year ago

Doing a force-conversion to *nix line endings doesn't appear to resolve the issue. I'll try to dig into it in next week if I have time, though truth be told I don't have a great understanding of how a lot of the parser works.

foonathan commented 1 year ago

Okay, so the goal of preprocessing is to preprocess the file but keep only the parts that came from the original file, and don't include contents of headers included by it. This is done as follows:

  1. Invoke the provided clang binary to preprocess the input with -E while keeping macro definitions and comments. This is done by https://github.com/foonathan/cppast/blob/e751294b96b3f404a977469f3dc5efc80d4a76e4/src/libclang/preprocessor.cpp#L464. There are some optimizations ("fast preprocessing") like disabling all input search paths, so clang doesn't pull in that many files to start with, but I don't think the bug is in this part of the code.
  2. The result of clang_preprocess is the preprocessed file containing macro definitions (#define FOO) and line markers that indicate where stuff is coming from. They are documented here: https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html The remaining code just parses that output to reconstruct information from there: https://github.com/foonathan/cppast/blob/e751294b96b3f404a977469f3dc5efc80d4a76e4/src/libclang/preprocessor.cpp#L1104-L1246
  3. Parsing code is mostly straightforward: it iterates over the input, handles lines starting with # specially and copies them otherwise. That is all handled by position: https://github.com/foonathan/cppast/blob/e751294b96b3f404a977469f3dc5efc80d4a76e4/src/libclang/preprocessor.cpp#L504-L644 It mostly keeps track of whether the input comes from the main file, in which case we care about it, or whether we want to skip past it as it comes from an included file.

As a first step, I'd verify that the result of clang_preprocess looks correct (it should show all file contents with linemarkers and maybe included files depending on fast preprocessing). That is hopefully the case.

waitingtocompile commented 1 year ago

I beleive I have found the culprit, and it lies in the behaviour around #include directives. During the include directive, the position object is write disabled. However, after reaching the end of the inclusion, it doesn't become write enabled again, and so any later preprocessor directives are discarded by the parser. This seems to be caused by a mismatch in the number of enter_new and enter_old line markers when processing an include directive (or at least some include directives). I'm still teasing out exactly why there's a mismatch, but I'll keep you posted on my progress.

waitingtocompile commented 1 year ago

I've identified the problematic headers, although identifying why they behave the way they do is beyond me.

When using the MSVCstdlib, the xstring header includes xmemory, which in turn includes limits. However, at the end of parsing limits, when stepping out the parser steps straight out to xstring, skipping over xmemory completely. Why it does this I have no idea, but it does present two opportunities for a fix

1 - squish in a dirty hack for these specific files. It's an inelegant solution, but should work. 2 - Swap out the current counting structure for write enable/disable for a solution that records which files are being stepped into and out of, and facilitate stepping over a file if we're moving to something further down the stack. This will make the position object substantially heavier and probably slow parsing down to fix an issue on a single platform with a single standard library implementation. What are your feelings on the matter?

foonathan commented 1 year ago

Can you post a (simplified) version of xstring, xmemory, and limits that reproduce the error? I can then take another stab at debugging them.

waitingtocompile commented 1 year ago

Unfortunately I haven't the faintest idea what element of them is causing the breakage. At a glance, these just appear to be normal header includes gated behind a preprocessor if and normal include guards, folowed by a few thousand lines of standard library code. I don't know how to go about paring them down to a simplified form that reproduces the error.

Directly including xstring, interesting, causes the error to shift, with the preprocessor attempting to step out to other files that weren't directly included earlier. Whatever is going on is way beyond my understanding at this point.

waitingtocompile commented 1 year ago

One obvious solution I've just realised that might be workable - rather than messing about counting the number of enter old/new linemarkers, what if we instead simply disable write when we encounter the first enter new linemarker, and then only re-enable it on hitting an enter old linemarker that specifically re-enters the actual file we're parsing?

I just realised this is the old solution you got rid of for the new counting approach, never mind.

foonathan commented 1 year ago

I think I have an idea what might cause it: parsing looks for \, ", ', # or /.

Does xstring contain some string literals or something which in turn contain #, which could then be incorrectly interpreted as line markers?

waitingtocompile commented 1 year ago

No, all instances of the # character in xstring are actual preprocessor directives, mostly if and endif. The only other relevant detail I can think of is that the xmemory include is inside a conditional based on the value of a macro

foonathan commented 1 year ago

Can you try removing everything but preprocessor directives and see if the error happens anyway?

waitingtocompile commented 1 year ago

Removing everything except preprocessor directives from xstring does not cause any change to behaviour. Doing the same to xmemory causes the enter old line marker to skip more files, skipping two layers of files rather than one. Making the change to both xstring and xmemory causes it to skip three layers.

Removing material from limits causes a rather more complex change in the error. This may be completely unrelated, but limits is also the only file of the three to contain multi-line preprocessor directives, such as

#if (defined(_M_ARM64) || defined(_M_ARM64EC)) && !defined(_M_CEE_PURE) && !defined(__CUDACC__) \
    && !defined(__INTEL_COMPILER) && !defined(__clang__) // TRANSITION, LLVM-51488

When making the change to limits, the old issue is resolved, but re-inclusions of already included headers that eventually cascade down to limits do cause the file-skipping behaviour, with the enter old line marker going straight from the already included file all the way up to the originating file, skipping out any files in between. However it doing so doesn't cause any actual issues, the tests pass happily and all macro definitions are properly recorded and exposed. This may not be the case with a more complex inclusion graph, but it's not something I've yet tested.

I'm stopping here for the day, I still really have no clue why this particular file is provoking such odd behaviour.

foonathan commented 1 year ago

Can you give me a source file that includes some MSVC header and reproduces some sort of error? Otherwise, I can't really help you on my end.

waitingtocompile commented 1 year ago

The preprocessor macro test produces that error as-is, when clang is using the MSVC STL

foonathan commented 1 year ago

Okay, I have someting. Thanks.

foonathan commented 1 year ago

... it's integer literals with digit separators

foonathan commented 1 year ago

Can you test https://github.com/foonathan/cppast/tree/feature/preprocessor?

waitingtocompile commented 1 year ago

Yes, it works fine, many thanks