Closed waitingtocompile closed 1 year ago
Concerning update - after rolling back my version of libclang to 14.0.6, the macro definitions still appear to be non-functioning. Running the source sample for the "cpp_macro_defintion" test case through cppast_example_ast_printer only prints
[simple file parser] [info] parsing file 'D:/Projects/cppast/build/example/Debug/cpp_macro_definition.cpp'
'D:/Projects/cppast/build/example/Debug/cpp_macro_definition.cpp' - file
'iostream' - include directive
Okay, I've walked back through commits in main, ba6fd12dce395638a04113c2771eed84fc0a8250 (detatched-head at 282f05ed383cda42e844d8b4b510cf7e80e25721 passes tests) appears to be responsible for the breakage in macros. If I have time I'll dig into it and see if I can find a way to restore macro functionality while preserving the circular inclusion handling.
I'm also splitting the new concept cursor into it's own issue (#155) and re-naming this one to reflect the actual cause of the problem
Hm, all tests on main
pass on my machine (clang 14.0.6). What exactly is failing?
Here's the output of the tests as run on windows (building with MSVC), with clang 14.0.6
Inspecting in a debugger, no preprocessor macros seem to actually end up in cppast's syntax tree to be visited in these test cases, hence the issues.
cppast_test.exe is a Catch v2.13.9 host application.
Run with -? for options
-------------------------------------------------------------------------------
cpp_macro_definition
-------------------------------------------------------------------------------
D:\Projects\cppast-testground\cppast-1\test\cpp_preprocessor.cpp(10)
...............................................................................
D:\Projects\cppast-testground\cppast-1\test\cpp_preprocessor.cpp(84): FAILED:
REQUIRE( count == 6u )
with expansion:
0 == 6
-------------------------------------------------------------------------------
cpp_include_directive
-------------------------------------------------------------------------------
D:\Projects\cppast-testground\cppast-1\test\cpp_preprocessor.cpp(122)
...............................................................................
D:\Projects\cppast-testground\cppast-1\test\cpp_preprocessor.cpp(164): FAILED:
REQUIRE( count == 2u )
with expansion:
1 == 2
[simple file parser] [info] parsing file 'a.cpp'
[simple file parser] [info] parsing file 'b.cpp'
[simple file parser] [info] parsing file 'c.cpp'
-------------------------------------------------------------------------------
preprocessor line numbers
-------------------------------------------------------------------------------
D:\Projects\cppast-testground\cppast-1\test\preprocessor.cpp(108)
...............................................................................
D:\Projects\cppast-testground\cppast-1\test\preprocessor.cpp(165): FAILED:
REQUIRE( (file->unmatched_comments().size() == 6u + 1u) )
with expansion:
false
===============================================================================
test cases: 44 | 41 passed | 3 failed
assertions: 2334 | 2331 passed | 3 failed
Okay, I don't have that failure.
This might have something to do with Windows style line endings...?
Doing a force-conversion to *nix line endings doesn't appear to resolve the issue. I'll try to dig into it in next week if I have time, though truth be told I don't have a great understanding of how a lot of the parser works.
Okay, so the goal of preprocessing is to preprocess the file but keep only the parts that came from the original file, and don't include contents of headers included by it. This is done as follows:
-E
while keeping macro definitions and comments. This is done by https://github.com/foonathan/cppast/blob/e751294b96b3f404a977469f3dc5efc80d4a76e4/src/libclang/preprocessor.cpp#L464. There are some optimizations ("fast preprocessing") like disabling all input search paths, so clang doesn't pull in that many files to start with, but I don't think the bug is in this part of the code.clang_preprocess
is the preprocessed file containing macro definitions (#define FOO
) and line markers that indicate where stuff is coming from. They are documented here: https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html The remaining code just parses that output to reconstruct information from there: https://github.com/foonathan/cppast/blob/e751294b96b3f404a977469f3dc5efc80d4a76e4/src/libclang/preprocessor.cpp#L1104-L1246#
specially and copies them otherwise. That is all handled by position: https://github.com/foonathan/cppast/blob/e751294b96b3f404a977469f3dc5efc80d4a76e4/src/libclang/preprocessor.cpp#L504-L644 It mostly keeps track of whether the input comes from the main file, in which case we care about it, or whether we want to skip past it as it comes from an included file.As a first step, I'd verify that the result of clang_preprocess
looks correct (it should show all file contents with linemarkers and maybe included files depending on fast preprocessing). That is hopefully the case.
I beleive I have found the culprit, and it lies in the behaviour around #include
directives. During the include directive, the position
object is write disabled. However, after reaching the end of the inclusion, it doesn't become write enabled again, and so any later preprocessor directives are discarded by the parser. This seems to be caused by a mismatch in the number of enter_new
and enter_old
line markers when processing an include directive (or at least some include directives). I'm still teasing out exactly why there's a mismatch, but I'll keep you posted on my progress.
I've identified the problematic headers, although identifying why they behave the way they do is beyond me.
When using the MSVCstdlib, the xstring
header includes xmemory
, which in turn includes limits
. However, at the end of parsing limits
, when stepping out the parser steps straight out to xstring
, skipping over xmemory
completely. Why it does this I have no idea, but it does present two opportunities for a fix
1 - squish in a dirty hack for these specific files. It's an inelegant solution, but should work.
2 - Swap out the current counting structure for write enable/disable for a solution that records which files are being stepped into and out of, and facilitate stepping over a file if we're moving to something further down the stack. This will make the position
object substantially heavier and probably slow parsing down to fix an issue on a single platform with a single standard library implementation. What are your feelings on the matter?
Can you post a (simplified) version of xstring
, xmemory
, and limits
that reproduce the error? I can then take another stab at debugging them.
Unfortunately I haven't the faintest idea what element of them is causing the breakage. At a glance, these just appear to be normal header includes gated behind a preprocessor if and normal include guards, folowed by a few thousand lines of standard library code. I don't know how to go about paring them down to a simplified form that reproduces the error.
Directly including xstring
, interesting, causes the error to shift, with the preprocessor attempting to step out to other files that weren't directly included earlier. Whatever is going on is way beyond my understanding at this point.
One obvious solution I've just realised that might be workable - rather than messing about counting the number of enter old/new linemarkers, what if we instead simply disable write when we encounter the first enter new linemarker, and then only re-enable it on hitting an enter old linemarker that specifically re-enters the actual file we're parsing?
I just realised this is the old solution you got rid of for the new counting approach, never mind.
I think I have an idea what might cause it: parsing looks for \, ", ', # or /.
Does xstring contain some string literals or something which in turn contain #, which could then be incorrectly interpreted as line markers?
No, all instances of the # character in xstring are actual preprocessor directives, mostly if and endif. The only other relevant detail I can think of is that the xmemory include is inside a conditional based on the value of a macro
Can you try removing everything but preprocessor directives and see if the error happens anyway?
Removing everything except preprocessor directives from xstring does not cause any change to behaviour. Doing the same to xmemory causes the enter old line marker to skip more files, skipping two layers of files rather than one. Making the change to both xstring and xmemory causes it to skip three layers.
Removing material from limits causes a rather more complex change in the error. This may be completely unrelated, but limits is also the only file of the three to contain multi-line preprocessor directives, such as
#if (defined(_M_ARM64) || defined(_M_ARM64EC)) && !defined(_M_CEE_PURE) && !defined(__CUDACC__) \
&& !defined(__INTEL_COMPILER) && !defined(__clang__) // TRANSITION, LLVM-51488
When making the change to limits, the old issue is resolved, but re-inclusions of already included headers that eventually cascade down to limits do cause the file-skipping behaviour, with the enter old line marker going straight from the already included file all the way up to the originating file, skipping out any files in between. However it doing so doesn't cause any actual issues, the tests pass happily and all macro definitions are properly recorded and exposed. This may not be the case with a more complex inclusion graph, but it's not something I've yet tested.
I'm stopping here for the day, I still really have no clue why this particular file is provoking such odd behaviour.
Can you give me a source file that includes some MSVC header and reproduces some sort of error? Otherwise, I can't really help you on my end.
The preprocessor macro test produces that error as-is, when clang is using the MSVC STL
Okay, I have someting. Thanks.
... it's integer literals with digit separators
Yes, it works fine, many thanks
libclang_parser
Clang 15 has brough two important changes that break things inside cppast.
The more serious of the two is the changes to preprocessing - macro definitions no longer seem to be detected correctly. The preprocessor line number test is also failing, although I am having a harder time determining why.
Less serious is that clang 15 has given concepts their own cursor, which is currently unhandled. This should be a fairly easy fix, simply plugging the existing concept parser into that cursor type. I'd be happy to put in a PR with a fix for this, though I'd appreciate guidance on how to ensure backwards compatability with clang versions that don't define that cursor type.