Handling advanced pre-processor macros.

ciechowoj commented 8 years ago

I'm wondering how to handle 'advanced' pre-processor stuff. I have two problems. The first problem is that libclang only parses the active branch of a pre-processor conditional. E.g.

#define FOO

#ifdef FOO
// This will appear in AST.
#else
// This won't appear in AST.
#endif

I suppose it isn't possible to implement a translation of the pre-processor that handles all possible cases, but the most common cases should be doable. The only idea I have is to load the content of the file we are translating and parse the macros manually (possibly using clang to parse the code inside the ifdefs) into some AST form suitable for further processing. This should be perfectly doable. The next problem is a design/implementation one.
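As a rough illustration of the "parse macros manually" idea above, here is a minimal sketch in C that scans a source buffer line by line and records the conditional directives it finds, together with their nesting depth. The names `Directive` and `scan_conditionals` are hypothetical and are not part of dstep or libclang. The sketch assumes every directive starts on its own line and ignores comments, string literals, and line continuations, so it only covers the most common cases.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one conditional directive (not a dstep/libclang type). */
typedef struct {
    char   name[8];  /* "if", "ifdef", "ifndef", "elif", "else" or "endif" */
    size_t line;     /* 1-based line number of the directive */
    int    depth;    /* nesting depth; 0 for an outermost conditional */
} Directive;

/* Returns 1 if the n characters at s spell a conditional directive name. */
static int is_conditional(const char *s, size_t n)
{
    static const char *const names[] = {
        "if", "ifdef", "ifndef", "elif", "else", "endif"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
        if (strlen(names[i]) == n && memcmp(names[i], s, n) == 0)
            return 1;
    return 0;
}

/* Scans src and stores at most cap conditional directives in out.
   Returns the number of directives stored. */
size_t scan_conditionals(const char *src, Directive *out, size_t cap)
{
    assert(src != NULL && (out != NULL || cap == 0));
    size_t count = 0, line = 1;
    int depth = 0;

    for (const char *p = src; *p != '\0' && count < cap; ) {
        const char *q = p;
        while (*q == ' ' || *q == '\t') ++q;      /* leading whitespace */
        if (*q == '#') {
            ++q;
            while (*q == ' ' || *q == '\t') ++q;  /* "#  if" is allowed */
            const char *start = q;
            while (isalpha((unsigned char)*q)) ++q;
            size_t n = (size_t)(q - start);
            if (is_conditional(start, n)) {
                int is_endif = (n == 5 && memcmp(start, "endif", 5) == 0);
                if (is_endif && depth > 0) --depth;
                Directive *d = &out[count++];
                memcpy(d->name, start, n);
                d->name[n] = '\0';
                d->line = line;
                /* #elif and #else belong to the enclosing #if, one level up. */
                d->depth = (start[0] == 'e' && !is_endif && depth > 0)
                               ? depth - 1 : depth;
                if (start[0] == 'i') ++depth;     /* #if, #ifdef, #ifndef */
            }
        }
        while (*p != '\0' && *p != '\n') ++p;     /* advance to the next line */
        if (*p == '\n') { ++p; ++line; }
    }
    return count;
}
```

For the seven-line FOO example at the top of this comment, the scanner would record `ifdef` on line 3, `else` on line 5 and `endif` on line 7, all at depth 0. The text between two consecutive records at the same depth is one branch, which could then be handed to a separate parse.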

Currently there is a thin wrapper around the C libclang API. I think it would be nice if the pre-processor information (that we extract from the file) were available through the same API. What I propose is to extend the classes (structs) like Cursor, SourceLocation, Token etc. so that they can contain data created by the user of the thin wrapper (currently instances of those structs can only be created by libclang). Then one could iterate through an AST containing all the information in a transparent way, without worrying whether the data comes from libclang or from the pre-processor parser. For example, I would like Cursor to have a constructor that creates an instance from user-supplied data. E.g.

struct Cursor
{
    this(string spelling, CXCursorKind kind, SourceRange, ...);

    /+ ... +/
}

An alternative would be to add a separate API and then interleave the two in the actual translator code; however, that seems messy.

What do you think about it?

jacob-carlborg commented 8 years ago

> The only idea I have is to load the content of the file we are translating and parse the macros manually (possibly using clang to parse the code inside the ifdefs) into some AST form suitable for further processing. This should be perfectly doable.

I would hope it's possible to use libclang to lex the file to get the tokens of the branch that is not compiled, then run those tokens through libclang again to do the full parsing and analysis.

There's a problem regardless of which method is used. You might get compile errors if you try to analyze a branch that is normally not analyzed. Example:

#if _WIN32
#include <windows.h>
void foo(DWORD param); // DWORD is declared in windows.h
#endif

Running the above translation on any platform other than Windows will fail to compile if the branch is analyzed.

As for the design: at some point you need to run libclang on the source code/tokens, and the wrapper types can be created like they are today. Or am I missing something?

ciechowoj commented 8 years ago

> I would hope it's possible to use libclang to lex the file to get the tokens of the branch that is not compiled, then run those tokens through libclang again to do the full parsing and analysis.

The first thing seems possible, the second not so much.

> There's a problem regardless of which method is used. You might get compile errors if you try to analyze a branch that is normally not analyzed.

Good point. Hmm, I believe we could go quite far by explicitly handling such cases (e.g. using knowledge of what windows.h is).

> As for the design: at some point you need to run libclang on the source code/tokens, and the wrapper types can be created like they are today. Or am I missing something?

Yes, this would be the case if libclang is capable of it. But the problem is that to parse those #ifdef blocks, even in isolation, one would need to create a new TranslationUnit. Currently Cursors and the other wrapper types are bound to a particular translation unit, and intermixing Cursors from different TUs would be a bad idea, so the wrapper types cannot be used in their current shape.

It is still a problem if I wanted to parse those blocks some other way (writing my own parser to handle the most common cases; I don't know yet).

After some thinking, I believe I'll first try to solve the problem without complicating the thin wrapper.

jacob-carlborg commented 8 years ago

> It is still a problem if I wanted to parse those blocks some other way (writing my own parser to handle the most common cases; I don't know yet).

This should be avoided.

> After some thinking, I believe I'll first try to solve the problem without complicating the thin wrapper.

Sounds like a good idea.

ciechowoj commented 8 years ago

> This should be avoided.

I suppose it will be hard to avoid. The only API that libclang provides for inactive branches is clang_getSkippedRanges, but it isn't enough : (. Ideally I would like to have a cursor (or equivalent) for every preprocessor directive...

jacob-carlborg commented 8 years ago

Why do you think intermixing Cursors from different TUs would be a bad idea?

jacob-carlborg commented 8 years ago

> I suppose it will be hard to avoid. The only API that libclang provides for inactive branches is clang_getSkippedRanges, but it isn't enough : (. Ideally I would like to have a cursor (or equivalent) for every preprocessor directive...

I'm not sure if you would like to go that route, but there's always the possibility of contributing to libclang by extending the API if we need to.

I found this [1] which looks interesting. Unfortunately it looks like it never got any attention.

[1] http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120326/055612.html

ciechowoj commented 8 years ago

> Why do you think intermixing Cursors from different TUs would be a bad idea?

An example: let a be from the main translation unit and represent a struct A; and let b be from another translation unit (e.g. an ifdef block in the same file) and represent another struct A;. Then I want to check whether the struct they refer to is the same:

Cursor a = ...;
Cursor b = ...;

if (a.canonical == b.canonical) // clang_getCanonicalCursor
{
    ...
}

I suppose it won't work, as libclang doesn't see a connection between the cursors. There are plenty of similar problems.
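The difference between comparing cursor handles and comparing a stable key can be sketched in plain C. The `FakeCursor` struct below is a hypothetical stand-in, not the real `CXCursor` layout, and the USR string is illustrative only. libclang does expose `clang_getCursorUSR`, whose result is meant to be comparable across translation units, but whether that is sufficient for dstep is an assumption that this thread does not settle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a libclang cursor; NOT the real CXCursor layout. */
typedef struct {
    const void *tu;    /* translation unit that owns the cursor */
    const void *decl;  /* declaration node inside that translation unit */
    const char *usr;   /* stable symbol key, e.g. from clang_getCursorUSR */
} FakeCursor;

/* Handle comparison: can only succeed within one translation unit. */
int same_handle(const FakeCursor *a, const FakeCursor *b)
{
    assert(a != NULL && b != NULL);
    return a->tu == b->tu && a->decl == b->decl;
}

/* Key comparison: succeeds whenever both cursors carry the same key,
   regardless of which translation unit produced them. */
int same_entity(const FakeCursor *a, const FakeCursor *b)
{
    assert(a != NULL && b != NULL);
    return a->usr != NULL && b->usr != NULL && strcmp(a->usr, b->usr) == 0;
}
```

With two cursors for struct A taken from different translation units, `same_handle` reports a mismatch while `same_entity` reports a match. Note that a key match only says the two declarations share a name and scope; an #ifdef block may well define a different struct A, so the definitions would still need to be compared.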

jacob-carlborg commented 8 years ago

Hmm, I see the problem.

ciechowoj commented 8 years ago

> I'm not sure if you would like to go that route, but there's always the possibility of contributing to libclang by extending the API if we need to.

I considered it, but the road to contributing something to LLVM is quite steep (not as easy as making a fork and a pull request on GitHub : ( ).

> I found this [1] which looks interesting. Unfortunately it looks like it never got any attention.

I've already implemented it in the most recent PR (by parsing tokenized directive...).

libclang has quite a good API for accessing #defines. The real problem is the conditional directives.

jacob-carlborg commented 8 years ago

> I considered it, but the road to contributing something to LLVM is quite steep (not as easy as making a fork and a pull request on GitHub)

They're considering moving the project to git 😃. But yeah, that's a good point. Perhaps there's a good reason why they don't provide access to that.

If one had access to both branches of a preprocessor #if, one might also get errors like duplicate symbols.

ciechowoj commented 8 years ago

> They're considering moving the project to git 😃.

Nice to hear that.

> If one had access to both branches of a preprocessor #if, one might also get errors like duplicate symbols.

Right.

jacob-carlborg / dstep

Handling advanced pre-processor macros. #65