ltcmelo / psychec

A compiler frontend for the C programming language
BSD 3-Clause "New" or "Revised" License
536 stars 39 forks

Handling `#define` in the AST? #45

Closed by geekrelief 3 years ago

geekrelief commented 3 years ago

Is dealing with macros, i.e. #define, out of the scope of --C-dump-AST? I guess technically macros are preprocessed, and they shouldn't be in the AST (clang discards them).

Handling them would be really useful for my use case, which is parsing The Machinery headers. They use a lot of macros to smooth over the API, and translating those to another language is one of the biggest failure points for binding generators.

For example, tree-sitter's C parser can detect whether a node is some kind of preprocessor symbol, so it's possible to translate a #define to a constant or a call expression, e.g.:

#define TM_VERSION(major, minor, patch) (TM_LITERAL(tm_version_t){ major, minor, patch })
#define tm_api_registry_api_version TM_VERSION(0, 3, 0)
#define tm_get_api(reg, TYPE) \
    (struct TYPE *)reg->get(#TYPE, TYPE##_version)

aytey commented 3 years ago

There's quite a big difference between tree-sitter and something like clang or cnip: the latter tools parse the preprocessed source code (i.e., after macros have been expanded).

On the other hand, tree-sitter parses the code without preprocessing it. This means its AST can expose some entities that are macros, but it also cannot correctly parse code that depends on the expansion of those macros.
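To make the trade-off concrete, here is a minimal sketch of code that only becomes grammatical C once its macros are expanded. `BEGIN_SUM`, `END_SUM`, and `sum_ints` are hypothetical names made up for illustration; they are not from The Machinery headers or from any of the tools mentioned.

```c
#include <stddef.h>

/* Hypothetical macros (illustration only): together they open and close
 * a function definition, so the text between them is not a complete
 * declaration until the preprocessor has run. */
#define BEGIN_SUM(name) int name(const int *v, size_t n) { int total = 0;
#define END_SUM return total; }

/* Read without expansion, this is a call-like form followed directly by a
 * `for` statement at file scope, which the C grammar does not allow. */
BEGIN_SUM(sum_ints)
    for (size_t i = 0; i < n; ++i)
        total += v[i];
END_SUM
```

A parser that works on preprocessed input sees an ordinary function named `sum_ints` here, but it no longer sees that `BEGIN_SUM` ever existed. A parser that works on the raw text can record the two `#define` lines, but at the use site it would presumably have to guess or recover from an error.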

I guess it is a case of: "you win some, you lose some".

ltcmelo commented 3 years ago

@geekrelief

> Is dealing with macros, i.e. #define, out of the scope of --C-dump-AST?

Yes, the AST is a syntactic representation of code in which macros have already been preprocessed, much like with clang :-)

Yet, the parser of psyche-c will interpret syntactically valid object-like and function-like macros as objects and functions, respectively.
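A small sketch of that distinction, assuming "syntactically valid" means that a macro use still fits the C grammar when it is left unexpanded. The names `MAX_ITEMS`, `SQUARE`, and `scaled_limit` are hypothetical, and the comments describe how such code could plausibly be read, not confirmed psyche-c output.

```c
/* Hypothetical macros (illustration only). */
#define MAX_ITEMS 16            /* object-like macro */
#define SQUARE(x) ((x) * (x))   /* function-like macro */

int scaled_limit(int k)
{
    /* Even without expansion, this line fits the C grammar:
     * MAX_ITEMS has the shape of a plain identifier (an object), and
     * SQUARE(k) has the shape of a call expression (a function). */
    return MAX_ITEMS + SQUARE(k);
}
```

Macros used this way should be the easy case for a binding generator too: `MAX_ITEMS` maps naturally to a constant and `SQUARE` to a small function, which is not true of macros that rely on `#` or `##`.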