nim-lang / c2nim

c2nim is a tool to translate ANSI C code to Nim. The output is human-readable Nim code that is meant to be tweaked by hand before and after the translation process.
MIT License

Option to use C preprocessor to expand C macros #255

Closed elcritch closed 1 year ago

elcritch commented 1 year ago

Draft PR, but wanted to show off a cool bit!

Running the C compiler's preprocessor and keeping just the parts that come from the input file:

./c2nim --preprocess -I:$RD/rcl/rcl/include/ -I:$RD/rcl/rcl_yaml_param_parser/include -I:$RD/rcutils/include/ -I:$RD/rmw_fastrtps/rmw_fastrtps_cpp/include/ -I:$RD/rmw/rmw/include/ testsuite/cextras/rcl_arguments.h

Transforms the following:

RCL_PUBLIC
RCL_WARN_UNUSED
rcl_ret_t
rcl_arguments_get_param_files(
  const rcl_arguments_t * arguments,
  rcl_allocator_t allocator,
  char *** parameter_files);

To the final C source with all the macros expanded but still retaining the C defines, comments, etc:

__attribute__ ((visibility("default")))
__attribute__((warn_unused_result))
rcl_ret_t
rcl_arguments_get_param_files(
  const rcl_arguments_t * arguments,
  rcl_allocator_t allocator,
  char *** parameter_files);

Ideally this lets me import more of the crazy C system headers.

Araq commented 1 year ago

One can always preprocess manually beforehand, no need to hardcode "gcc" into c2nim. Also, I really don't like the idea as the C code in its non-preprocessed form contains more information. We also don't strip comments from the C file before parsing and now see what you did with the comments.

elcritch commented 1 year ago

One can always preprocess manually beforehand, no need to hardcode "gcc" into c2nim.

The goal wouldn't be to make this the default. However, it can reduce the expertise and work required to use c2nim to wrap C headers, IMHO. You just point it at the library's include folders. Then ideally you can spend the saved time writing a nicer Nim-ified wrapper layer.

True, it'd be possible to run this manually. It was my first thought. However, it's somewhat fiddly to strip out the code that comes from the includes, get it running properly, deal with multiple files, etc. Plus, having access to the lexer would make it possible to fix up things like the comments issue you mention (e.g. replace comments with the original form).
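To give a feel for the "fiddly" manual route, here is a minimal sketch in C of filtering `-E` output by its linemarkers. It assumes GCC/Clang-style markers of the form `# <line> "<file>" [flags]`; the helper names `parse_linemarker` and `filter_preprocessed` are made up for illustration and are not how the PR itself does it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* If `line` is a linemarker (# <num> "<file>" [flags]), copy the file name
   into `file` and return 1; otherwise return 0.
   Assumption: file names contain no escaped quotes. */
static int parse_linemarker(const char *line, char *file, size_t cap)
{
    assert(line != NULL && file != NULL && cap > 0);
    if (line[0] != '#' || line[1] != ' ')
        return 0;
    char *end;
    (void)strtol(line + 2, &end, 10);           /* skip the line number */
    if (end == line + 2 || end[0] != ' ' || end[1] != '"')
        return 0;
    const char *start = end + 2;
    const char *close = strchr(start, '"');
    if (close == NULL || (size_t)(close - start) >= cap)
        return 0;
    memcpy(file, start, (size_t)(close - start));
    file[close - start] = '\0';
    return 1;
}

/* Copy to `out` only the lines of preprocessed text `in` that the
   linemarkers attribute to `target`; the markers themselves are dropped.
   Returns the number of lines kept. Lines longer than the buffer are
   read in pieces, so the count is then approximate. */
static int filter_preprocessed(FILE *in, FILE *out, const char *target)
{
    char line[4096], file[1024];
    int keep = 0, kept = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (parse_linemarker(line, file, sizeof file)) {
            keep = strcmp(file, target) == 0;   /* entering/leaving target */
            continue;
        }
        if (keep) {
            fputs(line, out);
            kept++;
        }
    }
    return kept;
}
```

A small driver could call `filter_preprocessed(stdin, stdout, "testsuite/cextras/rcl_arguments.h")` on the piped output of the compiler's `-E` run. Even then, the comments and the original directives are already gone, which is the part that needs lexer access to fix.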

Having it in c2nim makes it trivial to use. Though the CC call ("gcc") and the other options would be made configurable before this is out of draft status. All the major compilers support a "-E" flag.

Also, I really don't like the idea as the C code in its non-preprocessed form contains more information. We also don't strip comments from the C file before parsing and now see what you did with the comments.

Sure, the non-preprocessed form generally has more information. However, it doesn't work for a lot of cases that I've run into as the parser can't understand macro-obfuscated function attributes. Technically the non-preprocessed form doesn't have to be valid C syntax either. So the non-preprocessed code requires the macros to be expanded, which involves a lot of manually importing or recreating the macros in c2nim.

The non-expanded form often also obfuscates the details needed to wrap the actual C API. I saw that a lot in Zephyr. Reverse engineering the layers of C macros to figure out the actual types was a pain.
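As a rough illustration of both points (the unexpanded text not being valid C, and the real type hiding behind macro layers), here is a small sketch. Every `DEMO_` / `demo_` name is hypothetical and only loosely modelled on the style of such headers; it is not taken from Zephyr or rcl.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro layer that builds a struct definition. */
#define DEMO_CAT(a, b) a##b
#define DEMO_TYPE(name) struct DEMO_CAT(demo_, name)
#define DEMO_BEGIN(name) DEMO_TYPE(name) {
#define DEMO_FIELD(type, name) type name;
#define DEMO_END() }

/* Read without expansion, the next four lines are not a C declaration:
   the braces and the `struct` keyword only appear after preprocessing. */
DEMO_BEGIN(point)
    DEMO_FIELD(int, x)
    DEMO_FIELD(int, y)
DEMO_END();

/* What the compiler actually sees after -E:
   struct demo_point { int x; int y; };  */

/* The parameter type is only recoverable by expanding DEMO_TYPE. */
static int demo_point_sum(const DEMO_TYPE(point) *p)
{
    assert(p != NULL);
    return p->x + p->y;
}
```

A parser working on the unexpanded text has to be told about each of these macros by hand, whereas the preprocessed form is just an ordinary `struct demo_point`.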

Araq commented 1 year ago

However, it doesn't work for a lot of cases that I've run into as the parser can't understand macro-obfuscated function attributes.

Do you know about #def?
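For readers who haven't used it: c2nim's `#def` directive, placed inside an `#ifdef C2NIM` section, tells c2nim to expand a macro itself. A minimal sketch of how that might look for attribute-style macros follows. The `DEMO_` / `demo_` names are hypothetical stand-ins, and it is an assumption here that a `#def` with an empty body is accepted.

```c
#include <assert.h>
#include <stddef.h>

#ifdef C2NIM
/* Read only by c2nim: expand both attribute macros to nothing so the
   prototype below parses as a plain declaration.
   Assumption: an empty #def body is accepted. */
#  def DEMO_PUBLIC
#  def DEMO_WARN_UNUSED
#else
/* Read only by the C compiler (GCC/Clang attribute syntax assumed). */
#  define DEMO_PUBLIC __attribute__((visibility("default")))
#  define DEMO_WARN_UNUSED __attribute__((warn_unused_result))
#endif

typedef int demo_ret_t;

DEMO_PUBLIC
DEMO_WARN_UNUSED
demo_ret_t demo_count_files(const char *const *files, size_t *count);

/* Counts the entries of a NULL-terminated array; returns 0 on success. */
demo_ret_t demo_count_files(const char *const *files, size_t *count)
{
    assert(files != NULL && count != NULL);
    size_t n = 0;
    while (files[n] != NULL)
        n++;
    *count = n;
    return 0;
}
```

A regular C compiler skips the `#ifdef C2NIM` branch entirely, so the file still builds as ordinary C. The cost is one `#def` line per macro that confuses the parser.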

elcritch commented 1 year ago

Do you know about #def?

Yah, it’s handy for small projects but it doesn’t scale. You just end up having to define too many of them for larger projects. Also they don’t work well for recursive macros or other edge cases.

Currently I’m already at about 50 headers or more trying to wrap ROS core. Using the C preprocessor feature has let me import them with minimal manual defines. I’ve got a bug or two to fix on re-rendering the directives, but otherwise it’s been fairly seamless.