haxscramper / hcparse

High-level nim bindings for parsing C/C++ code
https://haxscramper.github.io/hcparse-doc/src/hcparse/libclang.html
Apache License 2.0

Note: this project's development is temporarily paused due to my work on the https://github.com/nim-works/nimskull project. In the future I will come back to it, because I still think tooling like this is necessary, but there might be a delay in development for quite some time.


Note: work in progress - features and descriptions are largely accurate, but large chunks of the intended functionality are yet to be implemented. For the current state of development, see the [[https://github.com/haxscramper/hcparse/projects/2][alpha version project]].

This project provides two types of wrapper generators:

  1. Command-line application for rough translation of C and C++ code to nim, including translation of the library implementation itself, not only of top-level declarations. Based on simple translation using [[https://github.com/tree-sitter/tree-sitter][tree-sitter]] for parsing and [[https://www.boost.org/doc/libs/1_76_0/libs/wave/doc/preface.html][boost wave]] for macro expansion.
  2. Fully automatic wrapper generator for handling extremely large libraries (like Qt), where any sort of manual editing is completely infeasible. Based on libclang, it has full understanding of the code, but requires a more sophisticated setup.

In addition to the predefined wrapping logic, an API for user-implemented tooling is provided.

** Tree-sitter & boost wave

Command-line tool to either generate wrappers for C(++) code, or do a full conversion of the project into nim. It is based on tree-sitter and boost wave, and does not require complicated configuration to work. It focuses on the first 90% of the wrapper implementation; the remaining parts can be tweaked manually once the initial wrapper has been generated.

** Libclang-based wrapper-generation

The libclang-based wrapper generator is not a finished command-line application like c2nim or nimterop, but rather a /framework/ for implementing custom wrapper scripts. It can be used as a one-off tool that you tweak manually, but it is mainly designed to provide fully automatic wrapper generators for cases where it is not realistically possible to do the work by hand. Re-wrap the whole Qt library on each patch release? The whole Posix API? That's what this project tries to give you: a sophisticated tool for tackling complex wrapping problems, with built-in support for documentation, the NEP-1 style guide and a comprehensive collection of automatic code generation tools.

It is an open secret that C and C++ libraries lack consistent styling, code policies and more. Sometimes exceptions are completely banned (or simply inaccessible, as in C), naming styles differ from library to library, and some libraries rely heavily on templates or OOP-style C++. All of that forces Nim wrapper authors to spend more time in order to provide higher-level interfaces that take advantage of the rich Nim features (~distinct~ types, exceptions, side effect tracking and ~enums~).

Hcparse provides a framework for addressing these problems in an automated way, using user-provided or built-in tools.

** Why have multiple different ways of wrapping libraries?

https://discord.com/channels/371759389889003530/371759389889003532/880807906335948840

Why is it necessary to have multiple different approaches to code wrapping? Having a single entry point would make things much easier for new users, simplify documentation and explanation, and so on.

The main reason for providing two solutions is very simple - each has its own downsides (for the end user), and it is not possible to create a tool where both techniques are used, as they have a large number of mutually exclusive requirements.

As you can see, each approach has its own strengths, but it is fundamentally impossible to merge the two, since they have completely opposite requirements - one does not understand C++ code, and does not need to, while for the second one it is absolutely mandatory. Manual wrapping was added for the sake of completeness, since its implementation reuses the same IR.

** Difference from existing projects and approaches

Note: The main difference between other projects and hcparse is that they /already exist/, while hcparse is work-in-progress. For now, you can consider this section as an answer to two more practical questions - "why reimplement the already existing tooling?" and "how is it going to be different from the existing tools?"

NOTE: the project is still considered work-in-progress, but all the features mentioned above have already been implemented at least in proof-of-concept quality.

** Using hcparse as a library or writing own code generation tools

note: this section describes unstable functionality that might potentially be changed in the future.

[[./it_works.jpg]]

hcparse is built on top of several C and C++ code processing tools, specifically ~boost::wave~, ~libclang~ and the ~tree-sitter~ C++ parser. Convenience wrappers for all of these libraries are provided as a part of the hcparse library - a full wrapper for the libclang API, and a C API for a large section of boost wave (not constrained to the C++ backend!).

In addition to the wrappers for lower-level C analysis tools, ~hcparse~ also provides a parser for the doxygen XML format (to be able to automatically port documentation without losing important semantic information).

The internal IR for the code is fully convertible to json (it does not contain any lower-level details related to the libclang or tree-sitter processing), and can theoretically be generated using other frontends. The code generation facility can also be decoupled into a separate tool that provides different features, or even generates code for different languages if needed (note that the original implementation is fully focused on nim, and as of right now there are no plans to make hcparse fully source- and target-agnostic).
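
To make the idea of a frontend-independent, json-convertible IR more concrete, here is a minimal sketch in Python. It is not hcparse's actual IR: the ~Param~ and ~ProcDecl~ types, their fields, and the ~lib_open~ declaration are all hypothetical, chosen only to show how declarations that carry no parser-specific state can be written out by one tool and read back by another.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical IR node types, for illustration only. hcparse's real IR
# has its own schema, which this sketch does not try to reproduce.
@dataclass
class Param:
    name: str
    ctype: str

@dataclass
class ProcDecl:
    name: str
    return_type: str
    params: list = field(default_factory=list)
    doc: str = ""

def to_json(decls):
    """Serialize declarations to JSON; nothing frontend-specific is stored."""
    return json.dumps([asdict(d) for d in decls], indent=2)

def from_json(text):
    """Rebuild declarations from JSON, as a separate code generator might."""
    result = []
    for raw in json.loads(text):
        params = [Param(**p) for p in raw["params"]]
        result.append(
            ProcDecl(raw["name"], raw["return_type"], params, raw["doc"])
        )
    return result

# A made-up C declaration: int lib_open(const char* path);
decls = [
    ProcDecl(
        name="lib_open",
        return_type="int",
        params=[Param("path", "const char*")],
        doc="Open a resource at the given path.",
    )
]

if __name__ == "__main__":
    print(to_json(decls))
```

Because the serialized form contains only plain strings and lists, any frontend able to emit the same structure could feed the same code generator, which is the decoupling the paragraph above describes.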