Closed timotheecour closed 5 years ago
c2nim can parse C++ code and has been used to wrap Urho3D, wxWidgets and Unreal Engine 4. It's true it's quite some manual work though.
with something like the clang parser you could likely create a pretty seamless experience. I thought about it in the past, but it would still be a lot of work (even though I've got a pretty nice clang wrapper already).
I didn't know nim2c (c2nim) also supported C++ (the readme in https://github.com/nim-lang/c2nim only mentions C, we should fix that). However, I played a bit with c2nim and threw some C++ syntax at it, and there are many things on which it won't work well. Happy to file bugs if needed; however, my strong feeling is that this can quickly become an infinite time sink (more precisely a 10 man-year time sink, to quote Walter Bright in http://jkm.github.io/phobos/cpp_interface):
Anecdotal evidence suggests that writing such is a minimum of a 10 man-year project
using clang as a library avoids having to parse/analyze C and C++ altogether (while guaranteeing 100% compatibility at least as far as generating llvm IR is concerned); I strongly feel this would be time better spent compared to trying to make c2nim better at C++. Same exact experience as with D's version of that, htod (+ similar tools), which unsurprisingly never got to handle automatically real C++ projects.
I've got a pretty nice clang wrapper already
cool which one? related to that, in https://nimble.directory/search?query=wrapper I find:
nim-libclang
Please use libclang instead
libclang
wrapper for libclang (the C-interface of the clang LLVM frontend)
however both these lead to just nim-libclang, seems like a bug
Difficulties I see with automated conversion have more to do with lack of proper mapping in Nim for some C++ features. I feel the biggest one would be
There are other ones, but probably not as common:
@timotheecour why do you call it "nim2c"? it's "c2nim" :)
To provide C++ interop, you need to understand C++ ABI - this ABI is non-standard, full of surprises, twists and turns and many years worth of legacy features. It would be an incredible time-sink, and what you'd get in return is a bug-ridden maybe-works-if-you're-lucky implementation.
Right now, the Nim compiler remains ignorant of any such issues and delegates them to the backend and ultimately to the underlying C/C++ compiler - it has no notion of an ABI beyond the very bare minimum, exportc and a few other patches of support. Neither the Nim language nor the compiler are equipped to handle C++ interop, increasing the amount of work needed to get a feature like this done.
Steps are being taken to get closer - at least to a point where the dramatically less complicated C ABI can potentially be covered (for example with the alignment work). If there was an MVP for this feature, it would be covering at least C interop, which remains the lingua franca and lowest common denominator of interop. It is not without reason that even LLVM itself offers a C api - C++ is simply unsuitable for interop without vast surgery or resources (failed examples include "managed" C++ of Microsoft fame).
D is in a different position in that it tries to be an explicit C++ replacement and match many of its features, with a smaller impedance mismatch.
all in all, I think the imperfect brute-force approach of c2nim, warts and all, serves nim better for the time being, until nim matures as a language and gains the features to even meaningfully talk about the problem. at least c2nim is honest in its limitations.
Neither the Nim language nor the compiler are equipped to handle C++ interop, increasing the amount of work needed to get a feature like this done.
Perhaps I am misunderstanding you but I think this statement is completely wrong. The Nim language and compiler is a duo that offers one of the most powerful C++ FFIs out there. Nim compiles to C++ after all, why are you saying that it's not equipped to handle the interop?
equipped to handle the interop
Nim compiles to C++ but it does not understand C++ - multiple inheritance, functors, ADL, template metaprogramming, even trivial stuff like destructors (yes, I know they're coming, and I'm guessing they'll be slightly off compared to C++) are all foreign to the Nim language. An import statement has to understand all that and generate viable alternatives. Doable, sure, but a lot of work. And that's the VS6 subset of C++ - a lot has happened since then (try wrapping a random boost lib - that's a good example of what modern idiomatic c++ code looks like).
try wrapping a random boost lib - that's a good example of what modern idiomatic c++ code looks like
std::vector<T> or std::string
)/cc @dom96
one of the most powerful C++ FFIs out there
as described by @arnetheduck as well as in top post, this is very far from what you can do in Calypso, which doesn't require writing any wrapper, and understands a very large subset of C++ (enough to wrap C++ standard library, template heavy C++ opencv etc)
all in all, I think the imperfect brute-force approach of c2nim, warts and all, serves nim better for the time being
that forces users to write (tedious, bugprone) manual wrappers in many use cases, eg wrapping C++ only libraries like opencv (C support was dropped a while back)
c2nim uses a hand-written parser (eg https://github.com/nim-lang/c2nim/blob/master/cpp.nim); I ran into too many issues when I tried c2nim on C++ projects
parsing C++ is notoriously hard and a giant time-sink, which is why I'm suggesting here we explore an approach based on libclang, as done in Calypso, to completely by-pass what's already done by libclang.
now that motivation and current limitations are hopefully clear, I'd really like to focus the discussion in this issue on how to bridge the gap and make C++ interop more useful
There are 2 ways I can think of :
auto-generated wrappers that understand C++ via adding Nim support to swig (see caveats https://github.com/swig/swig/issues/918 that I also mentioned above already; this could be discussed in a separate issue)
wrapper-free approach using libclang; that's the approach followed by https://github.com/Syniurge/Calypso and IMO the most promising one
it doesn't have to support all of C++ to be useful, ie, libclang will accept and parse correct C++ code, and cpp2nim (let's call it like that) will filter out symbols it can't handle (ie, it won't necessarily halt on 1st symbol it can't map to Nim). Over time it would translate a growing number of symbols
Note that c2nim doesn't have this property, since parsing failures can affect future translations.
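A minimal sketch of that "skip what can't be mapped yet" behaviour, written in Python for brevity. It assumes the declarations have already been extracted by a clang-based front end into plain dictionaries; the record layout, the type table and the translate_decl / translate_all helpers are hypothetical and are not part of c2nim, Calypso or libclang.

```python
# Hypothetical C-to-Nim type table; a real tool would cover far more.
C_TO_NIM = {"int": "cint", "double": "cdouble", "float": "cfloat", "char": "cchar"}


def translate_decl(decl, header):
    """Return a Nim wrapper line for one declaration, or None if unsupported."""
    if decl["kind"] != "function":
        return None  # classes, templates, ... are not handled in this sketch
    params = []
    for name, ctype in decl["params"]:
        if ctype not in C_TO_NIM:
            return None  # unknown parameter type: skip this symbol only
        params.append(f"{name}: {C_TO_NIM[ctype]}")
    ret = decl["returns"]
    if ret == "void":
        ret_part = ""
    elif ret in C_TO_NIM:
        ret_part = f": {C_TO_NIM[ret]}"
    else:
        return None
    return (f'proc {decl["name"]}*({", ".join(params)}){ret_part} '
            f'{{.importc, header: "{header}".}}')


def translate_all(decls, header):
    """Translate what we can; collect the names of everything we skipped."""
    wrapped, skipped = [], []
    for decl in decls:
        line = translate_decl(decl, header)
        if line is None:
            skipped.append(decl["name"])
        else:
            wrapped.append(line)
    return wrapped, skipped


# Made-up input: one plain function, one template, one more plain function.
DECLS = [
    {"kind": "function", "name": "add",
     "params": [("a", "int"), ("b", "int")], "returns": "int"},
    {"kind": "class_template", "name": "Matrix"},
    {"kind": "function", "name": "reset", "params": [], "returns": "void"},
]

wrapped, skipped = translate_all(DECLS, "demo.h")
print("\n".join(wrapped))
print("skipped:", skipped)
```

The unsupported template in the middle only lands in the skipped list; the function after it is still wrapped, which is the property a parse failure in a hand-written parser does not give you.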
As the author of nimgen, I can agree that having a seamless interop with C/C++ would be great. I am a big proponent of code reuse and feel it is a no brainer to leverage established C/C++ libs.
If C/C++ -> libclang -> AST -> Nim can be done transparently by the compiler, I'd be all for it. This is ideal since the compiler keeps evolving and c2nim is lower in priority and not in sync. Some of the code it generates isn't valid Nim anymore. Further, keeping wrappers backwards compatible with multiple shipping Nim versions is challenging.
I'm also on the fence on replacing c2nim's engine with libclang. The separate tool will still be out of sync and some of the challenges are managed by running code through the preprocessor - Nimgen makes that easy. The only advantage would be to translate C/C++ code completely into Nim but I'm not a fan of that beyond wrapping since you need to inspect every generated line and it isn't scalable for large projects. It also goes against my principle of reuse - translated code won't benefit from upstream improvements and bug fixes. Manual edits will be inevitable and I doubt automating everything through Nimgen will be viable.
Meanwhile, I will also say that despite c2nim warts and the long list of open issues, I have been able to wrap quite a few C++ libs, let alone C. Check out nimgraphql for a complex example. Given nimgen is automated, the wrappers also stay up to date with minimal maintenance effort. The other benefit is that all users need to do is nimble install wrappername and everything just works so while the wrapper creation is a bit tedious, consumers have a seamless experience - at least that has been my goal with nimgen wrappers. What's sorely missing though is adding a Nim interop layer on top of wrappers.
Assuming current state of affairs, my near term wish list for the Nim compiler is as follows:
For c2nim:
For nimgen:-
Go ahead, write a prototype.
Consolidate all type declarations generated by c2nim into one chunk at the top - unless Nim compiler adds support for forward declarations outside the same type block
Nim supports {.reorder: on.}.
Nim supports {.reorder: on.}.
Oh good reminder - https://github.com/genotrance/nimgen/issues/36. I will start using it in the future.
reorder is useful indeed for this task but needs a bit of love
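For the wish-list item quoted above (consolidating all generated type declarations into one chunk), here is a rough Python sketch of a post-processing step. It assumes the generator's output has already been split into (kind, text) pairs; that representation and the hoist_types helper are hypothetical, not something c2nim or nimgen provides.

```python
def hoist_types(chunks):
    """Emit one `type` section holding every type, then everything else in order."""
    types = [body for kind, body in chunks if kind == "type"]
    others = [body for kind, body in chunks if kind != "type"]
    lines = []
    if types:
        lines.append("type")
        lines.extend("  " + t for t in types)
    lines.extend(others)
    return "\n".join(lines)


# Made-up generator output: RectPtr refers to Rect before Rect is declared.
CHUNKS = [
    ("type", "RectPtr* = ptr Rect"),
    ("proc", 'proc area*(r: RectPtr): cint {.importc, header: "demo.h".}'),
    ("type", "Rect* = object"),
]

result = hoist_types(CHUNKS)
print(result)
```

Because both types end up in the same type section, the forward reference from RectPtr to Rect no longer depends on declaration order.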
One more item to Nim wish list to my comment above:
see also how AutoFFI iterates over all declarations parsed by libclang: https://github.com/AutoFFI/AutoFFI/blob/master/src/clang.cpp ; can't directly be used, but can be used for inspiration
https://ziglang.org/ btw uses libclang to import C stuff. could be a source of inspiration.
parsing C++ is notoriously hard and a giant time-sink,
haha, if you think parsing C++ is hard, wait until you get to the semantics part ;)
libclang has a Sema library to help with semantic pass: https://clang.llvm.org/docs/InternalsManual.html#the-sema-library
the bulk of the work would be to translate C++ semantic concepts (parsed by clang as AST with semantics attached, eg a TypeOf node) to Nim concepts; and that could be done gradually starting from the easiest (ignoring things that aren't yet translatable), even on large complex programs
Well go ahead and start working on it. I have heard "c2nim is bad, I will write a better tool based on LLVM" from at least 3 people now. They never delivered...
If you use libclang, I advise using the C++ API. The C++ API is the real API, and the C API is a stable wrapper that barely scratches the surface area. It's a dead end.
The other problem we ran into is that pointers cannot be null in zig, but obviously in C they can, so we have to translate every C pointer as an optional pointer. Also in Zig pointers have the concept of length (single item, unknown-length), and we have to translate C pointers as unknown-length. So it has become a bit awkward to deal with C pointers, and there's a proposal to add a new pointer type to correspond to C pointers.
If you use libclang, I advise using the C++ API
.. or you can do like rust devs did and contribute to the C api such that it becomes less of a dead end. it's actually quite simple.
You can actually use a combination of the C and C++ APIs, I've done it and it works pretty well. The C API holds references to the C++ AST objects so you can query it for information that isn't available via the C API.
I've got a package that does this and I will open source it soon.
I played around with libclang for a while but it is huge and tedious to build. I'm not sure it is the best course of action.
In my quest to find a better way, I found tree-sitter which is a language parser built by Github for Atom. It supports over 18 programming languages and parses them into a common AST format which is then being leveraged for syntax highlighting and code folding among other possibilities.
I've gone ahead and wrapped it using c2nim/nimgen and it works as expected for these 18 languages. I'm now looking into the AST format to see how it can be leveraged in our world.
If it becomes possible to convert this AST into Nim code, it will become possible to convert code from all these languages into Nim. Of course, have to be realistic - it may not be the case that there's a 1:1 mapping for every construct but it certainly seems interesting. Moreover, I'm interested in wrappers so I'm not as motivated to convert everything into Nim yet, just C/C++ headers into definitions that Nim can immediately leverage.
The question in my mind is whether this wrapper interop can be done at compile time - given tree-sitter is C code, it would have to be built into the Nim compiler to do that. Without that, it would end up being equivalent to c2nim. I cannot see a way to create a library that adds this capability via macros since the VM cannot importc at compile time.
Finally, it will be super cool to have a Nim grammar as part of tree-sitter so that existing Nim source code can be parsed and supported just as well as these other languages. I hope someone takes on that effort.
I've started nimterop which builds tree-sitter into a binary and then converts the ast into Nim using macros. It's working pretty well so far and @timotheecour has also been making contributions.
I'll appreciate a review of the approach and any feedback to ensure this thing has legs. Again, my goal is only wrappers and not outright conversion of C/C++ to Nim so the scope is limited at this time.
I'll appreciate a review of the approach and any feedback to ensure
my take would be that I prefer the wrapper gen to output a nim file for several reasons (instead of it all being hidden behind macro magic):
I use c2nim currently to import the llvm-c headers - results can be seen here - it's a fairly extensive library where I'm unable to import some functions due to macro expansion not being sufficiently supported by c2nim
The question in my mind is whether this wrapper interop can be done at compile time - given tree-sitter is C code, it would have to be built into the Nim compiler to do that. Without that, it would end up being equivalent to c2nim. I cannot see a way to create a library that adds this capability via macros since the VM cannot importc at compile time.
Of course it can. Keep in mind that the VM can execute processes easily, this enables pretty much anything. You can write a small binary that takes as input a .c/.cpp filename and output a JSON-formatted AST, parse that in your macro and you've got wrapper-free interop.
Now that I think about it you can probably just run clang to output the AST as text and parse that. No need to use tree-sitter which I doubt comes close to clang's C++ parser.
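A sketch of that data flow, in Python rather than a Nim macro to keep it short. It assumes a clang recent enough to accept -Xclang -ast-dump=json; the embedded sample is a hand-written, heavily trimmed stand-in for clang's output (real dumps carry many more fields), and the dump_ast / functions helpers are hypothetical names. In a Nim macro the process call would go through the compiler's compile-time process execution (for instance staticExec) instead of subprocess.

```python
import json
import subprocess


def dump_ast(path):
    """Ask clang for a JSON AST of `path` (assumes a suitable clang on PATH)."""
    out = subprocess.run(
        ["clang", "-Xclang", "-ast-dump=json", "-fsyntax-only", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def functions(node):
    """Yield (name, type) for every FunctionDecl found in the tree."""
    if node.get("kind") == "FunctionDecl":
        yield node["name"], node["type"]["qualType"]
    for child in node.get("inner", []):
        yield from functions(child)


# Hand-written stand-in for a clang dump, so the walker can be tried
# without clang installed.
SAMPLE = json.loads("""
{"kind": "TranslationUnitDecl", "inner": [
  {"kind": "TypedefDecl", "name": "size_type", "type": {"qualType": "unsigned long"}},
  {"kind": "FunctionDecl", "name": "add", "type": {"qualType": "int (int, int)"},
   "inner": [{"kind": "ParmVarDecl", "name": "a", "type": {"qualType": "int"}},
             {"kind": "ParmVarDecl", "name": "b", "type": {"qualType": "int"}}]},
  {"kind": "NamespaceDecl", "name": "cv", "inner": [
    {"kind": "FunctionDecl", "name": "blur", "type": {"qualType": "void (int)"}}]}
]}
""")

found = list(functions(SAMPLE))
print(found)
```

With a real header, functions(dump_ast("some_header.h")) would feed the same walker; the file name here is only a placeholder.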
@arnetheduck - thanks for the feedback - agree with all your statements.
header files by definition rarely change - rerunning tree-sitter and friends on every compile is wasteful
Agreed - design is moving towards this. Right now, the code is run on every compile since it is still POC grade but generating a .nim file is where we will end up.
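One way to avoid rerunning the generator on every compile is to key the generated .nim file on a hash of the header. The Python sketch below illustrates that idea only; it is not how nimterop or nimgen actually work, and regenerate_if_changed, fake_generate and the .stamp file convention are all hypothetical.

```python
import hashlib
import os
import tempfile


def regenerate_if_changed(header_path, out_path, generate):
    """Run `generate` only when the header's content hash differs from the stamp."""
    with open(header_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    stamp_path = out_path + ".stamp"
    if os.path.exists(out_path) and os.path.exists(stamp_path):
        with open(stamp_path) as f:
            if f.read() == digest:
                return False  # header unchanged: keep the existing .nim file
    with open(out_path, "w") as f:
        f.write(generate(header_path))
    with open(stamp_path, "w") as f:
        f.write(digest)
    return True


def fake_generate(header_path):
    # Stand-in for the real wrapper generator (c2nim, nimterop, ...).
    return "# generated from " + os.path.basename(header_path) + "\n"


workdir = tempfile.mkdtemp()
header = os.path.join(workdir, "demo.h")
output = os.path.join(workdir, "demo.nim")
with open(header, "w") as f:
    f.write("int add(int a, int b);\n")

# Two runs on an unchanged header, then one after the header is edited.
runs = [regenerate_if_changed(header, output, fake_generate) for _ in range(2)]
with open(header, "a") as f:
    f.write("void reset(void);\n")
runs.append(regenerate_if_changed(header, output, fake_generate))
print(runs)
```

The generated file can still be committed; the stamp only decides whether it needs refreshing.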
having a separate binary to distribute in order to compile your application makes it complicated - much easier to commit a generated nim file
My approach with nimgen has been to make the wrapper process seamless for a consumer. I like how I can simply nimble install any package and start using it - native Nim or wrapped. Any details are handled by the package itself - git clone, wrapping, paths, etc. However, it should not stop anyone from checking in the generated files.
@dom96 - nimterop already does a whole bunch using macros including creating the AST data structure so macros are really capable of anything! That being said, I'm working on moving most of the functionality into the binary since the VM is slower and this method involves C => tree-sitter AST => string => macro AST => Nim code which is round about. It will also work better standalone to meet @arnetheduck's use case.
Now that I think about it you can probably just run clang to output the AST as text and parse that. No need to use tree-sitter which I doubt comes close to clang's C++ parser.
I'd have loved to do this but clang isn't the default on Windows or Linux. Downloading several hundred megs to just do the AST generation will be a showstopper for most.
My approach with nimgen has been to make the wrapper process seamless for a consumer.
yeah, good point about the "get-up-and-running-seamlessly" and the package doing all this - I just don't think we're there yet, with nimble :) ie nimble install is conceptually broken - putting what should be repository-local information in a shared/global location does not scale.
you're absolutely right that if all other stars were aligned, checking in a generated file would be deeply questionable. an additional argument to do so anyway might be that it removes the need to have the header files installed at all - a common-enough situation given that on windows you usually get just a dll, and on linux you need -devel packages - I'd argue that a more "complete" solution would download the dependency, build it etc etc, but then we have a full-blown package manager. not necessarily a bad thing, just a much larger scope - or at least not the job of the wrapper generator :)
IMHO this "don't commit generated files" is a huge fallacy that indirectly produces stuff like https://gcc.gnu.org/wiki/WindowsBuilding where you need make, perl, flex, bison installed with the right version or else you can't build it.
I'd have loved to do this but clang isn't the default on Windows or Linux. Downloading several hundred megs to just do the AST generation will be a showstopper for most.
That's because you are looking at this problem from the wrong angle, sorry to be blunt. Ship the generated Nim code.
I'd argue that a more "complete" solution would download the dependency, build it etc etc
@arnetheduck - so that's what I have done for most nimgen wrappers - download the upstream sources via git or zip, generate wrappers and then compile in all the sources, no binary required. And none of this is the end user's problem. You simply nimble install and import X. The result is a single binary with no external dependencies.
There are some wrappers where this isn't possible (nimbass) or painful (nimssh2) but nimgen supports compiling in, DLLs and static lib scenarios.
not necessarily a bad thing, just a much larger scope - or at least not the job of the wrapper generator :)
I've taken this on in nimgen to an extent since I feel it is crucial to make the process seamless. It makes the nim ecosystem that much richer and easier to get started in. It is tedious up front but I've done 25 wrappers myself. Every package is tested with 0.18.0, 0.19.0 and devel daily with the latest upstream changes so anyone can use them at any time.
IMHO this "don't commit generated files" is a huge fallacy that indirectly produces stuff like
That's because you are looking at this problem from the wrong angle, sorry to be blunt. Ship the generated Nim code.
@Araq: I know we discussed this a bit earlier on IRC but as asked before, there's too many moving parts:
I can check in a bunch of Nim files as a Nimble package but they are a snapshot in time. Anything outside that tested combination and stuff may not work or meet the consumer's requirements. You see so many Nim wrapper packages in this situation which were purpose built by the consumer for a project, not built as a sustainable package.
This approach works for a consumer - install at a point in time and save that combo of everything and maintain it in source control until there is a need to upgrade. It does not work as a library maintainer who has to cater to any variation of the above. It is not a seamless experience in just a few months and not scalable if you have to generate and maintain an archive of combinations.
Nimgen doesn't solve everything but allows me as a package maintainer to keep things up to date. If a consumer wants a snapshot, they can and absolutely should make one. These aren't reproducible builds though, not yet.
If there's a way to solve this in a scalable fashion, I'm all for it. That being said, my primary goal is to make it easy for others to use these packages and transition into Nim. I don't believe making it static is going to achieve that.
Here is the workflow I assumed your nimgen uses, and it still seems feasible to me. Feel free to mentally replace c2nim by a better tool.
#ifdef and #define etc.
There was a long discussion on IRC about this topic. The list of concerns with the nimgen approach are the following:
#define XYZ for crucial consts needs to be more robust
gcc -E -dD can help pull relevant platform specific definitions - seems to be working in limited testing
#define funcn2(x, y, z) for function signatures is not possible today
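To make the #define concerns concrete, here is a small Python sketch that scans text shaped like gcc -E -dD output, turns simple integer macros into Nim consts and reports everything else, including function-like macros. The sample text, the regular expressions and the defines_to_nim helper are illustrative assumptions, not nimgen's actual behaviour.

```python
import re

# Object-like macro: "#define NAME value" (no parenthesis right after NAME).
DEFINE_RE = re.compile(r"^#define\s+([A-Za-z_]\w*)\s+(.+?)\s*$")
# Decimal or hex integer literals only; C octal (leading 0) is left out on
# purpose because it would need converting before it means the same in Nim.
INT_RE = re.compile(r"^(0[xX][0-9a-fA-F]+|0|[1-9]\d*)$")


def defines_to_nim(preprocessed):
    """Turn simple integer #defines into Nim consts; report the rest."""
    consts, skipped = [], []
    for line in preprocessed.splitlines():
        if not line.startswith("#define"):
            continue
        m = DEFINE_RE.match(line)
        # Names starting with an underscore (mostly compiler builtins) are
        # skipped: they are not valid Nim identifiers anyway.
        if m and not m.group(1).startswith("_") and INT_RE.match(m.group(2)):
            consts.append(f"const {m.group(1)}* = {m.group(2)}")
        else:
            skipped.append(line)
    return consts, skipped


# Made-up text in the shape of preprocessor output with macros kept.
SAMPLE = """\
#define __STDC__ 1
#define XYZ 42
#define FLAGS 0x1F
#define NAME "demo"
#define funcn2(x, y, z) funcn(x, y, z, 0)
int funcn(int a, int b, int c, int d);
"""

consts, skipped = defines_to_nim(SAMPLE)
print("\n".join(consts))
print("needs manual handling:", skipped)
```

The function-like funcn2 macro ends up in the skipped list, which matches the third concern: there is no automatic mapping for it in this approach.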
Thanks for all the feedback so far. Meanwhile, we are continuing to work on nimterop and will port relevant improvements into the nimgen workflow accordingly.
Quick update on this issue - nimterop has been growing in functionality over time. The road is still long but the current status makes me optimistic.
I think it is time to close this issue since Nim provides all the infrastructure required to pull off this interop without compromise or requiring any fundamental changes to the compiler. Minor details have been discussed in other issues and have provided a good direction to continue on.
I encourage the community to continue providing your guidance and feedback to ensure development continues in the right direction.
Having a good C and C++ interop for Nim would be of strategic importance for wider adoption of Nim, as it would allow reusing the massive code bases out there (eg opencv, qt, SFML, ...) without having to either rewrite them or writing and maintaining wrappers.
Calypso
https://github.com/Syniurge/Calypso, a fork of the ldc compiler for D, is an amazingly cool project that allows a direct interface between D code and C or C++ without using any wrapper (because it uses clang and llvm), and understands virtually all of C++ (including the pre-processor, C++ templates, exceptions, etc).
A C++ class (eg opencv cv::Mat) or functions/templates can be used in D directly after import (C++) cv.Mat; without need to write or generate wrapper code for these, and templates don't need to be instantiated in order to be used: they can be used directly. We can pass/return by value, pointer, or reference, we can even derive C++ classes in D etc.
Here's a simple example importing Qt in D: https://github.com/Syniurge/Calypso/blob/master/examples/qt5/qt5demo_simple.d Thanks to calypso, I was able to use some opencv functionality from D in non-trivial use cases involving heavy use of C++ features and it actually worked (modulo some bugs that have been fixed since then for the most part).
Nim interop
On the Nim side, we can embed C or C++ code as follows:
however the hard part is writing the wrapper code (especially for larger C API's or any C++ API)
For C projects, c2nim can be used but it's not based on a full C frontend (eg clang) and can quickly run into limitations, eg see CREATING A NIM WRAPPER FOR FMOD which shows a number of manual extra steps had to be employed to wrap C library FMOD.
For C++ projects, there's currently no way to automatically generate wrappers/bindings, one has to resort to tedious manual mapping of C++ classes, taking care of manual allocation/deallocation of C++ classes in Nim code, no pass-by-value causes a performance hit, etc.
is a calypso-like approach for Nim feasible in the short/medium/long term?
/cc @Syniurge @arnetheduck @Araq @dom96 I'm curious what your thoughts are on that. Nim compiles to C, C++, and some other backends (objc, js). @arnetheduck wrote nlvm, an LLVM-based compiler for Nim which could be used as a starting point (it provides the glue layer between the AST (produced by the Nim compiler) and LLVM, replacing the C output with LLVM bitcode). It uses the llvm-c interface (source).
At the end of the day, I'd love to be able to just write:
notes
links