Considering rewriting cimgui?

ocornut / imgui

Considering rewriting cimgui? #8114

Closed ocornut closed 3 weeks ago

ocornut commented 3 years ago

I believe we should help push cimgui forward. It has been immensely useful but it's showing its age.

I wonder if we can adopt a different strategy for it, and/or just rewrite it sanely with a third of the code. Ultimately we don't want to maintain it - if there is maintenance to do - but right now it is stuck in a clunky spot.

Generally cimgui creates quite some friction. People at the end of the language chain (e.g. imgui>cimgui>rust or imgui>cimgui>c#) tend to stay on master, have difficulty using Dear ImGui because their API files are not commented, and find it harder to distribute sources because some parts of the output are compiler-dependent.

The way cimgui works is that it parses imgui.h and imgui_internal.h and then generates:

(1) in cimgui/generator/output/, a bunch of metadata (json/lua) about Dear ImGui apis supposedly allowing to generate other bindings
(2) in cimgui/, the C wrapper code.

I am assuming that (2) is generated from (1), but it could be that the generator generates both simultaneously and (1) doesn't contain all the info needed to generate (2).

PROPOSAL

Attempt to fix the issues listed below (ranked by importance A to C)
Make it easier to use for people.
If it takes rewriting it from scratch let's do it. If there is a sane way to fix existing cimgui without rewriting it I'm all for it. It may be less fun than starting from scratch but if we feel there's a way to do that why not...

I feel there is probably a way one can rewrite that mess in a few days into a neat little package, perhaps with fewer dependencies (not even a dependency on a compiler?). Maybe it won't have all the details ironed out, but starting from a sane codebase may be good. It could either be a manually crafted parser (we only need to parse imgui's .h files), or use the Clang AST or some similar output to extract the data we need.

ABOUT UPCOMING IMSTR/STRINGVIEW APIS

We will want for ~1.82 to adopt the string_view branch: https://github.com/ocornut/imgui/compare/features/string_view This will be desirable because, from most-important to least-important:

Rust literals are not zero-terminated which is problematic, we're currently losing Rust people (many skilled/innovators) because of it.
For most language bindings and even std::string users we could remove extraneous strlen() everywhere.
Bonus: Pave a long-term path toward using wchar

However this comes with a problem for cimgui:

Users of cimgui from raw C will want to use 'const char*' in their end-user API (and the thin C>C++ wrapper function can add the strlen)
Users of cimgui from language bindings will want to use the dual const char* mode.
So for cimgui we need two forms of output, or maybe two always-available APIs?
This needs to be done without making it more difficult for people using prebuilt cimgui, prepackaged libraries, or custom branches, so always outputting both APIs may be the best thing to do. Maybe the single char* version can be inline/macros even...
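A minimal sketch of the "always output both APIs" idea, written as a tiny Python emitter. It assumes the dual const char* form is a begin/end pointer pair, and the `_Range` suffix and the `emit_dual_api` helper are hypothetical names invented for illustration, not anything cimgui produces today.

```python
from dataclasses import dataclass


@dataclass
class Param:
    ctype: str  # type as written in the C++ header, e.g. "ImStr" or "ImVec2"
    name: str


def emit_dual_api(ret, name, params):
    """Return two C declarations for one C++ function.

    The first takes every ImStr parameter as a begin/end pointer pair
    (convenient for language bindings), the second takes a single
    zero-terminated const char* (convenient for raw C users).
    The "_Range" suffix is a made-up naming convention.
    """
    ranged, plain = [], []
    for p in params:
        if p.ctype == "ImStr":
            ranged += [f"const char* {p.name}_begin", f"const char* {p.name}_end"]
            plain.append(f"const char* {p.name}")
        else:
            ranged.append(f"{p.ctype} {p.name}")
            plain.append(f"{p.ctype} {p.name}")
    return [
        f"{ret} {name}_Range({', '.join(ranged)});",
        f"{ret} {name}({', '.join(plain)});",
    ]


decls = emit_dual_api("bool", "igButton", [Param("ImStr", "label"), Param("ImVec2", "size")])
for line in decls:
    print(line)
```

Because both declarations come out of the same description, prebuilt libraries could ship both entry points and let each consumer pick the one that suits it.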

ISSUES

A. Output "cimgui.h" has no comments, nothing aligned. Output is generally extremely ugly to look at.
A. internals are in the same file instead of being a cimgui_internal.h, generating internal is an opt-in feature which adds stuff to same file :(
A. From paragraph above. Somehow find a way to support ImStr in char mode (for raw C users) and in dual char mode (for language bindings)
B. Ideally the generation process discussed above should be able to generate the C>C++ bindings from the metadata.
B. Build process is not obvious, making it hard to use with branches.
- Generally needs to be improved.
- Feels like "editing generator.bat" just gets in the way.
- Feels like repo having submodules gets in the way.
- Feels like the weird code duplication for other projects (e.g. cimgui_plot) gets in the way.
- Auto-generation for branches would alleviate some of that.
- Ideally it should be just e.g. "sometool.exe input_folder output_folder", done, could work on multiple libraries. Absolutely ok imho, and likely/probable that we embed custom rules/hacks as cimgui does.
B. It requires the target compiler, supposedly only used as a preprocessor... which makes it hard to prebuild and share cimgui output. Some stuff is pre-processed per compiler at the cimgui generator level (instead of leaving #ifdef in the cimgui output code)
B. Consider the possibility to generate straight bindings to other languages without going through C functions.
B. Consider that the metadata used by the generator could be leveraged for other stuff, like an in-demo interactive documentation generator?
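To make the "sometool.exe input_folder output_folder" wish concrete, here is a minimal sketch of such an entry point in Python. The tool, its function names and the `c` + header-name output convention are all assumptions for illustration; the actual emitter is replaced by a placeholder comment.

```python
import argparse
from pathlib import Path


def generate(input_dir, output_dir):
    """Write one placeholder wrapper header per input header; return the written paths."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for header in sorted(input_dir.glob("*.h")):
        # Hypothetical naming rule: imgui.h -> cimgui.h, imgui_internal.h -> cimgui_internal.h
        target = output_dir / f"c{header.name}"
        target.write_text(f"// Generated from {header.name}\n")  # a real emitter would go here
        written.append(target)
    return written


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical binding generator")
    parser.add_argument("input_folder", type=Path)
    parser.add_argument("output_folder", type=Path)
    args = parser.parse_args(argv)
    for path in generate(args.input_folder, args.output_folder):
        print(path)
    return 0
```

Calling `main(["imgui", "out"])` (or wiring `main()` to the command line) would process whatever headers sit in the input folder, which is also what would let the same tool run on other libraries or on a branch checkout.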

Thoughts? Open to other ideas..

NOTES I CAN REPORT TO CIMGUI TODAY

Doc: mention of cimgui_nopreprocess. cimgui_impl. in generator? can't see those? obsolete?
Doc: code mention of "--script for auto_funcs.h and auto_funcs.cpp generation"? can't see those? obsolete?
Doc: give concrete example of cmake options for casual cmake users.
Use IMGUI_HAS_DOCK where it should use IMGUI_HAS_VIEWPORT

rokups commented 3 years ago

Clang 10 (or maybe 11) added support for dumping the AST as JSON. I am working on a Python script to generate some stuff from the C++ API for myself. No reason this can't be used for generating a C wrapper. It also provides comments.
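As a rough sketch of what consuming such a dump could look like: the snippet below walks a JSON AST and collects function signatures. The embedded sample is hand-written to approximate the shape of the output of `clang -Xclang -ast-dump=json -fsyntax-only` (real dumps carry many more fields per node), and `iter_functions` is a hypothetical helper, not part of any existing tool.

```python
import json

# Hand-written approximation of a Clang JSON AST dump, trimmed to the fields used below.
SAMPLE_AST = json.loads("""
{
  "kind": "TranslationUnitDecl",
  "inner": [
    {"kind": "NamespaceDecl", "name": "ImGui", "inner": [
      {"kind": "FunctionDecl", "name": "Button",
       "type": {"qualType": "bool (const char *, const ImVec2 &)"},
       "inner": [
         {"kind": "ParmVarDecl", "name": "label", "type": {"qualType": "const char *"}},
         {"kind": "ParmVarDecl", "name": "size", "type": {"qualType": "const ImVec2 &"}}
       ]}
    ]}
  ]
}
""")


def iter_functions(node, scope=()):
    """Yield (qualified_name, return_type, [(param_type, param_name), ...])."""
    kind = node.get("kind")
    if kind == "FunctionDecl":
        # Simplification: take everything before the first "(" as the return type.
        # This breaks for functions returning function pointers.
        ret = node["type"]["qualType"].split("(", 1)[0].strip()
        params = [(child["type"]["qualType"], child.get("name", ""))
                  for child in node.get("inner", [])
                  if child.get("kind") == "ParmVarDecl"]
        yield "::".join(scope + (node["name"],)), ret, params
        return
    if kind == "NamespaceDecl":
        scope = scope + (node.get("name", ""),)
    for child in node.get("inner", []):
        yield from iter_functions(child, scope)


functions = list(iter_functions(SAMPLE_AST))
for qualified_name, ret, params in functions:
    print(ret, qualified_name, params)
```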

ShironekoBen commented 3 years ago

I've started to look into this - it definitely looks like something we could do a better/more self-contained job of. Things that have sprung to mind thus far (half questions, half just random thoughts that comments on are more than welcome!):

I'm curious to know if we have any idea how many people are using this as an actual C library vs as a means of binding to higher-level languages. That's potentially relevant in terms of how to prioritise "generating idiomatic C" vs "generating stubs that are easy for other things to consume" (for example, overloads vs default parameters, or if flattening enums to their calculated values is a good idea), and I have a feeling that the number of people amongst the ImGui audience these days who are writing "genuinely C" programs (as opposed to "writing C-style code but in a C++ environment and thus able to use C++ libraries with minimal pain") is fairly low. I have no evidence for that feeling, though!
On a similar note, I'm wondering how much backwards-compatibility is needed... the actual C output feels like we could probably keep reasonably close to (for people who are hand-coding against it), but the generated JSON/LUA files are both a bit of a PITA and don't look all that straightforward to parse if you're trying to use them for a non-C-like language.
Is preprocessing the source actually desirable...? On one hand it removes complexity and makes language bindings easier as that layer doesn't need to care, but on the other it reduces readability a lot and means that a given generated header is completely tied to the set of #defines used when it was generated. I feel like we may well be able to do a better (read: nicer to look at the results of) job by not preprocessing and manually handling preprocessor directives as needed.
Do we have any preferences for implementation language? I definitely think as a goal something as close as possible to "here's a single executable that does it all" is good, but at the same time I have this instinct that says "this would be much cleaner and neater to write in Python (or even C#) than trying to do it in C++ with all the messy string manipulation/etc", even if the side-effect is that "single executable" does require the user to have another environment installed.
Semantics for comments would be really nice, as it would mean we could produce idiomatic comment blocks in the target language along with the bindings (think JavaDoc and the like).
Similarly, keeping the object structure in the generated metadata would make it easier for a suitably smart backend to reconstruct the object structure automatically for OO languages and give the user something that was a decent facsimile of the original C++ API.
In an ideal world I think I'd like to write at least one "backend" (i.e. language binding) to go with this, partly to prove that works and partly because it will serve as a template for anyone else doing the same.
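For the "flattening enums to their calculated values" point in the list above, a minimal sketch of what such a step could do. It assumes enum initializers only use integer literals, earlier enumerator names and a handful of operators, and it leans on the fact that those particular C expressions also parse as Python expressions; the `MyFlags_*` names are made up.

```python
import ast
import operator

_BINOPS = {
    ast.BitOr: operator.or_, ast.BitAnd: operator.and_,
    ast.LShift: operator.lshift, ast.RShift: operator.rshift,
    ast.Add: operator.add, ast.Sub: operator.sub,
}


def _eval(node, known):
    """Evaluate a tiny subset of integer expressions without calling eval()."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        return known[node.id]  # reference to an earlier enumerator
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left, known), _eval(node.right, known))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Invert):
        return ~_eval(node.operand, known)
    raise ValueError(f"unsupported enum expression: {ast.dump(node)}")


def flatten_enum(entries):
    """entries: list of (name, expression-or-None). Returns {name: int}."""
    values, previous = {}, -1
    for name, expr in entries:
        if expr is None:
            value = previous + 1  # C rule: implicit value is previous + 1
        else:
            value = _eval(ast.parse(expr, mode="eval").body, values)
        values[name] = value
        previous = value
    return values


flat = flatten_enum([
    ("MyFlags_None", "0"),
    ("MyFlags_A", "1 << 0"),
    ("MyFlags_B", "1 << 1"),
    ("MyFlags_AB", "MyFlags_A | MyFlags_B"),
    ("MyFlags_Next", None),
])
print(flat)
```

Whether flattening is desirable is exactly the open question: it helps non-C targets, but a raw C header would arguably read better with the original expressions kept.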

ocornut commented 3 years ago

Thank you Ben for investigating this.

(1) There's definitely a little crowd/trend of people writing C e.g. https://github.com/floooh/sokol I'm not sure how we could quantify it, I could do a call on twitter if it's useful to quantify or qualify that crowd? I think the general priority in terms of product "quality" should be binding to higher-languages crowds but raw C should be functional, if only because those higher-languages are likely to rely on the C version anyway.

(2) Backward compatibility of the metadata doesn't seem important to keep. If we step forward into a v2 generation for bindings and write one-two of them, binding maintainers will follow. Right now I even suspect cimgui's facilities haven't been used enough by back-ends, several back-ends are still done manually or out of date, I think a more attractive solution would help.

(3) My gut feeling is that preprocessing is not desirable, afaik that's the only reason cimgui depends on compilers and I suspect it was a mistake on cimgui end to rely on preprocessor but it was probably done to simplify lexing/parsing a little bit, maybe to handle some edge cases ifdef for some types, or stuff like NULL or FLT_MIN? (tho cimgui still seems to handle e.g. FLT_MIN manually).

(4) As part of the general "make it easier to run/use" goal, the choice of language would be important. I think it would need to be at a nice crossroads between "being readily available on osx/linux and CI machines" and "being easy to pull/install on windows". C++ would be ideal in terms of availability on host, but may or may not be worth the extra coding work. Python or LuaJIT would probably be ok. C# I don't know how easy to access on Linux, Mac or alternative Windows toolchains such as MinGW. None would be a strong deal-breaker but I feel that choice has the potential to facilitate or hinder adoption.

Yes to 5-6-7 👍

ocornut commented 3 years ago

Might be of interest: https://blogs.windows.com/windowsdeveloper/2021/01/21/making-win32-apis-more-accessible-to-more-languages/

ShironekoBen commented 3 years ago

Ooh, that's pretty cool stuff, although pretty heavyweight if considered in isolation. If it becomes more of an ecosystem then definitely worth considering though!

Thinking about it a bit further, on the language front I think Python may be the way to go... in my head it's a tradeoff between C# which "just works" on Windows but can be a bit of a PITA to set up on OSX/Linux vs Python which is pretty much always installed by standard on OSX/Linux these days and is only a relatively simple install step away on Windows.

I've started (slowly...) putting together some code to try things out - mainly just parsing the header files and generating intermediate data structures at this point. My thinking is that stealing a trick from LLVM in terms of architecture seems sensible - basically have an input layer that builds a tree of nodes that correspond to the structure of the original file, then a number of transformations that take those nodes and do things with them to make them closer to the desired output ("lowering" in LLVM parlance), and finally an output layer that walks the nodes and writes out the target language bindings.

This splits things up nicely, and gives an easy way to drop in transforms for (e.g.) "flatten all references to enum values into their actual numeric forms" or "split up functions with default arguments into multiple overloads" as the target requires.
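A minimal sketch of that input / transform / output split, using the "split up functions with default arguments into multiple overloads" transform as the example. All class and function names here are hypothetical, the input tree is built by hand instead of parsed, and the numeric-suffix naming is just one possible convention.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Param:
    ctype: str
    name: str
    default: Optional[str] = None


@dataclass
class Function:
    name: str
    ret: str
    params: list = field(default_factory=list)


def split_default_args(funcs):
    """Lowering pass: a function with N defaulted params becomes N+1 overloads."""
    out = []
    for f in funcs:
        first_default = next(
            (i for i, p in enumerate(f.params) if p.default is not None), len(f.params))
        for count in range(first_default, len(f.params) + 1):
            kept = [Param(p.ctype, p.name) for p in f.params[:count]]
            out.append(Function(f.name, f.ret, kept))
    return out


def number_overloads(funcs):
    """Lowering pass: C has no overloading, so repeated names get a numeric suffix."""
    seen, out = {}, []
    for f in funcs:
        index = seen.get(f.name, 0)
        seen[f.name] = index + 1
        out.append(Function(f.name if index == 0 else f"{f.name}_{index}", f.ret, f.params))
    return out


def emit_c(funcs):
    """Output layer: walk the lowered nodes and write C declarations."""
    lines = []
    for f in funcs:
        args = ", ".join(f"{p.ctype} {p.name}" for p in f.params) or "void"
        lines.append(f"{f.ret} {f.name}({args});")
    return lines


# Input layer stand-in: one node for a function with two defaulted parameters.
tree = [Function("SameLine", "void",
                 [Param("float", "offset", "0.0f"), Param("float", "spacing", "-1.0f")])]
for transform in (split_default_args, number_overloads):
    tree = transform(tree)
output = emit_c(tree)
print("\n".join(output))
```

A target that supports default arguments would simply leave `split_default_args` out of its transform list, which is the appeal of keeping each step separate.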

rokups commented 3 years ago

On a similar note, bgfx uses idl to generate APIs. This includes c/c++/c# and few other interfaces.

https://github.com/bkaradzic/bgfx/blob/master/scripts/bgfx.idl

I've started (slowly...) putting together some code to try things out - mainly just parsing the header files and generating intermediate data structures at this point.

Did you look into clang -Xclang -ast-dump=json?

ocornut commented 3 years ago

Quick thought: since one of the primary pressures for this is to get Rust integration without Rust users having to enclose all literals in macros, I imagine that given a wip version imgui-rs people may be happy to collaborate on this, and C + Rust may be the initial test target, if any. From our POV we are only going to develop the C wrapper but there is that user base who would have a vested interest in helping polish it.

ShironekoBen commented 3 years ago

Hm... so "properly" parsing mixed preprocessor and C++ syntax is a PITA. I mean, C++ syntax on its own is a hilarious PITA - someone literally wrote a doctoral thesis on bending an LALR(1) parser so that it can just about parse C++! - but when you have the preprocessor in the equation as well it's kinda nightmarish... cutting back and not supporting certain constructs that ImGui doesn't use helps a lot ("helps a lot" here meaning "makes the problem not actually a pitch-black abyss of impossibility"), but it's still resulting in a lot of mess. I've got something that can parse a reasonable percentage of imgui.h but I'm starting to doubt that brute-forcing on through to fix the remaining stuff is actually going to yield something that isn't hilariously brittle in weird and unexpected ways due to the sheer volume of special-case rules.

So I'm currently thinking that maybe going back and splitting the problem in two might be an idea. Basically, my thought process is this:

1) Generating C bindings is "comparatively" easy to do as a text manipulation operation (as the existing cimgui shows), because you can just blindly copy whole chunks of stuff and the compiler will deal with it later. This is also the only situation in which keeping #ifdefs around makes sense, because C is the only language that will actually be able to make use of them.

2) Generating good metadata and bindings for other languages requires you to understand the code a lot more (actually parse type references and the like), but doesn't benefit from any attempts at preprocessor-preservation because by definition other languages will be linking against a library compiled with a fixed #define set. So in that context it may make more sense to use clang or similar to generate the input metadata, thus avoiding all of the unpleasantness with parsing (or special-casing) painful stuff like IM_DELETE().

Still running through this in my head a bit, especially how to do step 1 in a sensible way, but I'm leaning more towards this idea ATM.

PathogenDavid commented 3 years ago

Hello all! Apologies for the wall of text, I tend to ramble a bit. Plus generating bindings for C/C++ libraries has been my main thing for over half a year now so I'm chock-full of thoughts on this stuff.

Omar asked me for my thoughts on this issue, I'm the developer of Biohazrd, a C# framework for helping write binding generators for arbitrary C/C++ libraries. (Apologies for the scant documentation, the framework is still fairly young.) I strongly believe Biohazrd's goals align with this issue's goals and it'd be a good candidate for realizing these goals.

For some ~~quick~~ background: I originally wrote the proof of concept that would become Biohazrd back in July with the goal of providing access to NVIDIA PhysX directly from C#. I've since rewritten the majority of the project and improved it significantly thanks to significant sponsorship by NeuroGEARS. It's now been used to successfully generate C# bindings for NVIDIA PhysX, DXGI+DirectX 12, the classic Win32 API*, NVIDIA TensorRT, OpenCV**, and of course Dear ImGui.

Most of these are not public currently. Willing to share or demo them privately though!
I do want to note that I haven't touched the InfectedImGui generator in a while. It's a bit out of date in regards to both ImGui and Biohazrd. In particular this is why the generated code lacks namespaces.
*I only do this for a subset of the Win32 API, but only because the Win32 API is pretty old and janky so as I add functions I like to manually inform the generator about things like a UINT really being an enum.
**I will note the OpenCV bindings are nowhere near fully usable yet, it's a library that's particularly problematic for non-C++ consumers.

I made Biohazrd after years of manually writing bindings and working with other library-specific generators like the one used for cimgui or SharpDX. I had lots of motivations for creating Biohazrd, but three of the big ones were:

I wanted to experiment with interacting with C++ libraries directly in C# without a C interop layer.
I was frustrated by the fragility of solutions like cimgui. (As Omar notes, cimgui and ImGui.NET tend to keep people on an outdated version of Dear ImGui.)
I wanted to greatly simplify the process of authoring and maintaining generators like cimgui. (As you all have noted in this thread, parsing C++ even with the help of Clang is quite the ordeal.)

A quick 1000ft overview of Biohazrd's philosophy and design:

C layers should be unnecessary.
The core of the library should not know about C#. (IE: Allow generating Rust or C bindings too.)
Generator authors should not need to know how to interact with Clang's AST, but they should be able to if they want to.
Biohazrd should not try to generate "safe" bindings by default. (IE: Without additional work, Biohazrd libraries almost always require the use of unsafe code in C#. This is because...)
Humans can make generalizations about a C/C++ library that Biohazrd never could. (A good example of this is Dear ImGui's enums using typedefs. In InfectedImGui, making these nicer for C# consumption is handled by ImGuiEnumTransformation.)
- Another good example of this: A human can determine that all ImStr functions should be wrapped to take const char* for a C library, should perform the relevant work needed to convert std::string::String to ImStr for Rust, and perform the relevant UTF16 to UTF8 conversion for C#.
It should be easy to generate bindings for an arbitrary version of the library and customize them. (IE: Effortless or near-effortless switching to the latest docking branch or generating custom bindings that use a custom two-float vector type instead of System.Numerics.Vector2.)
Clang is the ultimate source of truth. Make as few assumptions about things like memory layout and ABI as possible. (This has a negative side effect in C#: the generated C# is OS- and architecture-dependent. I have plans to mitigate this issue, but they have not yet been realized.)

Biohazrd operates in a handful of discrete stages:

Translation - Biohazrd uses Clang to parse the C++ headers and constructs a simplified view of all the declarations. (This view is called the declaration tree.)
Transformation - Individual transformations are applied to adapt C++ concepts to the target language and apply those human-determined generalizations. (Some examples include converting C++ reference types to pointers, converting loose macros to C# enums, converting global variables to static properties, etc. This is the main extensibility point of Biohazrd.)
Verification - A final pass is done to the declaration tree to ensure that everything can be represented in the target language. (IE: If you skipped the C# transformation for converting global variables to static properties you'd get a complaint here and the variable would be removed since C# can't represent loose global variables.)
Output emit - The declaration tree is converted to source code in the target language. (IE: C#, Rust, C, some special metadata format, etc.)
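To illustrate how the transformation and verification stages interact, here is a toy sketch using the global-variable example from the list above. This is not Biohazrd's API (Biohazrd is a C# framework); every name below is hypothetical and the "declaration tree" is reduced to a flat list.

```python
from dataclasses import dataclass


@dataclass
class Decl:
    kind: str  # e.g. "function", "global_variable", "static_property"
    name: str


# Hypothetical set of declaration kinds the C# output stage knows how to write.
CSHARP_REPRESENTABLE = {"function", "static_property"}


def globals_to_static_properties(decls):
    """Transformation: rewrite loose globals into something C# can express."""
    return [Decl("static_property", d.name) if d.kind == "global_variable" else d
            for d in decls]


def verify(decls, representable):
    """Verification: drop anything the target cannot represent and report it."""
    kept, diagnostics = [], []
    for d in decls:
        if d.kind in representable:
            kept.append(d)
        else:
            diagnostics.append(f"warning: {d.kind} '{d.name}' cannot be represented; removed")
    return kept, diagnostics


tree = [Decl("function", "ImGui::Button"), Decl("global_variable", "GImGui")]

# Skipping the transformation: verification complains and removes the global.
kept_raw, diag_raw = verify(tree, CSHARP_REPRESENTABLE)

# Running the transformation first: everything survives to the emit stage.
kept_ok, diag_ok = verify(globals_to_static_properties(tree), CSHARP_REPRESENTABLE)
print(diag_raw, [d.name for d in kept_ok])
```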

I wanted to explicitly address Omar's concerns for a theoretical cimguiv2 as they relate to Biohazrd:

A. Output "cimgui.h" has no comments, nothing aligned. Output is generally extremely ugly to look at.

I'm of the opinion that even if hardly anyone ever looks at it, it's still important that generated code be human-readable for debugging purposes. As such Biohazrd generates code which strives to look like it was written by a human. (Arbitrary example: Generated code for ImGuiListClipper)

As for comments, I do not emit any yet but I am tracking the feature as https://github.com/InfectedLibraries/Biohazrd/issues/20. As noted earlier in this thread, the comments are easy to access via Clang, I just haven't gotten around to it. (Primarily because C# has a very specific structure for documentation comments, so I wanted to parse Doxygen-style comments for use with that. For ImGui's simpler documentation this would be unnecessary.)

A. internals are in the same file instead of being a cimgui_internal.h, generating internal is an opt-in feature which adds stuff to same file :(

Handling this sort of thing more intelligently is trivial in Biohazrd. For example, I use the file names associated with declarations to place DirectX APIs in a relevant namespace, and I later even use that information to logically separate the APIs into separate C# assemblies (DLLs) so that DXGI things are logically separate from DirectX 12 things.

I don't translate the contents of imgui_internal.h today, but if I did I would either emit the internal APIs into their own namespace (IE: InfectedImGui.Internal) or a separate assembly so that they're harder to use by mistake.
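The file-name-based split could look roughly like the sketch below, which routes declarations to a public or an internal output header based on the header they came from. The routing table and the `partition_by_source` helper are assumptions for illustration; the declaration list is hand-written.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical routing table: source header -> generated header.
OUTPUT_FOR_SOURCE = {
    "imgui.h": "cimgui.h",
    "imgui_internal.h": "cimgui_internal.h",
}


def partition_by_source(decls):
    """decls: iterable of (name, source_path). Returns {output_header: [names]}."""
    buckets = defaultdict(list)
    for name, source in decls:
        output = OUTPUT_FOR_SOURCE.get(PurePosixPath(source).name)
        if output is not None:  # declarations from other files (e.g. libc headers) are skipped
            buckets[output].append(name)
    return dict(buckets)


headers = partition_by_source([
    ("ImGui::Button", "imgui/imgui.h"),
    ("ImGui::ItemAdd", "imgui/imgui_internal.h"),
    ("printf", "/usr/include/stdio.h"),
])
print(headers)
```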

A. From paragraph above. Somehow find a way to support ImStr in char mode (for raw C users) and in dual char mode (for language bindings)

As noted earlier, Biohazrd could easily emit different handlings for ImStr vs char*. For a C library, I'd likely emit the raw ImStr functions into their own header since direct C consumers generally never need/want them.

(I think in general if you go with the C layer library approach you should generally have two versions regardless since something that's friendly for C consumers looks very different from something that's friendly for C#/Rust/Lua/etc consumers.)

B. Ideally the generation process discussed above should be able to generate the C>C++ bindings from the metadata.

Biohazrd's philosophy is explicitly not to write out metadata or use metadata for generating bindings. The primary reason for this is capturing the full fidelity of the Clang AST in serialized form is basically impossible. In theory Biohazrd could emit the declaration tree as JSON or something similar, but I don't do so today.

B. Build process is not obvious, making it hard to use with branches.

Generally needs to be improved.

Feels like "editing generator.bat" just gets in the way.

Agreed. I used to maintain a private fork of cimgui and ImGui.NET specifically targeting the docking branch and it was a pain.

Feels like repo having submodules gets in the way.

Most of the Biohazrd generators I've authored today reference their target libraries via submodules, but this is only done for simplicity rather than necessity. As noted earlier my long-term plan is to eliminate this need.

Feels like the weird code duplication for other projects (e.g. cimgui_plot) gets in the way.

Auto-generation for branches would alleviate some of that.

Agreed. Even without the "Plug in a SHA1 and repo URL and it just works" feature, I planned on having CI that automatically pulls every commit to ImGui and generates bindings for it so problems can be identified ASAP.

(In general Biohazrd is designed to be very defensive. So if it sees something that makes it uncomfortable, such as something that doesn't translate well to C#, it will warn/error at you and emit a best-effort translation regardless.)

Ideally it should be just e.g. "sometool.exe input_folder output_folder", done, could work on multiple libraries.

This is basically how the InfectedImGui generator works today. If published as self-contained, a Biohazrd generator could be run with 0 dependencies.

The InfectedImGui generator does require CMake to build ImGui as a DLL, but this is a dependency I'd like to eliminate in the long run. One of my very long term plans for Biohazrd is to allow writing C++ in a C# project and have it automatically generate bindings and build it to a DLL. I have a very rough proof of concept of this using a minified version of Clang and lld to prove it isn't too unrealistic.

Absolutely ok imho, and likely/probable that we embed custom rules/hacks as cimgui does.

I call these transformations in Biohazrd. They're the code manifestation of those human generalizations I mentioned earlier.

B. It requires the target compiler, supposedly only used as a preprocessor... which makes it hard to prebuild and share cimgui output.

I'm not sure if I 100% understand what this is about, but as noted earlier Biohazrd embeds the subset of required Clang functionality so it doesn't care if you have a C++ compiler installed.

Some stuff is pre-processed per compiler at the cimgui generator level (instead of leaving #ifdef in the cimgui output code)

This is somewhat possible with Biohazrd, but the current idea is you should surface a particular configuration of the native library. There are a few reasons for this:

Using preprocessor for configuration isn't used the same way in C# as it is in C/C++
The C# wrapper has to describe bindings for a prebuilt C++ DLL so you couldn't change things even if you wanted to
Features like IMGUI_DISABLE_OBSOLETE_FUNCTIONS should be surfaced using more familiar C# functionality instead. (IE: The [Obsolete] attribute.)

B. Consider the possibility to generate straight bindings to other languages without going through C functions.

As noted earlier this is one of the fundamental goals of Biohazrd with relation to C#. I did some preliminary research and determined it should be very feasible with Rust as well.

The biggest limitation here is that the vast majority of languages assume that all FFI will be done using C calling conventions. C++ calling conventions are not that different from C, but you do have to worry about them.

Because of this, Biohazrd requires special knowledge of C++ calling conventions, and as such currently only supports Windows x64. NeuroGEARS is sponsoring Linux x64 support and possibly Linux ARM64, but neither are supported yet. I've kept track of the Linux x64 C++ ABI differences so adding that shouldn't be too hard (it's only slightly different from Windows x64), it just hasn't been a priority yet.

My plan (ironically enough) if there was enough demand for not having to worry about ABI things was to generate a C library layer since C++ compilers are obviously already capable of handling these ABI concerns in extern "C" functions. It just creates yet another layer to worry about. (IE: Safe C# -> Unsafe C# -> C -> C++)

ABIs are complicated and as you might guess that when Biohazrd gets things wrong it's very frustrating to debug. The eventual plan to mitigate this is automated ABI testing. (https://github.com/InfectedLibraries/Biohazrd/issues/32) The basic idea is I want to have Biohazrd generate a C++ library with everything stubbed out, then generate C# unit tests which call every function/method in the entire library and make sure all the data survives the journey in both directions.
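A much smaller cousin of that idea can be sketched with Python's `ctypes`: declare the binding-side view of a struct, compare its size and field offsets against expected numbers, and push values through a byte round trip. This only checks data layout, not calling conventions, and the expected numbers are typed in by hand here, whereas in a real setup they would be reported by the C++ compiler.

```python
import ctypes


class ImVec2(ctypes.Structure):
    # Binding-side declaration of Dear ImGui's two-float vector.
    _fields_ = [("x", ctypes.c_float), ("y", ctypes.c_float)]


# Hand-written expectation; a generator would obtain these from the native side.
EXPECTED_IMVEC2 = {"size": 8, "offsets": {"x": 0, "y": 4}}


def check_layout(struct_type, expected):
    """Return a list of human-readable layout mismatches (empty means OK)."""
    problems = []
    actual_size = ctypes.sizeof(struct_type)
    if actual_size != expected["size"]:
        problems.append(f"size: expected {expected['size']}, got {actual_size}")
    for field_name, offset in expected["offsets"].items():
        actual = getattr(struct_type, field_name).offset
        if actual != offset:
            problems.append(f"{field_name}: expected offset {offset}, got {actual}")
    return problems


def round_trip(x, y):
    """Serialize to raw bytes and back, as data would travel across a binding boundary."""
    raw = bytes(ImVec2(x, y))
    back = ImVec2.from_buffer_copy(raw)
    return back.x, back.y


layout_problems = check_layout(ImVec2, EXPECTED_IMVEC2)
print(layout_problems, round_trip(1.5, -2.0))
```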

In the case of ImGui specifically, ABI verification could in theory be done by adapting the tests in imgui_dev. Although you'd likely want to modify all/most of the tests to be represented in some language-agnostic form which is mechanically converted to C++/C/C#/Rust/whatever. Or you could just write them in C# or Rust and assume if those work it's definitely gonna work in C++ too. (This is obviously a bit of a wild idea, but I wanted to throw it out there.)

Of course ABI concerns also just go away as more people use the bindings. These ideas have mostly grown out of concerns for validating behemoth libraries like OpenCV, but I wanted to mention them here for anyone curious.

ImGui's API is so clean, consistent, and C-like that I'm not really even concerned about ABI problems in the first place.

B. Consider that the metadata used by the generator could be leveraged for other stuff, like an in-demo interactive documentation generator?

As noted earlier Biohazrd doesn't really care what you do with the declaration tree, so you could totally do something like this.

To give an example, I use Biohazrd's declaration tree to generate a library exports file for PhysX to use the linker to validate everything Biohazrd needs is properly exported by the DLL. (Although the need for this functionality has since been replaced by Kaisa and LinkImportsTransformation, which act like a linker of sorts.)

As a total .NET shill I also wanted to address these two comments:

C# I don't know how easy to access on Linux, Mac or alternative Windows toolchains such as MinGW.

in my head it's a tradeoff between C# which "just works" on Windows but can be a bit of a PITA to set up on OSX/Linux

This is no longer true with modern .NET which treats MacOS and Linux as first class citizens. (This started with .NET Core in 2016, which was a soft reboot of the .NET ecosystem and is very mature at this point.) You don't even need to require having the .NET runtime installed if you don't want to.

One thing that I didn't really get to address above is that Biohazrd does support things like generating language-friendly wrappers around the unsafe library layer (IE: Using System.String instead of char*), but this isn't well represented in InfectedImGui or InfectedPhysX since I'm still figuring out how I want to generalize that stuff.

If you want to see an example of some of that in action, here's my private Direct3D 12 sample adaptation. It's still unsafe code, but the main thing to note here is the use of C# generics to simplify the IID_PPV_ARGS pattern meant to be used in C++. (For contrast, here's the C++ version of that sample.)

It's actually a bit unfortunate I haven't published InfectedDirectX or InfectedWin32 since they're definitely the nicest Biohazrd-generated libraries I currently have. (Here's a video of a glTF renderer I made using them.)

So in short: Biohazrd was basically created to make this sort of thing possible, and is already very capable of handling Dear ImGui.

If you all are interested, I can try to find some time to get InfectedImGui back up to date and do some of cleanup I've been meaning to do. I can also look into writing a preliminary (likely ImGui-specific) C output emit stage. (I'd offer to do Rust since that seems like a focus here, but I'm not experienced enough in Rust yet to be confident I could get a proof of concept for it going quickly and without somewhat significant effort on my part.)

PathogenDavid commented 3 years ago

Also I feel like I at least indirectly addressed everything that came up in the other comments here, but don't hesitate to ask if I missed something or anyone has any questions.

One more thing I forgot to mention: Biohazrd generators themselves are also currently Windows x64 only due to the fact that I only build the native Clang dependencies for Windows x64. Obviously something that can be fixed, just hasn't been a priority. So apologies if any of you were hoping to run a generator on Linux/MacOS. (As seen above though, I do commit the generated code to allow people to review it without downloading or running anything.)

ShironekoBen commented 3 years ago

Thanks for the huge amount of info, @PathogenDavid! It's going to take a while to digest all that...

I've not come across Biohazrd before, and there's definitely a lot of potential there. I'm a big C# fan myself, and anything that makes interop bindings easier is much appreciated! (on the subject of C# being a PITA on non-Windows OSes, I was thinking more of the dev environment at the time - as you say, .net core native packages are probably a viable option if people just need to run the binary)

A few initial thoughts:

1) From a non-C-binding perspective the broad outline sounds very sensible - it feels like a much more fleshed-out version of the clang ideas we'd been knocking around above.

2) For C I'm not sure how I feel... part of me thinks that if you can generate high-quality C# bindings then it shouldn't be hard to do the same for C, whilst another part thinks that inevitably the process is going to throw away a lot of C/C++-specific information that currently exists (like macros) and so a text-transformation-based C conversion would produce a more "natural" library.

3) Skipping the C wrapper and calling straight into C++ is... bold! Conceptually I like it but the bugs/compatibility issues scare me (I spent way too long digging around inside C#/C calling convention hilarity from the opposite side when writing Catnip... ^-^;).

I'm sure I'll think of more things whilst I ponder this, but that's what's popped into my head thus far.

PathogenDavid commented 3 years ago

No problem! And no worries, as soon as I finished typing all that up I realized it would probably take a while for everyone to churn through it.

on the subject of C# being a PITA on non-Windows OSes, I was thinking more of the dev environment at the time

Ah, that makes sense. I tend to be a "develop on Windows test on Linux" sort of guy, but from what I've heard Visual Studio Code and Visual Studio Mac are both very mature for .NET development at this point. If I remember right, the Roslyn (C# compiler) devs primarily work in VSCode these days rather than Visual Studio.

inevitably the process is going to throw away a lot of C/C++-specific information that currently exists

Biohazrd in general tends to try and abstract away Clang details without preventing your access to them.

A good example is ImVector<T>. Biohazrd can't directly handle C++ templates very well* since they don't translate cleanly to other languages, but Biohazrd still retains the necessary information needed for someone to manually reason about them. In InfectedImGui I made the decision to manually implement ImVector<T> in C# and then created a special transformation which replaces ImVector<T> template references with this type.

(*Recently I've been experimenting with eagerly instantiating all templates referenced in C++. It'll never scale for STL-heavy C++ libraries but it works well for things like OpenCV's generic matrix type.)
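To make the ImVector<T> point concrete, here is a rough sketch of what a hand-written, template-free counterpart can look like for one element type. DemoVector and DemoVector_float are hypothetical stand-ins, not Biohazrd or InfectedImGui code. The only thing taken from Dear ImGui is the field order of ImVector<T> (Size, Capacity, Data).

```cpp
#include <cstddef>

// Stand-in with the same field order as Dear ImGui's ImVector<T>.
template <typename T>
struct DemoVector
{
    int Size;
    int Capacity;
    T*  Data;
};

// Hypothetical hand-expanded counterpart for T = float, usable from languages without templates.
struct DemoVector_float
{
    int    Size;
    int    Capacity;
    float* Data;
};

// The two layouts have to agree, otherwise viewing one as the other is unsafe.
static_assert(sizeof(DemoVector<float>) == sizeof(DemoVector_float), "size mismatch");
static_assert(offsetof(DemoVector<float>, Data) == offsetof(DemoVector_float, Data), "Data offset mismatch");

// Reads one element through the template-free struct (no bounds checking in this sketch).
inline float DemoVector_float_at(const DemoVector_float* v, int i)
{
    return v->Data[i];
}
```

A generator would presumably emit one such struct per instantiation it encounters. The static_asserts are the part worth keeping in any variant, since they turn a layout drift into a build error.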

(like macros)

Biohazrd can actually enumerate and even evaluate macros! (Example here.)

Clang also retains the syntax tree of macros at reasonable fidelity. It's not exposed by libclang, but it wouldn't be the first time I've had to manually expose things libclang didn't.

(For the sake of clarity, Clang has two API surfaces: the C-based libclang API and the C++ APIs that Clang actually uses internally. I primarily use the former via ClangSharp with some (ironically enough) manually written interop to access what I need from the rest. It's a distant-term goal that one day Biohazrd will generate bindings for the Clang C++ API.)

Although with how few macros Dear ImGui has, I'd be tempted to only mechanically translate the constant-like macros and manually translate the function-like macros. (And add a sanity check to the generator which complains when new ones appear so that someone knows to manually add them.)
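The sanity check described above could be sketched roughly like this. MacroInfo and FindUnhandledMacros are hypothetical names for illustration. A real generator would fill the list from whatever its parser reports, not from hand-written data.

```cpp
#include <set>
#include <string>
#include <vector>

// Minimal description of a macro as a generator might see it (hypothetical type).
struct MacroInfo
{
    std::string name;
    bool        function_like; // true for "#define NAME(...)", false for "#define NAME value"
};

// Returns the function-like macros that nobody has translated by hand yet.
// Constant-like macros are skipped because they are translated mechanically.
std::vector<std::string> FindUnhandledMacros(const std::vector<MacroInfo>& macros,
                                             const std::set<std::string>& hand_translated)
{
    std::vector<std::string> unhandled;
    for (const MacroInfo& m : macros)
        if (m.function_like && hand_translated.count(m.name) == 0)
            unhandled.push_back(m.name);
    return unhandled;
}
```

If the returned list is non-empty, the generator can fail loudly and name the new macros, so someone knows to add them.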

To me the most important macro in all of Dear ImGui is IMGUI_CHECKVERSION, which I think could be machine translated with a dash of human generalization.

text-transformation-based C conversion would produce a more "natural" library.

This is definitely true if you wanted cimgui.h to be nearly identical to imgui.h. I would worry about the fragility of such an approach though. (Previous experience says that thinking you can parse C/C++ always ends in heartbreak even when you think you're constrained to a specific style of C/C++. Although I did try to use Regex, so maybe I was doomed from the start.)

Clang is actually pretty good at associating comments with the appropriate declaration from what I've seen, so I feel like you could still get something fairly close if your main concern is that people should be able to use their preferred binding's code as documentation. (Personally though even when I used ImGui.NET I always used imgui.h as my documentation.)

From personal experience, if you want to achieve "As a developer I can use any version of Dear ImGui from my chosen language by automatically regenerating bindings" you can't get away with text transformation of a C++ header file. If you defined ImGui's API using an IDL then you can probably get away with that. But then you risk impacting the experience of C++ developers, which I would presume are the majority of Dear ImGui users.

(In its simplest form, cimgui.h could just be imgui.h with globs of preprocessor goop making it work in both C++ and C. Having worked in headers like that...not my personal preference but it certainly works. -- Sometimes at the expense of Intellisense which tends to choke on cute preprocessor abuse.)

Skipping the C wrapper and calling straight into C++ is... bold! Conceptually I like it but the bugs/compatibility issues scare me

Yup, it's definitely unconventional. I think it's something that becomes less of a concern as time goes on, but as you've noted it's utter hell when you run into a new situation that wasn't handled appropriately.

Biohazrd follows a philosophy of "Clang is always right and we are always wrong", so it does its best to defer to Clang for ABI concerns. (For example, Biohazrd asks Clang whether a type can be passed via register or not. IE: Turns out C++ types with a copy constructor are never passed via register.) There are still ABI concerns in Biohazrd (like the order of the return buffer parameter and the this pointer), but I'd like to see even those eliminated eventually. (As you might imagine, Clang is a bit of a mess internally so I haven't quite found where the core of the ABI stuff is handled and it's certainly not surfaced on the libclang API.)
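As a rough illustration of the copy-constructor remark, standard type traits can approximate the question from inside C++. This is only a proxy and not what Biohazrd does, since it asks Clang directly. The real answer also depends on the platform ABI and the size of the type. PlainVec2, CountedVec2 and IsTrivialForCalls are made-up names.

```cpp
#include <type_traits>

// Plain aggregate: implicitly-defined (trivial) copy constructor and destructor.
struct PlainVec2
{
    float x, y;
};

// Same data, but with a user-provided copy constructor.
struct CountedVec2
{
    float x, y;
    CountedVec2(float x_, float y_) : x(x_), y(y_) {}
    CountedVec2(const CountedVec2& other) : x(other.x), y(other.y) {}
};

// Necessary (not sufficient) condition for by-value passing in registers:
// copying and destroying the type must not run user code.
template <typename T>
constexpr bool IsTrivialForCalls()
{
    return std::is_trivially_copy_constructible<T>::value &&
           std::is_trivially_destructible<T>::value;
}

static_assert(IsTrivialForCalls<PlainVec2>(), "plain struct should qualify");
static_assert(!IsTrivialForCalls<CountedVec2>(), "user-provided copy constructor should disqualify");
```

A binding generator that gets this wrong for a single type will pass arguments in the wrong place for every function taking that type by value, which is why deferring to the compiler is attractive.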

That being said, most of the C++ ABI quirks tend to come from methods, of which Dear ImGui has very few. The main C++ ABI concern for Dear ImGui is the name mangling, which is something Clang figures out for us.

But yeah, it's still spooky. That's why I brought up the various plans/ideas I have to avoid it or make it more sane.

Catnip... ^-^;

Ooh, that's a cool project! I've actually had in the back of my head that it'd be fun to make a baby CLR or a toy compiler targeting the GBA or maybe the NDS. Glad I'm not the only one.

rokups commented 3 years ago

Another solution that would be more viable short-term would be using SWIG. AFAIK it can generate a C interop layer alone in addition to complete bindings for a number of languages. I even have made a poor man's wrapper for myself. I did not care to make it complete because application setup is done by the C++ world, so a bunch of lower level stuff is simply ignored to get it going faster. But it could be ironed out to a reasonable level of completion fast enough.

ShironekoBen commented 3 years ago

Still slowly working my brain through all the implications of this stuff... I've been wondering if we have any data on how people use the C headers for C applications (as opposed to other languages) - are they building projects using a modern C++ compiler where they have the Imgui source compiling as part of their project, and are just "choosing" to write their code in C, or are they compiling Imgui into a static/dynamic library and linking against that? And of the former how many people are making use of the imconfig.h stuff (and in particular complex stuff like overriding ImTextureID)?

That feels like the big design decision here - can C be treated like any other binding with a fixed configuration, or does it need special treatment to allow use of (dynamic) custom configuration features?

(rationale: by definition non-C languages must be linking against a compiled binary of some form, and thus are dealing with a fixed set of #ifdefs and config overrides, which means in turn we can use a preprocessor and bake that information into the metadata/bindings without any problems - technically I guess there are use-cases where someone might have different configurations for debug/release/etc with separate libs, but chances are they can have separate bindings as well without too much grief. That in turn means that the Biohazrd approach of using Clang or similar to generate metadata is valid, which is infinitely simpler and more robust than trying to hand-parse C++ in any form!)

ShironekoBen commented 3 years ago

Just a quick update on this for anyone interested - I've been bashing away at the parsing problem (aiming to solve the "C bindings for C users" side of the problem, mainly) and after a few attempts now have something that can parse the whole of imgui.h and turn it into a reasonably sensible hierarchical object model, which looks something like this:

Ifndef: IMGUI_OVERRIDE_DRAWVERT_STRUCT_LAYOUT
    If-block:
        STRUCT: ImDrawVert
            Field: Type=Type: ImVec2 Names= pos
            Field: Type=Type: ImVec2 Names= uv
            Field: Type=Type: ImU32 Names= col
            Ifdef: IMGUI_ENABLE_ADVANCED_SHADER
                If-block:
                    Field: Type=Type: ImU32 Names= outlineCol
                    Field: Type=Type: ImU32 Names= shadowCol
                    Field: Type=Type: ImVec2 Names= shapePoint1
                    Field: Type=Type: ImVec4 Names= shapePoints23
                    Comment: // XY = point 2, ZW = point 3
                    Field: Type=Type: ImVec4 Names= sdfParams
                    Comment: // X = SDF offset, Y = Outline width, Z = Texture/shape mode (0=texture, 1=shape), W = SDF enable/disable (1=enable)
                    Field: Type=Type: ImVec4 Names= shadowParams
                    Comment: // XY = Shadow offset, Z = 1/Shadow width (0 to disable shadow), W = Shadow SDF offset
            Blank line

STRUCT: ImFont
                Comment: // Members: Hot ~20/24 bytes (for CalcTextSize)
                Field: Type=Type: ImVector < float > Names= IndexAdvanceX
                Comment: // 12-16 // out //            // Sparse. Glyphs->AdvanceX in a directly indexable way (cache-friendly for CalcTextSize functions which only this this info, and are often bottleneck in large UI).
                Field: Type=Type: float Names= FallbackAdvanceX
                Comment: // 4     // out // = FallbackGlyph->AdvanceX
                Field: Type=Type: float Names= FontSize
                Comment: // 4     // in  //            // Height of characters/line, set during loading (don't change after loading)
                Comment: // Members: Hot ~28/40 bytes (for CalcTextSize + render loop)
                Field: Type=Type: ImVector < ImWchar > Names= IndexLookup
                Comment: // 12-16 // out //            // Sparse. Index glyphs by Unicode code-point.
                Field: Type=Type: ImVector < ImFontGlyph > Names= Glyphs
                Comment: // 12-16 // out //            // All glyphs.
                Field: Type=Type: const ImFontGlyph * Names= FallbackGlyph
                Comment: // 4-8   // out // = FindGlyph(FontFallbackChar)
                Comment: // Members: Cold ~32/40 bytes
                Field: Type=Type: ImFontAtlas * Names= ContainerAtlas
                Comment: // 4-8   // out //            // What we has been loaded into
                Field: Type=Type: const ImFontConfig * Names= ConfigData
                Comment: // 4-8   // in  //            // Pointer within ContainerAtlas->ConfigData
                Field: Type=Type: short Names= ConfigDataCount
                Comment: // 2     // in  // ~ 1        // Number of ImFontConfig involved in creating this font. Bigger than 1 when merging multiple font sources into one ImFont.
                Field: Type=Type: ImWchar Names= FallbackChar
                Comment: // 2     // in  // = '?'      // Replacement character if a glyph isn't found. Only set via SetFallbackChar()
                Field: Type=Type: ImWchar Names= EllipsisChar
                Comment: // 2     // out // = -1       // Character used for ellipsis rendering.
                Field: Type=Type: bool Names= DirtyLookupTables
                Comment: // 1     // out //
                Field: Type=Type: float Names= Scale
                Comment: // 4     // in  // = 1.f      // Base font scale, multiplied by the per-window font scale which you can adjust with SetWindowFontScale()
                Field: Type=Type: float Names= Ascent Descent
                Comment: // 4+4   // out //            // Ascent: distance from top to bottom of e.g. 'A' [0..FontSize]
                Field: Type=Type: int Names= MetricsTotalSurface
                Comment: // 4     // out //            // Total surface in pixels to get an idea of the font rasterization/texture cost (not exact, we approximate the cost of padding between glyphs)
                Field: Type=Type: ImU8 Names= Used4kPagesMap[( IM_UNICODE_CODEPOINT_MAX +1 ) / 4096 / 8]
                Comment: // 2 bytes if ImWchar=ImWchar16, 34 bytes if ImWchar==ImWchar32. Store 1-bit for each block of 4K codepoints that has one active glyph. This is mainly used to facilitate iterations across all used codepoints.
                Comment: // Methods
                Function: Return type=None Name=ImFont Body=None IMGUI_API
                Function: Return type=None Name=~ImFont Body=None IMGUI_API
                Function: Return type=Type: const ImFontGlyph * Name=FindGlyph Arguments= [Arg: Type=Type: ImWchar Name=c] Body=None Const IMGUI_API
                Function: Return type=Type: const ImFontGlyph * Name=FindGlyphNoFallback Arguments= [Arg: Type=Type: ImWchar Name=c] Body=None Const IMGUI_API
                Function: Return type=Type: float Name=GetCharAdvance Arguments= [Arg: Type=Type: ImWchar Name=c] Body=CodeBlock: Length=22 Const
                Function: Return type=Type: bool Name=IsLoaded Body=CodeBlock: Length=6 Const
                Function: Return type=Type: const char * Name=GetDebugName Body=CodeBlock: Length=10 Const
                Comment: // 'max_width' stops rendering after a certain width (could be turned into a 2d size). FLT_MAX to disable.
                Comment: // 'wrap_width' enable automatic word-wrapping across multiple lines to fit into given width. 0.0f to disable.
                Function: Return type=Type: ImVec2 Name=CalcTextSizeA Arguments= [Arg: Type=Type: float Name=size] [Arg: Type=Type: float Name=max_width] [Arg: Type=Type: float Name=wrap_width] [Arg: Type=Type: const char * Name=text_begin] [Arg: Type=Type: const char * Name=text_end Default=NULL] [Arg: Type=Type: const char * * Name=remaining Default=NULL] Body=None Const IMGUI_API
                Comment: // utf8
                Function: Return type=Type: const char * Name=CalcWordWrapPositionA Arguments= [Arg: Type=Type: float Name=scale] [Arg: Type=Type: const char * Name=text] [Arg: Type=Type: const char * Name=text_end] [Arg: Type=Type: float Name=wrap_width] Body=None Const IMGUI_API
                Function: Return type=Type: void Name=RenderChar Arguments= [Arg: Type=Type: ImDrawList * Name=draw_list] [Arg: Type=Type: float Name=size] [Arg: Type=Type: ImVec2 Name=pos] [Arg: Type=Type: ImU32 Name=col] [Arg: Type=Type: ImWchar Name=c] Body=None Const IMGUI_API
                Function: Return type=Type: void Name=RenderText Arguments= [Arg: Type=Type: ImDrawList * Name=draw_list] [Arg: Type=Type: float Name=size] [Arg: Type=Type: ImVec2 Name=pos] [Arg: Type=Type: ImU32 Name=col] [Arg: Type=Type: const ImVec4 & Name=clip_rect] [Arg: Type=Type: const char * Name=text_begin] [Arg: Type=Type: const char * Name=text_end] [Arg: Type=Type: float Name=wrap_width Default=0.0f] [Arg: Type=Type: bool Name=cpu_fine_clip Default=false] Body=None Const IMGUI_API
                Comment: // [Internal] Don't use!
                Function: Return type=Type: void Name=BuildLookupTable Body=None IMGUI_API
                Function: Return type=Type: void Name=ClearOutputData Body=None IMGUI_API
                Function: Return type=Type: void Name=GrowIndex Arguments= [Arg: Type=Type: int Name=new_size] Body=None IMGUI_API
                Function: Return type=Type: void Name=AddGlyph Arguments= [Arg: Type=Type: const ImFontConfig * Name=src_cfg] [Arg: Type=Type: ImWchar Name=c] [Arg: Type=Type: float Name=x0] [Arg: Type=Type: float Name=y0] [Arg: Type=Type: float Name=x1] [Arg: Type=Type: float Name=y1] [Arg: Type=Type: float Name=u0] [Arg: Type=Type: float Name=v0] [Arg: Type=Type: float Name=u1] [Arg: Type=Type: float Name=v1] [Arg: Type=Type: float Name=advance_x] Body=None IMGUI_API
                Function: Return type=Type: void Name=AddRemapChar Arguments= [Arg: Type=Type: ImWchar Name=dst] [Arg: Type=Type: ImWchar Name=src] [Arg: Type=Type: bool Name=overwrite_dst Default=true] Body=None IMGUI_API
                Comment: // Makes 'dst' character/glyph points to 'src' character/glyph. Currently needs to be called AFTER fonts have been built.
                Function: Return type=Type: void Name=SetGlyphVisible Arguments= [Arg: Type=Type: ImWchar Name=c] [Arg: Type=Type: bool Name=visible] Body=None IMGUI_API
                Function: Return type=Type: void Name=SetFallbackChar Arguments= [Arg: Type=Type: ImWchar Name=c] Body=None IMGUI_API
                Function: Return type=Type: bool Name=IsGlyphRangeUnused Arguments= [Arg: Type=Type: unsigned int Name=c_begin] [Arg: Type=Type: unsigned int Name=c_last] Body=None IMGUI_API

...the next step is probably to reverse the process (which should hopefully be easier!) and turn the object tree back into a header file, that in theory should be functionally - and mostly aesthetically - identical to the original imgui.h. Once that's working I can start adding manipulations that walk the tree and mess with the contents to remove the C++-ness and turn it into vanilla C.

ocornut commented 3 years ago

Wow, great work! Thanks for the update.

(I've been half considering making 1.83 the release with ImStrv (the lightweight stringview type), kind of to forcefully put the problem in the public spot but I think if we have the possibility of a sane solution ahead I would hold on for now!)

ShironekoBen commented 3 years ago

Yeah, I think we're getting close - after another hacking session I can now convert the 1.83 imgui.h into DOM notation and then back to a C++ header file which compiles cleanly (and runs!), so in theory it's "just" the actual flattening out of all the C++-isms and turning them into C equivalents left now.

For the curious, here's what the deconstructed-then-reconstructed source looks like - as you can see it preserves comments/etc, and now has a notion of treating comments as "attached" to a particular declaration so they follow it... although we do lose the nice aligned spacing in places, sadly (I have an idea how we could at least get a semblance of that back, though):

// Font runtime data and rendering
// ImFontAtlas automatically loads a default embedded font for you when you call GetTexDataAsAlpha8() or GetTexDataAsRGBA32().
struct ImFont
{
    // Members: Hot ~20/24 bytes (for CalcTextSize)
    ImVector<float> IndexAdvanceX; // 12-16 // out //            // Sparse. Glyphs->AdvanceX in a directly indexable way (cache-friendly for CalcTextSize functions which only this this info, and are often bottleneck in large UI).
    float FallbackAdvanceX; // 4     // out // = FallbackGlyph->AdvanceX
    float FontSize; // 4     // in  //            // Height of characters/line, set during loading (don't change after loading)

    // Members: Hot ~28/40 bytes (for CalcTextSize + render loop)
    ImVector<ImWchar> IndexLookup; // 12-16 // out //            // Sparse. Index glyphs by Unicode code-point.
    ImVector<ImFontGlyph> Glyphs; // 12-16 // out //            // All glyphs.
    const ImFontGlyph* FallbackGlyph; // 4-8   // out // = FindGlyph(FontFallbackChar)

    // Members: Cold ~32/40 bytes
    ImFontAtlas* ContainerAtlas; // 4-8   // out //            // What we has been loaded into
    const ImFontConfig* ConfigData; // 4-8   // in  //            // Pointer within ContainerAtlas->ConfigData
    short ConfigDataCount; // 2     // in  // ~ 1        // Number of ImFontConfig involved in creating this font. Bigger than 1 when merging multiple font sources into one ImFont.
    ImWchar FallbackChar; // 2     // in  // = '?'      // Replacement character if a glyph isn't found. Only set via SetFallbackChar()
    ImWchar EllipsisChar; // 2     // out // = -1       // Character used for ellipsis rendering.
    bool DirtyLookupTables; // 1     // out //
    float Scale; // 4     // in  // = 1.f      // Base font scale, multiplied by the per-window font scale which you can adjust with SetWindowFontScale()
    float Ascent, Descent; // 4+4   // out //            // Ascent: distance from top to bottom of e.g. 'A' [0..FontSize]
    int MetricsTotalSurface; // 4     // out //            // Total surface in pixels to get an idea of the font rasterization/texture cost (not exact, we approximate the cost of padding between glyphs)
    ImU8 Used4kPagesMap[(IM_UNICODE_CODEPOINT_MAX +1)/4096/8]; // 2 bytes if ImWchar=ImWchar16, 34 bytes if ImWchar==ImWchar32. Store 1-bit for each block of 4K codepoints that has one active glyph. This is mainly used to facilitate iterations across all used codepoints.

    // Methods
    IMGUI_API ImFont();
    IMGUI_API ~ImFont();
    IMGUI_API const ImFontGlyph* FindGlyph(ImWchar c) const;
    IMGUI_API const ImFontGlyph* FindGlyphNoFallback(ImWchar c) const;
    float GetCharAdvance(ImWchar c) const
        {  return ((int)c < IndexAdvanceX.Size) ? IndexAdvanceX[(int)c] : FallbackAdvanceX;  }
    bool IsLoaded() const
        {  return ContainerAtlas != NULL;  }
    const char* GetDebugName() const
        {  return ConfigData ? ConfigData->Name : "<unknown>";  }

    // 'max_width' stops rendering after a certain width (could be turned into a 2d size). FLT_MAX to disable.
    // 'wrap_width' enable automatic word-wrapping across multiple lines to fit into given width. 0.0f to disable.
    IMGUI_API ImVec2 CalcTextSizeA(float size, float max_width, float wrap_width, const char* text_begin, const char* text_end = NULL, const char** remaining = NULL) const; // utf8
    IMGUI_API const char* CalcWordWrapPositionA(float scale, const char* text, const char* text_end, float wrap_width) const;
    IMGUI_API void RenderChar(ImDrawList* draw_list, float size, ImVec2 pos, ImU32 col, ImWchar c) const;
    IMGUI_API void RenderText(ImDrawList* draw_list, float size, ImVec2 pos, ImU32 col, const ImVec4& clip_rect, const char* text_begin, const char* text_end, float wrap_width = 0.0f, bool cpu_fine_clip = false) const;

    // [Internal] Don't use!
    IMGUI_API void BuildLookupTable();
    IMGUI_API void ClearOutputData();
    IMGUI_API void GrowIndex(int new_size);
    IMGUI_API void AddGlyph(const ImFontConfig* src_cfg, ImWchar c, float x0, float y0, float x1, float y1, float u0, float v0, float u1, float v1, float advance_x);
    IMGUI_API void AddRemapChar(ImWchar dst, ImWchar src, bool overwrite_dst = true); // Makes 'dst' character/glyph points to 'src' character/glyph. Currently needs to be called AFTER fonts have been built.
    IMGUI_API void SetGlyphVisible(ImWchar c, bool visible);
    IMGUI_API void SetFallbackChar(ImWchar c);
    IMGUI_API bool IsGlyphRangeUnused(unsigned int c_begin, unsigned int c_last);
};

ShironekoBen commented 3 years ago

OK, I've got something that I think could reasonably be described as a working prototype of the C conversion system.

It successfully converts imgui.h into something which compiles cleanly as a vanilla C header (on MSVC's default settings, at least), can be used successfully in one of the existing cimgui tests, and does a reasonably decent job of preserving the formatting/comments/ifdefs/etc.

Architecturally it's pretty much as I talked about before - it turns the header into a DOM which it then manipulates with a bunch of "modifiers" that each do a specific transformation (like removing namespaces or dragging functions out of structs) before the DOM gets written out again and stubs for the binding generated. It's turned out relatively clean, IMHO - there are a few relatively minor code-specific hacks, but 99% of it is reasonably input-agnostic. The worst bit is probably template expansion - that has a few significant limitations (only one template parameter allowed, for example) right now, and one awkward hack to deal with how "const T*" gets expanded when T is itself a pointer.
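The "const T*" wrinkle mentioned above can be reproduced in a few lines of plain C++. This is a sketch with made-up types (DemoFont, DemoVec), not code from the prototype. It shows why pasting the argument text into the template body gives a different type than the compiler's own substitution.

```cpp
#include <type_traits>

struct DemoFont {};

// Hypothetical container whose signatures mention "const T*".
template <typename T>
struct DemoVec
{
    using const_pointer = const T*;
};

// Real template substitution for T = DemoFont*: the const applies to the
// pointer itself, giving "DemoFont* const*".
constexpr bool kCompilerExpansionMatches =
    std::is_same<DemoVec<DemoFont*>::const_pointer, DemoFont* const*>::value;

// Naive text pasting ("const " + "DemoFont*" + "*") spells "const DemoFont**",
// which is a pointer to a pointer to a const DemoFont, i.e. a different type.
constexpr bool kNaiveExpansionMatches =
    std::is_same<DemoVec<DemoFont*>::const_pointer, const DemoFont**>::value;

static_assert(kCompilerExpansionMatches, "const binds to the substituted pointer");
static_assert(!kNaiveExpansionMatches, "text pasting yields a different type");
```

So a text-based expander has to move the const to the right of a pointer-typed argument (or wrap the argument in a typedef first) to stay faithful to what the C++ compiler would have done.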

If you want to see an overview of the process it uses and the tweaks applied, check out the "Apply modifiers" block in main.py.
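For readers who want a feel for what one such modifier does without opening the archive, here is a small sketch of "dragging functions out of structs". It is written in C++ purely for illustration (the prototype itself is Python). MethodDecl, ArgDecl and FlattenMethod are hypothetical names, and the Owner_Name(self, ...) scheme is just one plausible naming choice.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified view of a parsed member function.
struct ArgDecl
{
    std::string type;
    std::string name;
};

struct MethodDecl
{
    std::string          return_type;
    std::string          owner;      // the struct the method was declared in
    std::string          name;
    std::vector<ArgDecl> args;
    bool                 is_const = false;
};

// Turns "Ret Owner::Name(args) [const]" into the free-function declaration
// "Ret Owner_Name([const] Owner* self, args);".
std::string FlattenMethod(const MethodDecl& m)
{
    std::string out = m.return_type + " " + m.owner + "_" + m.name + "(";
    out += (m.is_const ? "const " : "") + m.owner + "* self";
    for (const ArgDecl& a : m.args)
        out += ", " + a.type + " " + a.name;
    return out + ");";
}
```

Fed with ImFont::FindGlyph from the header above, this yields `const ImFontGlyph* ImFont_FindGlyph(const ImFont* self, ImWchar c);`. Default arguments, references and overloads are deliberately ignored here and would each need their own pass.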

Known issues at the moment:

1) Things are broadly cimgui-compatible, but disambiguated names for overloaded functions are both not-very-similar to cimgui and also pretty unpleasant to look at generally in a lot of cases. I'm not sure if we should be aiming for cimgui compatibility here (in which case I need to decipher its disambiguation scheme), or if doing something "vaguely-nice-looking" and then adding some hand-crafted overrides for things that need custom names is the way to go.

2) I had to do one very small hack to imgui.h - moving the definition of ImDrawIdx up above ImVector<>, as otherwise when things try to instantiate ImVector the generated code references it before it is declared (I spent a while trying to work around that by moving the generated template instantiations down, but that ends up being a bit of a catch-22 situation as then they can land after code that references them, and you can't forward-declare it because it's used as a value member in a struct).

3) All of the ImVector<> functions are exposed to C, except contains(), because a lot of the things that are used with ImVector<> don't actually implement equality operators and thus contains() won't compile for those types, and unlike C++ we can't rely on lazy template expansion to make that a non-issue until someone actually uses contains(). I spent a little while adding operators to things but there are several classes in there where it's dubious that is really a good idea (ImDrawChannel, for example) so I ditched that in the end. If there are use-cases for contains() on things that do have equality operators we can add some sort of more specific exclusion filter, but for now the code just removes contains() entirely.

4) I had to add an implementation of LogTextV() to the stubs file, since imgui doesn't currently provide one (other varargs functions have a va_list variant, so I'm guessing it got overlooked?) and it's necessary to be able to forward varargs from the stub function.

5) When compiled there are some warnings from the C++ stubs about non-C return types - I haven't looked too deeply into this to see if there is a genuine problem here, but my initial suspicion is that it's a false positive caused by the fact that the C++ stub declarations use the C++ versions of the classes... I'm not entirely happy about that as a design decision, TBH, but cimgui seems to get away with it OK and doing it "properly" is rather messy as you need namespace trickery and a lot of casting to allow the "C++ imgui" and "C imgui" worlds to coexist in one file.

6) I think it's a little overenthusiastic sometimes with adding "struct" to type references (not in a manner that breaks things, just in a "technically that's just unnecessary clutter" kinda way).

7) Conversely, it doesn't always add IMGUI_API to things that probably need it (although I'm not sure if IMGUI_API should be used as-is like that in C anyway...?)

8) Everything aside from the most core functionality is aggressively untested.
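Regarding item 4, the reason a va_list variant is needed is that a variadic function cannot hand its "..." to another variadic function. It can only pass a va_list along. A minimal sketch of that forwarding pattern follows. The DemoLog namespace, its buffer and the DemoLog_LogText stub are stand-ins for illustration, not the actual Dear ImGui or prototype code.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

namespace DemoLog
{
    static std::string g_buffer; // stand-in for wherever the log text ends up

    // va_list variant: the piece a stub needs in order to forward its arguments.
    void LogTextV(const char* fmt, va_list args)
    {
        char tmp[256];
        std::vsnprintf(tmp, sizeof(tmp), fmt, args); // output is truncated to fit tmp
        g_buffer += tmp;
    }

    // Variadic C++ entry point, implemented on top of the va_list variant.
    void LogText(const char* fmt, ...)
    {
        va_list args;
        va_start(args, fmt);
        LogTextV(fmt, args);
        va_end(args);
    }
}

// Hypothetical C-callable stub: it cannot call DemoLog::LogText(fmt, ...) with
// its own variadic arguments, so it goes through the va_list variant instead.
extern "C" void DemoLog_LogText(const char* fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    DemoLog::LogTextV(fmt, args);
    va_end(args);
}
```

Calling `DemoLog_LogText("%d-%s", 7, "x")` appends `7-x` to the buffer, exactly as the C++ variadic version would.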

Since I'm not really sure where it would be best for it to live (and right now if you want to test it setting up the test code in a clean way is a bit annoying), for the moment I've put a ZIP file here:

https://www.dropbox.com/s/et9ssu8snnuh16a/DearImguiC-Prototype.7z?dl=0

If you just want to jump straight to "what does the output look like", check out TestCode/sdl2-cimgui-demo/generated/cimgui.h and .cpp (I've included pregenerated versions in the ZIP).

To get it going, you should just need to:

1) Install Python v3.8

2) Install "Python Lex & Yacc" v3.11 ("pip install ply" should do it, or see https://pypi.org/project/ply/)

3) Run "python main.py"

This will generate TestCode/sdl2-cimgui-demo/generated/cimgui.h and .cpp.

You can then open the Visual Studio project in TestCode/sdl2-cimgui-demo/ and run it (x64 Debug) and it should all work.

What does anyone think? I'd be especially interested too if anyone knows of any (actually C) cimgui test code that actually exercises most/all of the functionality of the library - as it stands all I've found is stuff that opens a couple of demo windows, which only really touches about 2% of the API...

floooh commented 3 years ago

Hello all, very interesting thread! I can answer some C related questions (as author of the sokol headers linked by Omar above, and also having dabbled with language bindings, which has been a bit of my focus in the last 6 months or so).

I'll start with some feedback to @ShironekoBen's C-related questions:

I'm curious to know if we have any idea how many people are using this as an actual C library vs as a means of binding to higher-level languages.

At least among the users of the sokol headers (which can be used both from C and C++), using Dear ImGui from C via cimgui seems to be preferred over C++. No hard numbers to back this up though, just my impression from the feedback I'm getting.

For Dear ImGui integration, sokol library users have been specifically asking for cimgui integration, so that they can stay in C when writing UI code instead of having to split the project into C and C++ source files. So at least from my point of view, being able to use Dear ImGui directly from C is a very important feature (e.g. the C API should be reasonably "friendly" - in that regard, the cimgui API is definitely "good enough" to be used directly, any new C-API should be at least as friendly as cimgui IMHO).

Skipping the C wrapper and calling straight into C++ is... bold!

I tend to agree, I think a C-API shim is required anyway because of ABI issues, at least a C-API makes everything dramatically easier (in my experience at least). AFAIK some languages (python for instance?) require the native code to be compiled into DLLs, and exposing a C++ API from a DLL is almost never a good idea IMHO.

I've been wondering if we have any data on how people use the C headers for C applications ...

As an example, here's an "as simple as possible" cross-platform starter-kit using cimgui and the sokol-headers. It's quite simple really:

https://github.com/floooh/cimgui-sokol-starterkit

The top-level "demo code" looks like this (doesn't show much of the actual ImGui calls, but as I said, coding directly against cimgui is really quite nice):

https://github.com/floooh/cimgui-sokol-starterkit/blob/main/demo.c

A snapshot of cimgui and Dear ImGui is embedded in the project:

https://github.com/floooh/cimgui-sokol-starterkit/tree/main/cimgui

...and compiled into a static library:

https://github.com/floooh/cimgui-sokol-starterkit/blob/c547d134136da1bd33cc0243e6f642f3aab07df8/CMakeLists.txt#L16-L25

The utility header which implements the Dear ImGui rendering backend on top of the sokol headers is actually a bit special, this has a "bilingual" implementation. If the implementation is included into a C++ source file, it will directly call the Dear ImGui C++ API, and if the implementation is included into a C file, it will instead call the cimgui C-API (this is just an implementation detail, but IMHO is a good reason why the Dear ImGui C++ API and C-API should remain as similar as possible):

https://github.com/floooh/cimgui-sokol-starterkit/blob/c547d134136da1bd33cc0243e6f642f3aab07df8/sokol/sokol_imgui.h#L1899-L1907
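A minimal sketch of that "bilingual" pattern, not the actual sokol_imgui.h code. All names here (demoNewFrame, Demo::NewFrame, DEMO_NEW_FRAME, demo_backend_begin_frame) are made up for illustration; the C++ branch assumes a C++ API declared elsewhere:

```c
#include <assert.h>

/* Hypothetical stand-in for a cimgui-style C entry point; it only counts calls. */
static int demo_c_new_frame_calls = 0;
static void demoNewFrame(void) { demo_c_new_frame_calls++; }

#if defined(__cplusplus)
    /* C++ translation unit: call the C++ API directly (assumed declared elsewhere). */
    #define DEMO_NEW_FRAME() Demo::NewFrame()
#else
    /* C translation unit: go through the C wrapper instead. */
    #define DEMO_NEW_FRAME() demoNewFrame()
#endif

/* Backend code is written once against the macro and works in both languages. */
static void demo_backend_begin_frame(void) {
    DEMO_NEW_FRAME();
}
```

The closer the C function names and argument lists stay to the C++ ones, the thinner this switching layer can be.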

floooh commented 3 years ago

...and a short braindump about language bindings (since I've been working on a similar project for the sokol headers in the last few months, with the difference that I'm starting with a C-API and want to expose this to other languages - currently Zig and Nim):

I've used the same way mentioned above: Use clang's "-ast-dump=json" to generate a (very verbose) JSON file, process this with Python into a much simpler intermediate JSON file which contains just the minimal information to describe the public API, but nothing more (so basically: enums, structs and function signatures). This intermediate JSON file is then the common base for per-language bindings generators (also written in Python). The goal is eventually to run those bindings generators automatically via CI (for instance in Github Actions).

AFAIK using libclang directly to parse the original headers instead of "-ast-dump=json" is more robust and may provide more information. But so far it's been "good enough" for my use case.

Some thoughts on this:

Parsing clang's AST information can become complex, but it gets a lot easier if you also control the base API, for instance:

I replaced constant expressions like (1<<3) in my enum definitions with simple integer literals, so I don't need to parse expressions in my AST parser
Same for anonymous nested structs and unions, those are tedious to parse, so I simply removed unions and anonymous nested structs from my C-API
I sometimes used new typedefs in the C-API just to give the bindings generator additional hints. Surprisingly, this often also leads to a "more correct" C-API. For instance, I replaced "sloppy" pointer/size arguments and struct members with a new "range struct". This makes it easier to generate bindings for high-level languages which have pointer/size slices or views, but it also makes the C programming model simpler and more correct.
However sometimes there's a conflict between an "idiomatic C API" and "idiomatic high-level language APIs". Finding the right compromise there is the tricky part. This is especially the case where higher-level languages have stronger typing than C. In those cases I sometimes added new functions to the C-API for the only reason that it makes the API easier to use from more strongly typed languages (for instance: "float argument" versions of functions which usually take integer arguments, those help to avoid excessive float-int conversions in strongly typed languages: https://github.com/floooh/sokol/blob/31acf61cb0e1000e66ce55e9b60bf0b67fd9cac8/sokol_gfx.h#L2296-L2297)
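To illustrate the "range struct" idea from the list above, here is a minimal sketch. The names demo_range, DEMO_RANGE and demo_sum_bytes are hypothetical and only stand in for whatever a real API would call them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "range" type: a pointer and a byte size that always travel together. */
typedef struct demo_range {
    const void* ptr;
    size_t size;
} demo_range;

/* Build a range from an array; only valid for real arrays, not decayed pointers. */
#define DEMO_RANGE(arr) ((demo_range){ (arr), sizeof(arr) })

/* Example consumer: sums all bytes in the range. */
static uint32_t demo_sum_bytes(demo_range data) {
    assert(data.ptr != NULL || data.size == 0);
    const uint8_t* bytes = (const uint8_t*)data.ptr;
    uint32_t sum = 0;
    for (size_t i = 0; i < data.size; i++) {
        sum += bytes[i];
    }
    return sum;
}
```

A bindings generator can map such a struct directly onto a slice or view type, and on the C side the caller can no longer pass a pointer with a mismatched size by accident.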

In general I'm starting to get used to the idea that the original C (or C++) implementation and public API may contain language-bindings-specific tweaks which are "switched on" with a preprocessor define. For instance in my Zig bindings, I had to work around an ABI-related compiler bug involving small structs. So I simply added a define "SOKOL_ZIG_BINDINGS", which is set when the Zig language bindings are generated and used, and which pads public API structs to go above 16 bytes:

https://github.com/floooh/sokol/blob/31acf61cb0e1000e66ce55e9b60bf0b67fd9cac8/sokol_gfx.h#L841-L843

Initially this was just a quick'n'dirty hack, but I'm starting to think that this could also be a useful tool to add language-specific tweaks to the API and implementation in order to "appease" specific target languages (e.g. null-terminated string versus pointer/size pairs versus pointer/pointer pairs).

Many more details in this blog post: https://floooh.github.io/2020/08/23/sokol-bindgen.html, but that probably goes a bit too far for this discussion thread :)

Cheers!

PS: for completeness sake some links:

the python scripts for processing the "clang -ast-dump=json" output and bindings generator scripts: https://github.com/floooh/sokol/tree/master/bindgen
the resulting Zig bindings: https://github.com/floooh/sokol-zig/
a similar experiment to create C bindings for macOS Objective-C headers (the AST parser is a lot more complex because I can't tweak the macOS system headers to simplify the job): https://github.com/floooh/objc-ast-experiments

floooh commented 3 years ago

@ShironekoBen I'll tinker a bit with your prototype next and try to provide some feedback. But it'll take a day or two.

This is exciting :)

floooh commented 3 years ago

@ShironekoBen: some preliminary feedback from looking at the generated header file:

Functions which take no arguments should be explicitly declared with a (void) argument list (e.g. void igSeparator(void) instead of void igSeparator()), otherwise the C compiler won't warn if the function is accidentally called with arguments (e.g. void igSeparator(); can be called as igSeparator(1, 2, 3);): https://www.godbolt.org/z/EnYEGvccG
Since all structs are defined as typedefs (typedef struct MyStruct { ... } MyStruct;), IMHO it would be better if the struct keyword is dropped in places where the structure is used (e.g. ImFont* FontDefault; instead of struct ImFont* FontDefault;).
The "C++ constructor functions" (like ImVec4_ImVec4()) are currently quite dangerous and inefficient because they return a heap-allocated object, which invites memory leaking. IMHO those should simply return an object by value:

// currently it looks like this:
IMGUI_CAPI ImVec2* ImVec2_ImVec2FloatFloat(float _x, float _y)
{
    return new ImVec2(_x, _y);
}

// but it should look like this:
IMGUI_CAPI ImVec2 ImVec2_ImVec2FloatFloat(float _x, float _y)
{
    // cimgui.cpp is C++, so this should work fine:
    return { _x, _y };
}

The question is though if those constructors are actually needed, because for instance in C99 I would simply do:

ImVec2 vec = { .x = 1.0f, .y = 2.0f };
// or:
ImVec2 vec = { 1.0f, 2.0f };

I wonder if all those ImVector C++ interface functions are actually needed (outside of the actual ImGui implementation code) or if it's better to detect the ImVector specializations as special cases and only expose a very reduced set of functions (for instance, only getting a C pointer to the first element, and the number of elements). I think the original cimgui has the right approach there, but I'm not sure how much of this is handcrafted.
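As a rough sketch of what such a "very reduced set" could look like for one specialization: the struct layout below mirrors ImVector's Size/Capacity/Data members, while the two accessor names are hypothetical and not taken from cimgui or the prototype:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed C mirror of one ImVector<int> specialization (Size, Capacity, Data). */
typedef struct ImVector_int {
    int Size;
    int Capacity;
    int* Data;
} ImVector_int;

/* Number of elements currently stored. */
static int ImVector_int_size(const ImVector_int* self) {
    assert(self != NULL);
    return self->Size;
}

/* Pointer to the first element; indexing then works like a plain C array. */
static int* ImVector_int_data(ImVector_int* self) {
    assert(self != NULL);
    return self->Data;
}
```

With only these two entry points, allocation and growth stay entirely on the C++ side.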

As for the question:

(although conversely I'm not sure if IMGUI_API should be used as-is like that in C anyway...?)

I think it's important to let library users override IMGUI_API with (for instance) static or __declspec(dllexport/dllimport).
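One common way to allow that override, sketched with a placeholder macro name (DEMO_API) and a placeholder function (demo_add) rather than the real IMGUI_API declarations:

```c
#include <assert.h>

/* Let the user pre-define the decoration; fall back to nothing. */
#ifndef DEMO_API
#define DEMO_API
#endif
/* A DLL build could instead pass, for example:
     -DDEMO_API="__declspec(dllexport)"                     (MSVC)
     -DDEMO_API="__attribute__((visibility(\"default\")))"  (GCC/Clang)
   and a single-translation-unit build could pass -DDEMO_API=static. */

/* Every public declaration and definition carries the macro. */
DEMO_API int demo_add(int a, int b);

DEMO_API int demo_add(int a, int b) {
    return a + b;
}
```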

In conclusion, I think the most important feedback is that IMHO the API should never return pointers to heap-allocated objects (e.g. all the new calls in the wrapper code), because this invites memory management problems. New "objects" should either be returned by value, or ideally most constructor functions (e.g. for the ImVector<> stuff) shouldn't be exposed in the C API.

Next I'll try to actually write some code, which hopefully yields some more specific feedback :)

floooh commented 3 years ago

PS: I just noticed that the original cimgui.h also exposes "constructor functions" which return heap-allocated objects. I wonder if there's a specific reason for this, because it strikes me as a not very good idea... (I also never used those, that's why I wonder if there's a non-obvious use case where those constructor functions are needed).

ocornut commented 3 years ago

Amazing work Ben, it's looking really great already.

cimgui.h is already almost on par with imgui.h
main.py itself seems like exactly the right level of flexibility and sanity, and it is the kind of code I would have hoped for.

GitHub now allows all users to create private repositories, so you may want to have it in a private repo? I can also create one (under the dearimgui/ umbrella) if you want. This way I could potentially submit polish patches; if you think it is too early we can check it into source control later.

Using latest

Consider using latest (currently using 1.69 vs 1.83 WIP) as I imagine a more recent version could lead to differences. I tried to overwrite the files with latest and the generator worked with the two changes you mentioned (moved ImDrawIdx, and had to manually remove ImNewWrapper from cimgui.h + added imgui_tables.cpp to the project) along with:

modifiers.mod_remove_functions.apply(dom_root, ["ImVector::find"])
modifiers.mod_remove_functions.apply(dom_root, ["ImVector::find_erase"])
modifiers.mod_remove_functions.apply(dom_root, ["ImVector::find_erase_unsorted"])

Also to remove the == operator use.

Heap constructors

Agree they should be removed, as well as probably much of the ImVector<> stuff (more on that later)

Function names & overloaded functions

Things are broadly cimgui-compatible, but disambiguated names for overloaded functions are both not very similar to cimgui and also pretty unpleasant to look at in a lot of cases.

One of the issues I had with cimgui is that its names are odd in the first place; even the ig prefix seems odd to me. Going to give some loose suggestions:

(1.A) I think the general rule of thumb should be:

bool ImGui_Button(const char* str_id, ...);

(1.B) For string view support we could output additional API (longer name) designed for language bindings: e.g.

  bool ImGui_Button_Strv(ImStrv str_id, ...);

In reality this would also largely depends on how the high-level language bindings are generated. The ideal aim is that HighLevelLanguageX has a function called ImGui_Button() or ImGui::Button() and this function would end up calling ImGui_Button_Strv() with string pointer + length automatically passed.
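A small sketch of that forwarding, assuming a string view is a begin/end pointer pair. DemoStrv, Demo_Button and Demo_Button_Strv are illustrative stand-ins; the stub only records the label length instead of drawing anything:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout of a string view: begin/end pointers, not nul-terminated. */
typedef struct DemoStrv {
    const char* Begin;
    const char* End;
} DemoStrv;

/* Stand-in for a generated ..._Strv() function; records the label length. */
static size_t demo_last_label_len = 0;

static bool Demo_Button_Strv(DemoStrv label) {
    assert(label.Begin != NULL && label.End >= label.Begin);
    demo_last_label_len = (size_t)(label.End - label.Begin);
    return false; /* a real button would report whether it was clicked */
}

/* What a binding could emit: take the language's native (pointer, length)
   string and forward it without copying or nul-terminating. */
static bool Demo_Button(const char* text, size_t len) {
    DemoStrv view = { text, text + len };
    return Demo_Button_Strv(view);
}
```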

Considering how simple and flexible the tech in main.py is I imagine it would be easy to provide transforms if there is a need to accommodate for specificities of language bindings generators.

(1.C) Overloads: when C++ has more than 1 function with the same name we can add a suffix:

bool ImGui_IsPopupOpen_ID(ImGuiID id, ImGuiPopupFlags popup_flags);
bool ImGui_IsPopupOpen_Str(const char* str_id, ImGuiPopupFlags flags);

This has the consequence that the addition of a new overload in C++ imgui would technically alter the name of an existing function, but I believe this is unlikely AND can easily be handled by keeping old names, so it seems reasonable to trigger that behavior only on overloads.

And consequentially the Strv function fits in:

bool ImGui_IsPopupOpen_Strv(ImStrv str_id, ImGuiPopupFlags flags);

(1.D) Non-ImGui functions:

const char* ImFont_CalcWordWrapPositionA(ImFont* self,float scale,const char* text,float wrap_width);
const char* ImFont_CalcWordWrapPositionA_Strv(ImFont* self,float scale,ImStrv text,float wrap_width);

(1.E) Loose functions (only in imgui_internal.h) would be special:

int cImStrcmp(const char* str1,const char* str2);
int cImStrcmp_Strv(ImStrv str1,ImStrv str2);

Need to use a prefix to avoid linking conflict..

I had to do one very small hack to imgui.h - moving the definition of ImDrawIdx up above ImVector<>

Will be ok to change in master.

ImVector

All of the ImVector<> functions are exposed to C, except contains(),

For what it is worth, we never use an explicitly defined == operator, so if there was/is a need to expose contains(), it would be ok to generate it using memcmp(). However, looking at the current output my feeling is that we could probably omit most ImVector stuff from the generator. That's polish we can decide on later, but I believe from a public API point of view the user is only ever expected to use Size and [], and everything else is merely leaking into the header file.
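For reference, a memcmp()-based contains() could look roughly like this. demo_vector_contains is a hypothetical helper, not generator output:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Byte-wise search over `count` elements of `elem_size` bytes each. */
static bool demo_vector_contains(const void* data, int count, size_t elem_size, const void* value) {
    assert(elem_size > 0);
    const unsigned char* p = (const unsigned char*)data;
    for (int i = 0; i < count; i++) {
        if (memcmp(p + (size_t)i * elem_size, value, elem_size) == 0) {
            return true;
        }
    }
    return false;
}
```

One caveat: byte-wise comparison is not always the same as ==. Structs with padding bytes and floating-point values (0.0f vs -0.0f, NaN) can compare differently.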

LogText

I had to add an implementation of LogTextV() to the stubs file, since imgui doesn't currently provide one (other varargs functions have a va_list variant, so I'm guessing it got overlooked?)

It was added pretty recently (1.82) actually, wasn't in 1.69, fixed.

Return types

When compiled there are some warnings from the C++ stubs about non-C return types - I haven't looked too deeply into this to see if there is a genuine problem here,

There is indeed a problem:

ImVec2 windowSize = igGetWindowSize();
igText("%f %f", windowSize.x, windowSize.y);

[screenshot: exception] I haven't dug into the assembly, but I guess there's an uneasy ABI problem to figure out there.

I know cimgui used void igGetWindowSize(ImVec2 *pOut);, I always felt/hoped there was a solution to do this better since afaik C does support returning structure and for 2-float structure it would be sensible?

I wonder if we can store the result in a local in the one-liner stub (no ABI problem); from there, turning that local into a return value may be compiler-dependent, ABI-aware trickery, which if necessary could be handled by a macro so the one-liner stub stays simple.

For reference the warning is:

warning C4190: 'igGetWindowSize' has C-linkage specified, but returns UDT 'ImVec2' which is incompatible with C

Generally, this is amazing and I'm super excited about this. Thank you so much. Feels like the 80% is done and only the polish remains (maybe another 80%... but we now have a functional base).

ocornut commented 3 years ago

warning C4190: 'igGetWindowSize' has C-linkage specified, but returns UDT 'ImVec2' which is incompatible with C

As a proof of concept I tried to make both sides use the same type:

cimgui_shared_types.h:

typedef struct cImVec2
{
    float x, y;
} cImVec2;

cimgui.h

IMGUI_API struct cImVec2 igGetWindowSize(); // get current window size

cimgui.cpp

IMGUI_CAPI cImVec2 igGetWindowSize()
{
    ImVec2 v = ImGui::GetWindowSize();
    return cImVec2 { v.x, v.y };
}

This worked... In current version "ImVec2" from cimgui.h compiled from C is not same as "ImVec2" from cimgui.cpp so it's easy to confuse, but I guess we can find a neater fix that keeps the same name.

ShironekoBen commented 3 years ago

@floooh, @ocornut - Awesome, thanks for the great feedback!

I'm running around dealing with random other moving-related stuff right this second so it may be a few days before I can get enough head-space to think about this properly, but that all sounds good. I'll definitely try getting the latest imgui (it's only on the old version because that's what the sample C code came with), and see if I can think of a way to generate the necessary thunk code for allowing return structures - I have a vague idea in my head as to how to do it sensibly.

On the memory-allocations-inside-constructors front, I'd assumed when I saw that in cimgui that it was a workaround for the ages-old "don't ever do memory allocation across DLLs" problem in Windows where if you have different versions of msvcrt/etc you can end up with hilarious heap corruption should you allocate from one DLL and then free from a different one. If we're not prioritising cimgui compatibility then in light of that and the return-by-value problem I think for at least ImVec2/etc we should just add some special-case code that makes them be regular value types and get rid of the heap allocations.

ImVector<> will be fine because all the alloc/free stuff happens under the veneer anyway (and if we do strip the API down to just size() and [] anyway, the options for things to go wrong are vastly reduced).

rokups commented 3 years ago

Idea: it would be useful if this worked with other high profile imgui libraries. implot comes to mind.

floooh commented 3 years ago

even the ig prefix seems odd to me

I actually quite like the short prefix because the actual function name Button better stands out compared to ImGui_Button(). But in the end it's entirely subjective and one gets used to different naming conventions pretty fast (e.g. in my own "preferred C style" it would look like if (ig_button("Bla")) { ... }).

floooh commented 3 years ago

I know cimgui used void igGetWindowSize(ImVec2 *pOut);, I always felt/hoped there was a solution to do this better since afaik C does support returning structure and for 2-float structure it would be sensible?

AFAIK returning and passing structs by value are clearly defined in platform C-ABIs so this shouldn't be a problem (e.g. things like structs up to 16 bytes are packed into registers, and above are passed on the stack, and similar rules for return values). I think using pointers for output values is one of those early-C traditions that just won't die.

I'm currently having trouble finding examples in the Win32 API, but for instance on macOS it's quite usual to pass structs by value across DLL APIs, for instance the contentRect param here:

https://developer.apple.com/documentation/appkit/nswindow/1419477-initwithcontentrect?language=objc

...and the same for return values:

https://developer.apple.com/documentation/appkit/nsscreen/1388369-visibleframe?language=objc

(the 'visibleFrame' property is resolved under the hood into a traditional setter and getter function which use pass-by-value in both directions).

And for manually writing C code, passing and returning structs by value is a lot more convenient than using (for instance) out-pointers.
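To make the convenience argument concrete, here is a sketch comparing both styles for a plain two-float struct. The names and the fixed 640x480 values are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoVec2 { float x, y; } DemoVec2;

/* Out-pointer style, in the spirit of void igGetWindowSize(ImVec2* pOut). */
static void demo_get_size_out(DemoVec2* out) {
    assert(out != NULL);
    out->x = 640.0f;
    out->y = 480.0f;
}

/* By-value style: the small plain struct is returned directly. */
static DemoVec2 demo_get_size(void) {
    DemoVec2 size = { 640.0f, 480.0f };
    return size;
}

/* By-value results compose inside expressions; out-pointers need a named temporary. */
static float demo_area(DemoVec2 v) {
    return v.x * v.y;
}
```

With the by-value variant a call like demo_area(demo_get_size()) is a single expression, while the out-pointer variant needs a declared local first.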

rokups commented 3 years ago

AFAIK returning and passing structs by value are clearly defined in platform C-ABIs so this shouldn't be a problem (e.g. things like structs up to 16 bytes are packed into registers, and above are passed on the stack, and similar rules for return values). I think using pointers for output values is one of those early-C traditions that just won't die.

That is not a problem for POD types, but it gets hairy once you add a constructor. Such type is no longer trivially copyable and you get a warning. It also may break in unexpected ways as this is now C++ code crammed into C rule set. I do not remember precisely how this broke, but i had to deal with exactly this issue when making C# bindings for something else. Solution is simple: you want a blittable type therefore use one (like cImVec2) and do a typecast.

thomcc commented 3 years ago

Hey all, I maintain imgui-rs (as of last December anyway), just getting around to looking at this thread. It's pretty relevant to my interests (and also, handling binding to/from C/Rust and a lot of languages used to be my day job, so I have some thoughts/opinions, but hopefully not bad ones).

Apologies for a bunch of replies, and for any rambling I do in the wall of text below.

Rust literals are not zero-terminated which is problematic, we're currently losing Rust people (many skilled/innovators) because of it.

I think there are a lot of reasons for people leaving/moving to other options (mostly egui), and the ones I've heard most are (in rough order):

portability — wasm32-unknown-unknown in particular, which I'm working on, and has slight tie-ins here, but nothing massive.
API — which goes beyond just the im_str! thing — some dubious choices made by the previous maintainer, who was very opinionated about how people should do things, which IMO is a poor match for a library like ImGUI. It's also a lot further from the C++ API than I would like.

(Also, note that the Rust im_str thing is being worked on currently, and will be fixed regardless of string view support. Or rather, because we hope string view support is coming, we can fix this in the current API by using a scratch buffer to nul-terminate the string (with a perf escape hatch where we still allow users to nul-terminate it themselves, either using the macro, or just like "foo\0"). Then once string views are available, it's basically just a nice optimization. IMO this should have always been how it worked, since the cost of copying small strings is low, and there's an escape hatch for big ones, but it's neither here nor there)
The perception that "pure rust" is safer or more desirable. Not much that can be done here really (okay, there are a few things that I can bring up some other time), and it's totally unrelated to this topic regardless.
(EDIT — forgot this one but have heard it multiple times from people using ImGUI as their primary UI, which is inadvisable, but good UI frameworks are still not really there for Rust yet) Not sufficiently themable and/or the default theme is less pretty than egui (biggest competitor). This is also totally unrelated, and seems to be being worked on elsewhere.
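The scratch-buffer approach mentioned for the im_str issue can be sketched on the C side like this. It only shows the general technique, not imgui-rs code; demo_text stands in for any API expecting a nul-terminated string, and the fixed 256-byte buffer with truncation is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SCRATCH_SIZE 256

/* Stand-in for a C API that needs a nul-terminated string. */
static size_t demo_text(const char* zstr) {
    return strlen(zstr);
}

/* Copy a (pointer, length) string into a scratch buffer, terminate it, then call
   the C API. Strings that do not fit are truncated here; a real binding might
   grow a heap buffer instead. */
static size_t demo_text_view(const char* text, size_t len) {
    char scratch[DEMO_SCRATCH_SIZE];
    assert(text != NULL || len == 0);
    if (len >= sizeof(scratch)) {
        len = sizeof(scratch) - 1;
    }
    if (len > 0) {
        memcpy(scratch, text, len);
    }
    scratch[len] = '\0';
    return demo_text(scratch);
}
```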

For the first two a better generator helps, in particular if it outputs not just C code, but information about each function/struct/enum/etc in the API (like what cimgui outputs in "definitions.json" /"structs_and_enums.json"/etc).

In general outside of the internals of a wrapper like imgui-rs, Rust code will probably never be calling directly into the C or C++ API, since it's pretty awkward, requires unsafe, etc. You'll always want to wrap stuff, so, being able to at least help generate some of those wrappers programmatically, even if just at the lowest level, would be a huge help. I actually think the stuff in definitions.json and such are probably enough, but the more info that can be crammed into those files, the better. The alternative is using the libclang API, which I have experience with, so it wouldn't be that bad, but it duplicates work.

Note that more generally, some of the types of issues we hit are things like "On wasm32-unknown-unknown (important target for Rust gamedev) Rust code can't call C code that passes structs by value, since the ABI was unspecified and clang and rustc happened to disagree on that point".

IMO pure C use of the API shouldn't have to change because of that dumb quirk, but if we have info about the whole API (at a minimum, functions and their argument types and return type, and structs which get passed by value and their fields), it's easy for us to solve by generating small adapter stub C functions for Rust to call that avoid that quirk of the ABI (either taking things by pointer, or by multiple values (or w/e), and just forwarding to the normal version, since C->C calls don't have this problem).
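Such an adapter stub might look like the following sketch. DemoPoint, demo_offset and demo_offset_ptr are hypothetical names; the point is only that structs cross the foreign-language boundary by pointer while the inner call stays by-value:

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoPoint { float x, y; } DemoPoint;

/* "Normal" C API: takes and returns a small struct by value. */
static DemoPoint demo_offset(DemoPoint p, float dx, float dy) {
    DemoPoint r = { p.x + dx, p.y + dy };
    return r;
}

/* Generated adapter: same behaviour, but structs are only passed by pointer,
   so the foreign caller never depends on the by-value struct ABI. */
static void demo_offset_ptr(const DemoPoint* p, float dx, float dy, DemoPoint* out) {
    assert(p != NULL && out != NULL);
    *out = demo_offset(*p, dx, dy); /* C -> C call, by-value is fine here */
}
```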

That's probably more of a reply than was required, but it hopefully explains somewhat what I'd want here.

Re: passing and returning structs and such by value.

Yeah, stuff with a constructor/destructor in C++ has a different ABI when passed by value than it would have otherwise (on x86 it's always passed by pointer, for example).

Otherwise passing structs is mostly fine, but probably requires defining C-specific versions of the structs, that get unpacked and repacked to send to C++. Another option that's okay for some cases is for C++ types to subclass a C-compatible type, with the C code using the C type and passing it to C++. This will trigger some warnings about object slicing, but is basically fine. It requires changes to the C++ ImGui though, and so is likely undesirable.

Also see above about Rust eventually wanting to avoid by-value struct use at least on wasm32-unknown-unknown (but in practice probably everywhere, in order to limit platform-specific stuff). One note is that supporting that platform for C code has other challenges (note: this is dated and there's likely a better solution, but describes the headaches well) associated with it (e.g. it isn't supported by imgui-rs today), and so in the short term, this can be ignored, but I'd like it if a path forward existed (like info about the emitted bindings, but I'm flexible here).

you want a blittable type therefore use one (like cImVec2) and do a typecast

In practice this is fine, but I think this might be UB under a stricter interpretation of some C++ rules, since you're "creating objects" without instantiating their ctor? Which is why you might use the subclass trick, since that's considered fine, I believe. Either way in practice this almost certainly doesn't matter at all, just figured I'd mention it.

Consider the possibility to generate straight bindings to other languages without going through C functions.

So, I've done this (not with ImGui, tho I've been... very tempted), it pretty much sucks but can be made to work with a lot of pain and platform specific handling. I'd recommend against it unless you're integrating with a C++ compiler's API at binding generation time, which basically means you have to use libclang.

I wrote a lot more before, but it was too long and got into too many weeds, so I've deleted it. The point is: it can work, but there are so many caveats that you always will want a C api still anyway. And pretty much no solution to this problem prevents people from calling the C++ directly if that's the path they want to go down anyway.

Since I'm not really sure where it would be best for it to live (and right now if you want to test it setting up the test code in a clean way is a bit annoying), for the moment I've put a ZIP file here:

Holy crap that's a way nicer header than the current cimgui provides. Also, the python code looks much easier to follow and contribute to, which is an issue I had before (that said, part of it is that I don't know Lua, and am more familiar with Python).

One thing I will note is that for the reasons I described above, it would be really nice if there were a way for it to spit out metadata about all the types/functions/constants.

Even in the best case scenario, I'll still need to do some code generation (for the Rust equivalent of that header, since Rust can't directly consume headers), although there are alternatives if it's too difficult to provide in the short term. This seems like it would also help people creating bindings for other languages, I think.

That said, this is awesome so far. Much cleaner, and it's great to have the docs right on the thing I'm actually calling. I often have to do quite a bit of jumping to go from rust code to the C++ code's docs.

thomcc commented 3 years ago

Some thoughts on the contents of the headers aside from the ones I see already mentioned (sorry for a second comment in a row):

Currently it #includes everything that C++ imgui does. It should just do this for the stuff it needs, even if doing this requires a few kludges.
I suspect respecting IM_VEC2_CLASS_EXTRA for the C versions of these and such is not useful here, and would only ever cause problems. (Perhaps you could use them to automate the conversion of the C struct to the C++ struct in the .cpp, though)
The lack of const support on self param is more obvious when there are const-specific versions of functions. E.g. const float* ImVector_float_begin_Const(struct ImVector_float* self); is a bit odd. I suspect for the most part const overloads aren't worth providing, but if they are worth providing, it's probably worth getting const right everywhere.
igPlotHistogramFnPtrVoidPtrIntIntStrFloatFloatImVec2 🙀

I also second not wanting to have the vector functions if it can be avoided, not really ever using/wanting the weird constructor functions that cimgui insists on generating, wanting to be able to slap __declspec(dllexport) or __attribute__((visibility("default"))) on all the things, and a bunch of other stuff that has already been said.

Regarding the default args, if you wanted to make this a bit more palatable for pure C usage, I have a totally ignorable suggestion.

Rust (like C) also has no optional arg/default arg/overloading mechanism, and so Dear ImGui's use of them can be tricky. Not sure if this is worth doing, but the technique we use to handle this might make things nicer for pure C usage, or langs that consume C apis unchanged:

// From the proposed `cimgui.h` (with word wrap applied)
IMGUI_API bool igDragInt4(const char* label, int v[4], float v_speed /* = 1.0f */,
    int v_min /* = 0 */, int v_max /* = 0 */, const char* format /* = "%d" */);

// new function — equivalent to `igDragInt4` with every arg at default values.
static inline bool igDragInt4Easy(const char* label, int v[4]) {
    return igDragInt4(label, v, 1.0f, 0, 0, "%d");
}

The naming scheme (which isn't great) and choice of static inline aren't important (the latter would have to be changed for use from non-C).

Anyway, I wouldn't be using this for Rust code, I just know from experience that the "no default args ever" version of the API can be a bit on the verbose and awkward side, and that in practice this goes pretty far to mitigate that. That said, not sure it's worth doing tho, could also just leave it to the various bindings to figure out.

(For the naming, I have no opinion. In Rust we use the unsuffixed version for the "fully defaulted" version, and stuff like ${func}_with_${arg} (if there's just one defaulted arg — e.g. button_with_size), or ${func}_with_opts (if there are too many to list e.g. calc_text_size_with_opts). That said, this scheme would be a breaking change for cimgui, and seems undesirable)

rokups commented 3 years ago

you want a blittable type therefore use one (like cImVec2) and do a typecast

In practice this is fine, but I think this might be UB under a stricter interpretation of some C++ rules, since you're "creating objects" without instantiating their ctor? Which is why you might use the subclass trick, since that's considered fine, I believe. Either way in practice this almost certainly doesn't matter at all, just figured I'd mention it.

No UB here. POD objects are special since they are supposed to not have a constructor. This is precisely why we are supposed to use cImVec2 that has no constructor - so compiler would do a plain byte copy when returning such value. If we tried to return ImVec2 (which has a constructor defined) this way - it would not work because rules for non-POD types are different.

thomcc commented 3 years ago

Hm, fair enough. I thought that wasn't true when going to a non-pod, but it's been a while since C++ was my every-day language so I'll take your word for it.

ShironekoBen commented 3 years ago

Hi all, and apologies for the long delay in updating this - real life has been "fun" for the last month (moving house/etc!).

I've spent a while going through all the great feedback here and trying to incorporate as much of it as I could into the code - this turned into some more significant architectural changes than I'd originally imagined but I definitely think it's all been a big net positive.

The biggest change by far is that I decided that the "abuse the fact that the C linker doesn't care about... well... anything and have the header file and implementation declarations of functions be totally different" approach really wasn't hugely safe or helpful, so I've changed the stub generator to do much more like what a human would do in this situation and include both the C++ headers and the C headers (wrapped in a namespace) in the implementation file, and actually generate broadly-type-safe wrappers that perform appropriate casting/etc.

This was a bit of a PITA to get right, but I'm a lot happier with the results as many things that would previously have been subtle runtime bugs are now obvious compile errors, and it also forced my hand into handling the conversion of pass-by-value structs properly, so they now have helper functions that do the conversion in a safe manner.

To enable all that I also got rid of the somewhat hacky previous mechanism of storing "old names" of things as a field, and instead the system reads in the DOM for the C++ header and then creates a clone of it before doing any modifications, with each node linked back from the clone to the original. That way at any point it can simply look back and see "what did this thing look like in the C++ header before all the C transformations happened?", which makes life an awful lot simpler.

I've also added a .json metadata generator that emits information about all the generated code for the benefit of binding to other languages... I've taken some guesses about what counts as "useful information" and formatting but haven't actually tried writing anything to consume this data yet, so please let me know if you think I've missed something/got it all horribly wrong/etc.

I had a go at improving the name disambiguation code and have something that I'm moderately happy with now - it tries to find the smallest disambiguation possible now so we get things like:

CIMGUI_API bool ImGui_Combo(const char* label, int* current_item, const char* items_separated_by_zeros);
CIMGUI_API bool ImGui_ComboCallback(const char* label, int* current_item, bool (*items_getter)(void* data, int idx, const char** out_text), void* data, int items_count);

...instead of the hilarious 64-character string of nonsense it tended to emit before.
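One way to get "smallest disambiguation possible" behaviour is a greedy loop that only extends names that still collide. The sketch below is a guess at the general shape, not the generator's actual algorithm: the `Overload` struct, the idea of a pre-computed list of candidate suffixes per overload, and the rule that the first overload in a colliding group keeps its name are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical input: one entry per overload.
struct Overload {
    std::string base_name;              // e.g. "ImGui_Combo"
    std::vector<std::string> suffixes;  // candidate suffixes, in order of use
};

// Append suffixes one at a time, and only to names that still collide.
std::vector<std::string> DisambiguateNames(const std::vector<Overload>& overloads) {
    std::vector<std::string> names;
    for (const Overload& o : overloads)
        names.push_back(o.base_name);
    std::vector<std::size_t> next_suffix(overloads.size(), 0);

    bool changed = true;
    while (changed) {
        changed = false;
        // Group overload indices by their current name.
        std::map<std::string, std::vector<std::size_t>> groups;
        for (std::size_t i = 0; i < names.size(); ++i)
            groups[names[i]].push_back(i);

        for (const auto& entry : groups) {
            const std::vector<std::size_t>& members = entry.second;
            // The first member keeps its name; the rest take one more suffix.
            for (std::size_t m = 1; m < members.size(); ++m) {
                const std::size_t i = members[m];
                if (next_suffix[i] < overloads[i].suffixes.size()) {
                    names[i] += overloads[i].suffixes[next_suffix[i]++];
                    changed = true;
                }
            }
        }
    }
    return names;
}
```

Given two overloads named `ImGui_Combo`, where the second has the candidate suffix `Callback`, this yields `ImGui_Combo` and `ImGui_ComboCallback`. A function with no collisions is left untouched even if it has candidate suffixes available.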

Possibly more contentiously, I had a go at adding default argument "helpers" - i.e. versions of functions that elide defaulted arguments. After playing with this a bit I actually felt like it worked best when the "helpers" are the default, so you get functions like this:

CIMGUI_API bool ImGui_BeginCombo(const char* label, const char* preview_value); // Implied flags = 0
CIMGUI_API bool ImGui_BeginComboEx(const char* label, const char* preview_value, ImGuiComboFlags flags /* = 0 */);

...the logic being that this gives an experience close to C++, where if you type "ImGui_BeginCombo(" you get the version with only the required arguments, and if you want to specify everything you need ImGui_BeginComboEx() instead.
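As a sketch of how the pair could be implemented: the stand-in `ImGui::BeginCombo` below is a toy that just records the flags it received, and having the short helper forward to the `Ex` version is an assumption for illustration (it could just as well call the C++ function directly).

```cpp
#include <cassert>

typedef int ImGuiComboFlags;

// Toy stand-in for the C++ API, which has a defaulted trailing argument.
namespace ImGui {
    static ImGuiComboFlags g_last_flags = -1;  // records what was passed
    bool BeginCombo(const char* label, const char* preview_value, ImGuiComboFlags flags = 0) {
        g_last_flags = flags;
        return label != nullptr && preview_value != nullptr;
    }
}

extern "C" {

// Full version: every argument is spelled out.
bool ImGui_BeginComboEx(const char* label, const char* preview_value, ImGuiComboFlags flags /* = 0 */) {
    return ImGui::BeginCombo(label, preview_value, flags);
}

// Helper: required arguments only, defaults filled in.
bool ImGui_BeginCombo(const char* label, const char* preview_value) {  // Implied flags = 0
    return ImGui_BeginComboEx(label, preview_value, 0);
}

}  // extern "C"

// Usage: the short name covers the common case, Ex is there when needed.
void DefaultHelperExample() {
    assert(ImGui_BeginCombo("Mode", "Fast"));
    assert(ImGui::g_last_flags == 0);
    assert(ImGui_BeginComboEx("Mode", "Fast", 4));
    assert(ImGui::g_last_flags == 4);
}
```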

I picked "Ex" as the suffix for that after considering a bunch of alternatives ("WithArgs", and even at one point just "_" as a kind of minimalist "doesn't get in the way when reading code" option), mainly because it's short and also there's a bit of prior history in that the Win32 API has a reasonable number of places where there are "basic" versions of function calls and then "Ex" versions with more arguments (for slightly different reasons, admittedly, but still).

The downside of this is that there are already a couple of places that use "Ex" in the code, though, so I'm not 100% sure this is the right avenue to go down. Other suggestions gratefully accepted!

For anyone interested, the full list of changes looks like this:

Removed some more ImVector functions that don't compile correctly due to the lack of operator== on certain types
Removed ImNewWrapper on newer ImGui versions
Made LogTextV() only get added on ImGui versions that don't supply it themselves
Renamed IMGUI_API to CIMGUI_API
Added (void) to declarations of C functions with no arguments
Fixed field declarations not identifying the IMGUI_API prefix
Made self parameters on const functions be const
Got rid of excessive use of struct qualifier and added autogenerated forward declarations to compensate
Removed unnecessary header files and made the stdbool.h include get added in a more sensible place
Changed ImVec2/ImVec4 to be treated as by-value types and avoid new() in their constructors
Removed hacky "original fully-qualified type" system and replaced it with "unmodified_element", which links all DOM elements back to a copy of the unmodified DOM
Changed stub code generation to include both the original and generated headers, and cast correctly when passing arguments around
Added lots of namespace-related shenanigans to deal with using namespaces to separate the C and C++ headers
Added by-value struct conversion functions
Added support for converting arrays of by-value types
Removed all of the ImVector functions as they're probably more dangerous than useful
Tidied up filename handling a bit
Added support for base class lists on class declarations
Added support for comments in a few more places
Added support for adding a name prefix to all loose functions
Made function name disambiguation aware of mutually-exclusive #ifdefs, and added manual exclusions
Fixed escaped character literals not getting parsed correctly
Fixed namespaced typenames not getting parsed correctly sometimes
Fixed #error and #undef not getting parsed
Made operator name parsing more generic and able to cope with things like "operator+=" without the previous ugly special-casing
Made it possible to have multiple header files in the DOM
Changed ImGui class prefix
Fixed function/structure removal not removing associated template DOM element
Added class member accessibility support
Removed IM_VEC2_CLASS_EXTRA/IM_VEC4_CLASS_EXTRA from generated file
Added metadata generator
Fixed comments getting incorrectly treated as part of the expression for preprocessor conditionals
Added generation of helper functions with defaulted arguments
Improved function name disambiguation to use minimal number of suffixes necessary to ensure uniqueness
Removed constructors/destructors that involve heap allocation

...incidentally, parsing of imgui_internal.h is coming along (in that I've fixed a lot of things that were preventing it working), but still isn't there yet - there are a couple of big-ticket items like support for inheritance that are necessary for it to work, along with some smaller parser bugs that still need squashing.

I've put the source code up in a slightly-more-sensible git repo here at https://github.com/dearimgui/dear_bindings, and for anyone who wants to take a look at the generated files without actually running it there's a ZIP file with the output here:

https://www.dropbox.com/s/9yslur7cp4yymra/DearBindingsGeneratedCode.zip?dl=0

Any feedback/ideas/suggestions on this would be very much appreciated! Thanks!

ocornut commented 3 weeks ago

Moved this old thread here for archiving, and closing as solved by https://github.com/dearimgui/dear_bindings