libsdl-org / SDL

Simple Directmedia Layer
https://libsdl.org
zlib License

Feature Request: Provide machine readable API definitions with SDL3 #6337

Open ikskuh opened 1 year ago

ikskuh commented 1 year ago

Heya!

I’m the author of SDL.zig, an attempt to create a Zig binding for SDL2.

As auto-translating the headers does not convey enough information about the expected types, a lot of APIs are hand-adjusted to actually fit the intent of the SDL api. One example would be: SDL_Color* colors has to be translated to colors: [*]SDL_Color (pointer to many), and not colors: *SDL_Color (pointer to one).

Now with the beginning of SDL3 development: Is the SDL project open to providing a machine-readable abstract definition of the SDL APIs that allows precise generation of C headers, Zig bindings and possibly bindings for other languages (C#, Rust, Nim, …), so there’s only one authoritative source for the APIs that conveys enough information to satisfy all target languages?

Regards

PS.: I'm willing to spend time and effort on this, and am also happy to write both the generator and the definitions.

slouken commented 1 year ago

Conceptually this is fine with me, as long as it doesn't decrease readability of the headers by end users. If it does, then I would suggest a separate API definition file that's machine readable.

Can you give a sample of what a small header like SDL_sensor.h might look like?

ikskuh commented 1 year ago

Can you give a sample of what a small header like SDL_sensor.h might look like?

Just a heads up: I'm working on that, it just happens that I'm at a conference right now. Will definitely post results next week.

smcv commented 1 year ago

GNOME's GObject-Introspection is in the same general space as this, and GNOME-adjacent libraries use it to generate bindings, either at compile-time for compiled languages (Vala, C++, Rust) or at runtime for dynamic languages (Python, JavaScript, Perl).

SDL probably can't usefully use GObject-Introspection directly, because GObject-Introspection is designed for GLib's object model, but it's worth looking at GObject-Introspection and seeing what sort of information they needed in order to autogenerate the bindings. It uses magic comments containing annotations; the most important one is usually transfer, which marks whether ownership is transferred between caller and callee.

Another very useful annotation is whether a char * is UTF-8 (like in GTK widgets), the OS's unspecified string encoding (like in Unix filenames and environment variables), or binary data (like in memmove()).

One example would be: SDL_Color* colors has to be translated to colors: [*]SDL_Color (pointer to many), and not colors: *SDL_Color (pointer to one).

In GObject-Introspection, this distinction would be something like:

/**
 * @colors: (array length=n_colors): the palette
 *
 * Set a palette of variable size that is passed as a pointer to (the first element of) an array.
 */
void Example_SetPalette(Picture *self, SDL_Color *colors, size_t n_colors);

/**
 * @colors: (array fixed-size=16): pointer to exactly 16 colors
 *
 * Set a palette of fixed size that is passed as a pointer to (the first element of) an array.
 */
void Example_SetVgaPalette(Picture *self, SDL_Color *colors);

/**
 * @which: an index within the palette
 * @color: (in) (transfer none): the color
 *
 * Change one member of the palette by copying the given color, which is passed by reference.
 */
void Example_SetPaletteEntry(Picture *self, int which, SDL_Color *color);

/**
 * @which: an index within the palette
 * @color_out: (out caller-allocates): the color
 *
 * Get one member of the palette and store it by overwriting the contents of a struct that is passed by reference.
 */
void Example_GetColorByIndex(Picture *self, int which, SDL_Color *color_out);
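
To make the idea of magic comments concrete, here is a minimal Python sketch of how a generator might pull annotations such as (array length=n_colors) out of the parameter lines above. It is not the grammar used by the real GObject-Introspection scanner; the function name parse_param_line and the returned dictionary layout are assumptions made for illustration only.

```python
import re

# Matches a doc-comment line such as " * @colors: (array length=n_colors): the palette".
PARAM_LINE = re.compile(r"\s*\*\s*@(\w+):\s*(.*)")


def parse_param_line(line: str) -> tuple[str, dict[str, dict[str, str]]]:
    """Return the parameter name and its annotations (hypothetical layout)."""
    match = PARAM_LINE.match(line)
    if match is None:
        raise ValueError(f"not a parameter line: {line!r}")
    name, rest = match.groups()
    annotations: dict[str, dict[str, str]] = {}
    # Annotations are the parenthesised groups that directly follow the name.
    while rest.startswith("(") and ")" in rest:
        end = rest.index(")")
        words = rest[1:end].split()
        if words:
            # The first word names the annotation; the others are options,
            # written either as "key=value" or as a bare word.
            options = {key: value
                       for key, _, value in (w.partition("=") for w in words[1:])}
            annotations[words[0]] = options
        rest = rest[end + 1:].lstrip()
    return name, annotations


print(parse_param_line(" * @colors: (array length=n_colors): the palette"))
print(parse_param_line(" * @color: (in) (transfer none): the color"))
```

A binding generator could then look up annotations["array"]["length"] to decide that colors and n_colors belong together as one slice or list argument.
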
ikskuh commented 1 year ago

My proposal wouldn't go that far, but especially wouldn't use C as a data ground truth. I hopefully can finish my example later this day, as the above code doesn't contain even remotely enough data to generate nice Zig or C# code. Ownership transfer is a good point, though!

smcv commented 1 year ago

One insight from GNOME which might be equally useful in SDL is that the most convenient API/ABI for C is not necessarily convenient for bindings. A reasonable number of API entry points in GLib/GTK end up having two versions: one that is convenient for C programmers and marked as not visible to bindings (for example using varargs), and one that is convenient for binding programmers but de-emphasized for C programmers (for example always using an (array,length) pair even if that's not the most natural C representation). Usually one of them calls the other internally, or they both call into a common internal implementation.

ikskuh commented 1 year ago

@slouken: I created an example here: https://github.com/MasterQ32/SDL3-Api-Generator-Example

It implements the minimal stuff to render the Sensor API to both Zig and C. The generated code is not at the level that i want to generate, but it's pretty close.

One thing that's missing still is the ability to abstract something like function macros, which are not part of the linked api, but the compiled-in api.

One cool thing that is possible: The api generator can later parse the documentation comments and translate them into the language specific documentation format, which means everyone will get nice code comments in their IDE

Important note: I chose Lua for the implementation just because it allows for a quick-and-dirty implementation. For an official API generator, I'd probably move to C, as that lets us remove dependencies.

@smcv:

One insight from GNOME which might be equally useful in SDL is that the most convenient API/ABI for C is not necessarily convenient for bindings.

That is true. I think we can model something like that.

Another very useful annotation is whether a char * is UTF-8 (like in GTK widgets), the OS's unspecified string encoding (like in Unix filenames and environment variables), or binary data (like in memmove()).

This one is actually a pretty cool idea. Your comments aren't incorporated yet into the API generator/data format, but it should not be that hard. Array lengths are also a pretty cool annotation, would allow Zig users to use slices ([]T, a pointer + length type) in the exposed API, and the C api is hidden from the user.

floooh commented 1 year ago

Maybe my binding generators are of interest to the discussion, outlined here:

https://floooh.github.io/2020/08/23/sokol-bindgen.html

TL;DR: I'm running my C headers through clang ast-dump, parse the resulting JSON output into a reduced 'intermediate JSON', and then generate language bindings from this (now automated via Github Actions: https://github.com/floooh/sokol/actions/runs/3122773475)

Depending on target language I'm injecting special cases (e.g. helper functions like this: https://github.com/floooh/sokol-zig/blob/680d37ebcde09794e66380ff30867ca3dafb9f2f/src/sokol/gfx.zig#L4-L26). I think it's important to allow the final code generator to give special treatment to specific declarations; for instance, printf()-like functions with variable argument lists usually can't be mapped directly to the target language. For such 'complicated cases' I don't attempt to find a generic solution, but simply inject a manually written function (in some cases not even calling the original function, but 'emulating' it in the target language; for instance here's such a 'formatted print replacement': https://github.com/floooh/sokol-zig/blob/680d37ebcde09794e66380ff30867ca3dafb9f2f/src/sokol/debugtext.zig#L29-L50).

Clang ast-dump works ok for my case, because I can control the input C APIs (there's a blurb about "binding friendly APIs" in the blog post). The ast-dump output format isn't guaranteed to remain fixed, but so far (for just parsing declarations) it hasn't changed.

A more robust solution is probably a "proper" tool based on libclang.

In any case, here's all the python for the binding generation:

https://github.com/floooh/sokol/blob/master/bindgen/

Start at gen_all.py, then look at gen_ir.py (takes the verbose output of clang ast-dump and turns it into a much simplified JSON), and then gen_zig.py, gen_nim.py and gen_odin.py which take the intermediate JSON and generate the bindings.

PS: the most 'interesting' problem seems to be "how to deal with strings". The currently supported languages can all consume zero-terminated C strings directly, and all language-specific 'structs' map directly to their C counterparts (i.e. they are 'memory-layout-compatible'). For other languages this will be more tricky and may require a proper 'marshalling layer' between the target language and the C APIs.

Hope this makes sense :)

flibitijibibo commented 1 year ago

On the subject of strings, SDL2# ended up doing its own UTF8 marshaling:

https://github.com/flibitijibibo/SDL2-CS/blob/master/src/SDL2.cs

Aside from that we're pretty faithful to the original API, and it wouldn't be hard to annotate what type of string marshaling is necessary. Having a way to generate this would be nice to have, and after 10 years of maintaining SDL2# by hand I think we have enough information to automate this.
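
As a rough illustration of what such a string marshalling layer does, here is a small Python sketch using the standard ctypes module. The helper names to_c_utf8 and from_c_utf8 are made up for this example; SDL2# implements the same idea in C#, and this is not a translation of its code.

```python
import ctypes
from typing import Optional


def to_c_utf8(text: Optional[str]) -> ctypes.c_char_p:
    """Encode a Python string as a NUL-terminated UTF-8 buffer (None becomes NULL)."""
    return ctypes.c_char_p(None if text is None else text.encode("utf-8"))


def from_c_utf8(pointer: ctypes.c_char_p) -> Optional[str]:
    """Decode a NUL-terminated UTF-8 C string (NULL becomes None)."""
    raw = pointer.value  # bytes up to the first NUL, or None for a NULL pointer
    return None if raw is None else raw.decode("utf-8")


title = to_c_utf8("Fenêtre")
print(title.value)         # the raw UTF-8 bytes a C function would see
print(from_c_utf8(title))  # round-trips back to the Python string
```

With an annotation saying "this char * is UTF-8", a generator could insert calls like these automatically instead of leaving them to each binding author.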

Lokathor commented 1 year ago

Speaking up as a Rust user of SDL2, and as someone that's made both hand-written and generator-written Rust bindings for SDL2 and GL, all of this is basically a good idea.

I don't have too much to add at the moment in terms of what would help from a Rust perspective. The one thing would be that I'd like if function arguments in the machine readable definition always used integers of fixed sizes, rather than C's default numeric types that vary by platform. However, if this can't be done it's still basically fine.

sulix commented 1 year ago

While I've not used the Rust bindings much, would it make sense to tweak SDL's API to make it more directly map to Rust?

e.g., the Rust bindings make up the concept of a "Canvas" in SDL_Renderer, in order to have something with the right lifetime. (As well as things like a TextureCreator?)

These of course aren't documented in SDL (other than the Rust bindings docs), and won't appear in any other SDL tutorials, etc. If we can find a closer match between SDL and what Rust needs, so the SDL bindings don't feel so much like a different library in places, I think that'd be much more pleasant to deal with on both sides.

Lokathor commented 1 year ago

I actually have my own separate crates called fermium (raw bindings) and beryllium (rust-friendly wrappers). I've never looked too closely at what the sdl2 crate is doing or what any of their internal logic for stuff is.

lithiumtoast commented 1 year ago

Jumping in here as I have experimented with this problem from a different angle with C# with some major pains and then some minor success. I have crossed friendly paths with @floooh for generating bindings in C# for sokol using libclang.

I have documented all my knowledge / findings in the README and other documentation over at https://github.com/bottlenoselabs/c2cs. Any constructive corrections or call-outs are extremely welcome. I am probably on mount stupid.

My auto-generated bindings for SDL can be found here: https://github.com/bottlenoselabs/SDL-cs. There are challenges with the SDL API which make automatic bindgen not so "friendly" when it comes to C#. I am free to discuss this in more detail, which is probably the most value I can bring to this discussion.

I use the c2cs tool I created to automatically generate the C# bindings for FNA C dependencies for my fork of FNA called Katabasis; I sponsor @flibitijibibo. The purpose of this fork is to expand my own curiosity for the XNA/MonoGame APIs in a way that is organic and makes sense (I have a strong love-hate relationship with Microsoft).

EDIT: I forgot to mention what's interesting about my solution is that I use libclang to extract a minimal necessary .json Abstract Syntax Tree for purposes of generating C# code. Technically speaking this .json file could also be used to generate code for Python or other languages but I have not experimented down this path due to my limited time.

ikskuh commented 1 year ago

I forgot to mention what's interesting about my solution is that I use libclang to extract a minimal necessary .json Abstract Syntax Tree for purposes of generating C# code. Technically speaking this .json file could also be used to generate code for Python or other languages but I have not experimented down this path due to my limited time.

The problem with this approach is that C sadly doesn't convey even remotely enough information to generate good APIs from. That's why i'm proposing a (not yet specified, but extensible) format to document all requirements to an API. For example char * foo in C doesn't say if i can pass NULL or not. It also doesn't say if the pointer is NUL terminated or if it expects only a single char or a fixed number of them. If we can express this information in a file and generate the code from there, we can create way better bindings for most languages (Consider C# ref Point vs Point[] in marshalling)
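
To show the kind of information meant here, the following Python sketch models a pointer parameter with the facts that a plain C declaration leaves out (nullability, one vs. many, NUL termination) and renders the matching Zig parameter type. The PointerParam class and its field names are hypothetical; they are not the format of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerParam:
    """Hypothetical description of one pointer parameter."""
    name: str
    pointee: str                  # Zig name of the pointed-to type, e.g. "SDL_Color"
    many: bool = False            # points to the first of several items
    nullable: bool = False        # NULL is an accepted value
    nul_terminated: bool = False  # the items end with a 0 sentinel
    const: bool = False           # the callee does not modify the items


def zig_type(p: PointerParam) -> str:
    """Render the Zig pointer type that matches the description."""
    if p.nul_terminated:
        pointer = "[*:0]"
    elif p.many:
        pointer = "[*]"
    else:
        pointer = "*"
    optional = "?" if p.nullable else ""
    const = "const " if p.const else ""
    return f"{optional}{pointer}{const}{p.pointee}"


def zig_param(p: PointerParam) -> str:
    return f"{p.name}: {zig_type(p)}"


print(zig_param(PointerParam("colors", "SDL_Color", many=True)))
print(zig_param(PointerParam("window", "SDL_Window")))
print(zig_param(PointerParam("name", "u8", nullable=True,
                             nul_terminated=True, const=True)))
```

The same description could drive a C# generator choosing between ref Point and Point[], which is the point of keeping these facts in a language-neutral file.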

sonoro1234 commented 1 year ago

One example would be: SDL_Color* colors has to be translated to colors: [*]SDL_Color (pointer to many), and not colors: *SDL_Color (pointer to one).

I don't know Zig at all but found via Google that colors: [*c]SDL_Color can be used for automated translation (although it is as unsafe as C code is).

By the way: my LuaJIT SDL binding in https://github.com/sonoro1234/LuaJIT-SDL2

ikskuh commented 1 year ago

I don't know Zig at all but found via Google that colors: [*c]SDL_Color can be used for automated translation (although it is as unsafe as C code is).

Yes, that is correct. This conveys basically the following information:

Whereas *SDL_Color conveys this information:

and [*]SDL_Color conveys:

This means we can translate a *SDL_Color to a C# ref SDL_Color or out SDL_Color parameter, whereas [*]SDL_Color can be translated to SDL_Color[]. At least in a marshalling context.

lithiumtoast commented 1 year ago

@MasterQ32 I agree with you; I have encountered this problem and so have the Silk.NET folks and many others. There appears to be a need for some form of annotations which can be used to direct bindgen more accurately.

Like @smcv mentioned earlier, the use of magic comments is one possible solution. This has advantages and disadvantages.

What I have noticed in experimentation is that libclang exposes getting any Clang attributes for a cursor. Another path forward is to direct bindgen using Clang attributes.

However, the path I'm choosing to go down myself is neither. I decided to just accept that C just does not expose enough information. Instead of trying to add more information to C code (via magic comments or attributes), I'm using auxiliary code to direct bindgen using a plugin mechanism. This works well for my use case because I don't have control over SDL, or sokol, or flecs, etc.

For example, the pattern of SDL_Color* being an array; that can be transformed appropriately to C# via auxiliary code in the form of a plugin. In your other example, of ref Point vs Point[], this pattern would also be handled by auxiliary code in the form of a plugin. Side note: using Point[] would probably not be the best idea and Span<Point> would probably be a better fit; something which I already do for fixed buffers.

flibitijibibo commented 1 year ago

Dear imgui apparently just released something like this, probably has a lot of work for C++ wrangling but still might be good for the other aspects of metadata generation: https://github.com/dearimgui/dear_bindings

1bsyl commented 1 year ago

gendynapi parses all the SDL headers to generate the DYNAPI files. I've tried a rewrite in Python to fix some bugs and make improvements (#6783).

It's been very easy to add a JSON dump of the whole SDL API, which can be useful for generating bindings. Of course, the extra tags you would need for allocation/pointers are missing, but they should be easy to parse once added and specified.

I know this is the inverse of using a "unique source" that generates the headers. But at least it can help to generate the "unique source" from all the headers, if that approach is chosen.

slouken commented 1 year ago

Yeah, this seems like a reasonable approach, we generate an API description from the header that can be marked up with more detail by people who are implementing language bindings.

slouken commented 1 year ago

At this point also it might be worth adding code to handle APIs that have been removed, or at least add a checklist that someone can check. It won't matter once we've finalized the ABI, but it might be useful now.

attila-lendvai commented 1 year ago

The ast-dump output format isn't guaranteed to remain fixed, but so far (for just parsing declarations) it hasn't changed.

A more robust solution is probably a "proper" tool based on libclang.

the common lisp binding generation relies on c2ffi.

i'm not sure how c2ffi relates to clang ast-dump, and what justifies its existence (because i don't know much about ast-dump).

Lokathor commented 1 year ago

A very plain XML file might be best, like GL and Vulkan do.

attila-lendvai commented 1 year ago

The problem with this approach is that C sadly doesn't convey even remotely enough information to generate good APIs from.

my strategy is that i have the generated API in one package. it only deals with the basics, like string conversions/encoding, error return codes thrown as exceptions, etc. whatever can be done based on the info formally encoded in the C model.

then i have another package that is built on top of the generated one, and contains hand written "lispy" constructs that may use the full power of the host language.

attila-lendvai commented 1 year ago

FTR, this is a related feature request: https://github.com/libsdl-org/SDL/issues/2059 (typedef for error return codes).

slouken commented 1 year ago

Just to capture the suggestion in https://github.com/libsdl-org/SDL/issues/2059, if we generate separate API binding metadata, we could mark the SDL functions that return int as returning SDL_ReturnCode, which is defined to be 0 on success or < 0 on error, and SDL_GetError() would return useful information about what went wrong.
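
For illustration, this is roughly what a generated binding could do once a function is marked as returning SDL_ReturnCode. The sketch is in Python and runs without SDL: fake_get_error and set_window_opacity are stand-ins invented for this example, and SDLError and raises_on_error are hypothetical names, not part of any existing binding.

```python
import functools


class SDLError(RuntimeError):
    """Raised when a wrapped call returns a negative return code."""


def raises_on_error(get_error):
    """Wrap a function whose int result is 0 on success and < 0 on error."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            code = func(*args)
            if code < 0:
                # Ask the library what went wrong, as SDL_GetError() would.
                raise SDLError(get_error())
            return None  # success carries no further information
        return wrapper
    return decorate


# Stand-ins for the native calls so the sketch runs without SDL installed.
_last_error = ""


def fake_get_error() -> str:
    return _last_error


@raises_on_error(fake_get_error)
def set_window_opacity(window: object, opacity: float) -> int:
    global _last_error
    if window is None:
        _last_error = "Invalid window"
        return -1
    return 0


set_window_opacity(object(), 0.5)  # succeeds silently
try:
    set_window_opacity(None, 0.5)
except SDLError as exc:
    print(f"caught: {exc}")
```

The generator only needs to know which functions return the marked type; the wrapping itself is mechanical.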

attila-lendvai commented 1 year ago

@slouken please keep in mind that the current infrastructure for binding generation works based on the C domain (clang based AST walker).

in e.g. common lisp, it's trivial to map a return code of a specific C type to be thrown as an exception when the value is negative. whatever custom binding machinery is introduced by SDL will require extra work, i.e. probably remain unsupported. i, for one, will not put in the extra work to add support for something that is unique for SDL.

therefore, whatever can be encoded cheaply in the C domain, is more useful when encoded there, not in some machinery that is unique to the SDL library.

shish commented 1 year ago

I'm looking at creating some nodejs SDL bindings, because all the existing ones I can find on NPM are abandoned, out of date to the point of not even compiling, awful, incomplete, or all of the above. Having a machine-readable API definition maintained by the SDL team so that my binding-generation work is just "run regenerate.sh" would make that a lot easier, and I'd be happy to contribute to making it happen :)

(For context, I'm working on a gameboy emulator in a bunch of different languages, which also happens to be exercising a whole load of different language SDL bindings, if that's any use to anyone - https://github.com/shish/rosettaboy )

slouken commented 1 year ago

@shish, if you want to work on an API definition, feel free to submit one! It sounds like you have a real-world use case, so if you want to use that as a basis, go for it!

I would suggest enhancing src/dynapi/gendynapi.py to automatically create the basic definition file, and then add comments to that letting people know what additional markup can be added to fine tune the binding generation.

1bsyl commented 1 year ago

Note that you can run ./gendynapi.py --dump and it creates a "sdl.json" file with all SDL API inside. eg, a list of entries like this:

    {
        "comment": "the full raw comment",
        "header": "SDL_render.h",
        "name": "SDL_CreateRenderer",
        "parameter": [
            "SDL_Window *REWRITE_NAME",
            "const char *REWRITE_NAME",
            "Uint32 REWRITE_NAME"
        ],
        "parameter_name": [
            "window",
            "name",
            "flags"
        ],
        "retval": "SDL_Renderer*"
    },

this one matches the function:

extern DECLSPEC SDL_Renderer *SDLCALL SDL_CreateRenderer(
    SDL_Window *window, 
    const char *name, 
    Uint32 flags);

the output format can be changed/improved if needed
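
As a quick check that an entry of this shape carries enough to rebuild the declaration, here is a short Python sketch that substitutes each REWRITE_NAME placeholder and prints a C prototype. The entry is copied from the sample above; the function name c_prototype and the "void" fallback for an empty parameter list are assumptions, since the sample does not show how parameterless functions are dumped.

```python
import json

# One entry in the shape shown above, embedded so the sketch is self-contained.
ENTRY = json.loads("""
{
    "comment": "the full raw comment",
    "header": "SDL_render.h",
    "name": "SDL_CreateRenderer",
    "parameter": [
        "SDL_Window *REWRITE_NAME",
        "const char *REWRITE_NAME",
        "Uint32 REWRITE_NAME"
    ],
    "parameter_name": ["window", "name", "flags"],
    "retval": "SDL_Renderer*"
}
""")


def c_prototype(entry: dict) -> str:
    """Rebuild a C prototype by filling each REWRITE_NAME placeholder."""
    params = [
        decl.replace("REWRITE_NAME", name)
        for decl, name in zip(entry["parameter"], entry["parameter_name"])
    ]
    # Assumption: an empty parameter list means the function takes no arguments.
    args = ", ".join(params) if params else "void"
    return f"{entry['retval']} {entry['name']}({args});"


print(c_prototype(ENTRY))
```

Note that the types are still C declaration strings, so a generator for another language would have to parse them itself.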

ikskuh commented 1 year ago

@shish, if you want to work on an API definition, feel free to submit one! It sounds like you have a real-world use case, so if you want to use that as a basis, go for it!

@slouken so this means you're open to the idea of having such an "official file" in the SDL repository?

@shish: I'm happy to support you with this task, if you want to tackle it. I'm taking a look at the gendynapi implementation.

ikskuh commented 1 year ago

I played around a bit with gendynapi:

/// 
/// Set the opacity for a window.
/// 
/// The parameter `opacity` will be clamped internally between 0.0f
/// (transparent) and 1.0f (opaque).
/// 
/// This function also returns -1 if setting the opacity isn't supported.
/// 
/// \param window the window which will be made transparent or opaque
/// \param opacity the opacity value (0.0f - transparent, 1.0f - opaque)
/// \returns 0 on success or a negative error code on failure; call
///          SDL_GetError() for more information.
/// 
/// \since This function is available since SDL 3.0.0.
/// 
/// \sa SDL_GetWindowOpacity
/// 
/// 
pub const SetWindowOpacity = SDL_SetWindowOpacity;
extern fn SDL_SetWindowOpacity(
    window: ?*SDL_Window,
    opacity: f32,
) c_int;

https://github.com/MasterQ32/SDL.zig/blob/7acd46267e16ff56bce1208bcf17ef71009cb560/src/binding/sdl.zig#L11062-L11084

But this code is suboptimal, as i am still able to pass NULL to SDL_SetWindowOpacity, which doesn't really make sense. It would be better to generate window: *SDL_Window for the parameter.

Also, every binding generator would have to implement a C type declaration parser, which isn't optimal either, especially when types like SDL_INOUT_Z_CAP(maxlen) wchar_t *dst appear. Another thing that's missing is type declarations, so we'd still have to synchronize them by hand or from another data source.

Maybe we can create an API descriptor on top of the json file tho, as it already contains quite a lot of information, but I would advise against using C declarations to store type information, and would also add another array of types which can be translated.

shish commented 1 year ago

While I start researching and reading the existing code - just for reference, I'm an experienced backend engineer who does games for fun and has never generated a binding from C headers before (I'm more accustomed to the "generate API headers for all languages, including C, from an abstract language-neutral definition" approach - eg gRPC / Thrift / OpenAPI / etc)

Random brain dumping some thoughts, with no particular promise whether they're good or bad thoughts:

Lucretia commented 1 year ago

As auto-translating the headers does not convey enough information about the expected types, a lot of APIs are hand-adjusted to actually fit the intent of the SDL api. One example would be: SDL_Color* colors has to be translated to colors: [*]SDL_Color (pointer to many), and not colors: *SDL_Color (pointer to one).

You think you have it hard. Try working on a thick binding which does language conversions in some cases, usually strings. I'm the author of SDLAda which is a variable width/depth binding, in some cases things can be imported directly with a simple import, with a lot of others, there are conversions to get the correct types returned and then there are actual translations; yes that really is a data translation of the macros for pixel formats.

But, having an api/idl which can be read in to generate the thin bindings would make life a bit easier.

It would need to be machine readable, but not hard to parse like C, something that can be done with predictive parsing would be easier, or have the compiler have an api to extend with new languages with a runtime plugin loader (like that of gcc).

Don't forget us strongly typed people where we can have enums rather than a bunch of constants, a la the khronos registry has groups to put them into an enumeration type, they even recognised this was required when opengl had the *.spec files as originally they generated Ada bindings to OpenGL for SGI machines.
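
A minimal Python sketch of the grouping idea: if each constant in the machine-readable definition names the group it belongs to, a generator can emit real enumeration types instead of loose integers. The group names InitFlags and SensorType and the tuple layout are made up for this example; the constant names and values are taken from the SDL2 headers.

```python
import enum
from collections import defaultdict

# (constant name, value, group) triples; the group names are hypothetical.
CONSTANTS = [
    ("SDL_INIT_TIMER", 0x00000001, "InitFlags"),
    ("SDL_INIT_AUDIO", 0x00000010, "InitFlags"),
    ("SDL_INIT_VIDEO", 0x00000020, "InitFlags"),
    ("SDL_SENSOR_ACCEL", 1, "SensorType"),
    ("SDL_SENSOR_GYRO", 2, "SensorType"),
]


def build_enums(constants):
    """Create one IntEnum per group from the flat constant list."""
    grouped = defaultdict(dict)
    for name, value, group in constants:
        grouped[group][name] = value
    return {group: enum.IntEnum(group, members)
            for group, members in grouped.items()}


enums = build_enums(CONSTANTS)
print(list(enums["InitFlags"]))
print(enums["SensorType"](2).name)
```

For constants that are meant to be OR-ed together, a generator would more likely pick a flags type (enum.IntFlag in Python); that choice is one more thing the definition file would need to record.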

flibitijibibo commented 5 months ago

Opened up a bounty for this issue, if anyone was looking for time to do this, hopefully it will help:

https://github.com/flibitijibibo/flibitBounties/issues/6

Lucretia commented 5 months ago

Opened up a bounty for this issue, if anyone was looking for time to do this, hopefully it will help:

I'm all for this, but there are a few issues. C preprocessor macros make binding difficult in a number of places. For example, in SDLAda:

  1. I've created constants in C files to import the version numbers from the macros, these could be constants in SDL3.
  2. The pixel formats are convoluted macros, I had to translate these by hand as can be seen here, which was a nightmare.
  3. The Blit functions are a weird split up set of functions which the macro handles.

For this to happen, all this macro stuff needs to go.

madebr commented 5 months ago

I've been thinking about the following approach:

Lokathor commented 5 months ago

Trying to get the ifdef stuff right is one of the tricky parts when i manually made bindings to sdl2

Lucretia commented 5 months ago

All that pixel format stuff should be bitfields, that's what I ended up creating. First, if this is to be taken seriously, reduce the preprocessor crap to next to zero and have proper symbols which can be imported.

But, to do this properly, there needs to be an IDL created by the authors of the libs, that IDL could then be used to even generate the C headers. These kinds of IDL also encode type safety in them, so making it easier to generate bindings rather than having to parse the many different forms of pointer syntax from C.

Thankfully the Blit macros have gone from SDL3 from what I can see.

slouken commented 5 months ago

Please feel free to create a PR on the SDL repo to remove preprocessor definitions that interfere with language bindings.

slouken commented 5 months ago

@madebr's proposal may make that unnecessary however, so maybe that's worth pursuing?

madebr commented 5 months ago

A SDL IDL is an interesting idea, as it would solve the same issues and be more flexible in the long run (documentation wise).

Testing the viability of my proposal. Running

mkdir -p /tmp/dummy
touch /tmp/dummy/endian.h /tmp/dummy/inttypes.h /tmp/dummy/stdarg.h /tmp/dummy/stdint.h /tmp/dummy/string.h /tmp/dummy/wchar.h
cpp -undef -nostdinc -E -P include/SDL3/SDL.h -D__SDLAPIC__ -I include -I /tmp/dummy >/tmp/sdl_naked.h
cpp -undef -nostdinc -E -P include/SDL3/SDL.h -D__SDLAPIC__ -I include -I /tmp/dummy -dM>/tmp/sdl_macros.h

and looking at /tmp/sdl_naked.h and /tmp/sdl_macros.h, I think it contains parse-able data. Adding -C preserves documentation.
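
To give an idea of how parse-able the -dM output is, here is a small Python sketch that splits #define lines into object-like constants and function-like macros. The SAMPLE text is hand-written for this example and is not actual output of the commands above; parse_macros is a hypothetical helper name.

```python
import re

# "#define NAME body" or "#define NAME(params) body". In C a macro is
# function-like only when "(" follows the name with no space in between.
DEFINE = re.compile(r"#define\s+(\w+)(\([^)]*\))?\s*(.*)")

# Hand-written sample in the shape that `cpp -dM` prints (not real output).
SAMPLE = """\
#define SDL_MAJOR_VERSION 3
#define SDL_VERSIONNUM(major,minor,patch) ((major) * 1000000 + (minor) * 1000 + (patch))
#define __SDLAPIC__ 1
#define SDL_EXAMPLE_MASK (0xFF)
"""


def parse_macros(text: str):
    """Return (constants, function_like) dictionaries keyed by macro name."""
    constants: dict[str, str] = {}
    function_like: dict[str, tuple[list[str], str]] = {}
    for line in text.splitlines():
        match = DEFINE.match(line.strip())
        if match is None:
            continue
        name, params, body = match.groups()
        if params is None:
            constants[name] = body.strip()
        else:
            names = [p.strip() for p in params[1:-1].split(",") if p.strip()]
            function_like[name] = (names, body.strip())
    return constants, function_like


constants, function_like = parse_macros(SAMPLE)
print(constants)
print(function_like)
```

The constants are easy to translate; the function-like macros are the part that still needs either hand-written equivalents or a change to the public API.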

Lucretia commented 5 months ago

@madebr's proposal may make that unnecessary however, so maybe that's worth pursuing?

I really don't think it would, especially for the pixel format stuff.

Also, for languages which don't just dump everything into one module, i.e. which have a clean separation of concerns, this isn't going to work.

Lokathor commented 5 months ago

Even if the language can manage dumping it all into one file, having multi-thousand line source files makes rendered views of the file, such as github's source viewer, crawl and chug, particularly on mobile devices like phones. Just for being able to look something up it's nicer to keep files to, say, 1000 lines or less.

Lucretia commented 5 months ago

Look at how I organised SDLAda, not events.events, that's a problem due to the package visibility rules, but the rest.

ikskuh commented 5 months ago

Last year, I started a new tool called apigen which originated from this issue, and totally forgot about the issue itself.

The tool is meant to model native APIs, and can be found here: https://github.com/MasterQ32/apigen It was designed to be vendored with projects like SDL2

I slowly started working on a SDL2 port into apigen here: https://github.com/MasterQ32/fakerootz/tree/main/api

apigen is meant to be able to also generate a JSON dump of the API information, so it's easily ingestible by other tools as well

My goal is also to eventually support versioned items, so you can have functions that were introduced in 2.0.1 and removed in 3.1.5 or something like that.
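
As a sketch of what versioned items could look like to a consumer of such a definition (this is not apigen's format, which does not exist for this feature yet), each item records the version that introduced it and, optionally, the first version that no longer has it. ApiItem, its fields and the function name SDL_ExampleFunction are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

Version = tuple[int, int, int]


@dataclass(frozen=True)
class ApiItem:
    """Hypothetical API entry with an availability range."""
    name: str
    since: Version
    removed: Optional[Version] = None  # first version that no longer has the item

    def available_in(self, version: Version) -> bool:
        # Tuples compare element by element, so (2, 0, 1) < (3, 1, 5).
        if version < self.since:
            return False
        return self.removed is None or version < self.removed


item = ApiItem("SDL_ExampleFunction", since=(2, 0, 1), removed=(3, 1, 5))
for version in [(2, 0, 0), (2, 0, 1), (3, 1, 4), (3, 1, 5)]:
    print(version, item.available_in(version))
```

A generator targeting a specific release would simply filter the item list with available_in before emitting code.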

1bsyl commented 5 months ago

I've been thinking about the following approach:

* Create a dumbed down C parser that pre-defines only `__SDLAPIC__`

* Add appropriate `#ifdef`'ery to our headers such that the parser only sees typedefs, function declarations and documentation

* Let the parser output something parsable (json/xml/yaml)

* generators can then spit out language-specific bindings (C#/Java/rust)

* as a stretch goal, the parser can extract the documentation (they are comments on the same line, or the lines before). That way we can greatly simplify/improve `wikiheaders.pl`

@madebr, just wanted to remind that gendynapi.py

... and it is required to work correctly since it creates the internal SDL dynapi files.

This JSON file could be re-used to generate bindings or wiki. Of course gendynapi.py needs some evolution for that.

(btw, a "wiki -> source code" notification, like automatic PR creation would be also a good thing).

A SDL IDL is an interesting idea, as it would solve the same issues and be more flexible in the long run (documentation wise).

Testing the viability of my proposal. Running

mkdir -p /tmp/dummy
touch /tmp/dummy/endian.h /tmp/dummy/inttypes.h /tmp/dummy/stdarg.h /tmp/dummy/stdint.h /tmp/string.h /tmp/wchar.h
cpp -undef -nostdinc -E -P include/SDL3/SDL.h -D__SDLAPIC__ -I include -I /tmp/dummy >/tmp/sdl_naked.h
cpp -undef -nostdinc -E -P include/SDL3/SDL.h -D__SDLAPIC__ -I include -I /tmp/dummy -dM>/tmp/sdl_macros.h

and looking at /tmp/sdl_naked.h and /tmp/sdl_macros.h, I think it contains parse-able data. Adding -C preserves documentation.

Just tested (it also requires touch /tmp/dummy/wchar.h and touch /tmp/dummy/string.h on my side). I'm not sure how it makes things easier. I mean, I see that this strips a lot of things, but as long as we have our SDLCALL functions, structs, enums and typedefs, we should be fine? And if not, maybe we should change our public API (e.g. no #define SDL_CONSTANT).

Lucretia commented 5 months ago

Speaking up as a Rust user of SDL2, and as someone that's made both hand-written and generator-written Rust bindings for SDL2 and GL, all of this is basically a good idea.

I don't have too much to add at the moment in terms of what would help from a Rust perspective. The one thing would be that I'd like if function arguments in the machine readable definition always used integers of fixed sizes, rather than C's default numeric types that vary by platform. However, if this can't be done it's still basically fine.

In SDLAda, I define C compatible types with valid ranges where possible; this adds extra type checking at the Ada language level. I would prefer that all data types have ranges specified in addition to sizes so that information can be used in languages that have that facility, admittedly not many.

Lokathor commented 5 months ago

I believe it's considered "compatible" for a later version of SDL3 to add more values to an enum. E.g. 3.2 has an enum of 5 values, and in 3.4 the enum might have a 6th value added.

madebr commented 5 months ago

While exploring what an SDL machine-readable API might look like, I created this. The gist contains a manually crafted, incomplete API in XML format, including an XML schema to formally verify it. To check whether the API is enough to get working bindings, it also includes a crude Python bindings generator, which generates this.

Are there any patterns that I forgot about, are hard to express in bindings, are a lot of work, or are missing from the documentation at all?

As an example of extra information, I added an error item to the return type, so bindings know what value indicates an error state. (It's immature right now, and not used by the Python binding.) I also added information about ownership, to appease Rust's borrow checker.
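
To illustrate how a generator might read error and ownership information from an XML description, here is a Python sketch using the standard xml.etree.ElementTree module. The element and attribute names (function, return, param, error, ownership) are invented for this example and are not the schema from the gist; only the function names and C types are taken from SDL.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; not the schema from the gist mentioned above.
API_XML = """
<api>
  <function name="SDL_SetWindowOpacity">
    <return type="int" error="negative"/>
    <param name="window" type="SDL_Window*"/>
    <param name="opacity" type="float"/>
  </function>
  <function name="SDL_CreateRenderer">
    <return type="SDL_Renderer*" error="NULL" ownership="caller"/>
    <param name="window" type="SDL_Window*"/>
    <param name="name" type="const char*"/>
    <param name="flags" type="Uint32"/>
  </function>
</api>
"""


def load_functions(xml_text: str) -> dict:
    """Collect return type, error marker, ownership and parameters per function."""
    root = ET.fromstring(xml_text.strip())
    functions = {}
    for fn in root.findall("function"):
        ret = fn.find("return")
        functions[fn.get("name")] = {
            "return_type": ret.get("type"),
            "error": ret.get("error"),
            # Assumption: no attribute means nothing is handed to the caller.
            "ownership": ret.get("ownership", "none"),
            "params": [(p.get("name"), p.get("type")) for p in fn.findall("param")],
        }
    return functions


for name, info in load_functions(API_XML).items():
    print(name, info["error"], info["ownership"])
```

A Rust generator could use the ownership value to decide whether the return type needs a Drop implementation, and any generator could use the error value to turn failures into exceptions or result types.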

Susko3 commented 5 months ago

The xml format looks really interesting. I think it would be beneficial to generate C headers from the xml format, so it can be compared to the actual headers.

Are there any patterns that I forgot about, are hard to express in bindings, are a lot of work, or are missing from the documentation at all?

A common pitfall of generating bindings for memory-safe languages is knowing if the returned pointer should be SDL_freed or not. This should be mentioned in the documentation of each function. And you may be able to infer it from const qualifiers on returned pointers (const char * vs char *)

How would you handle something like https://wiki.libsdl.org/SDL3/SDL_GetDisplays? It returns a buffer and a count of elements. I would expect all high quality binding libraries to have friendly overloads of that function.

A python one would probably look like this: (this code should be generated from the bindings XML)

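import ctypes

sdl = ctypes.CDLL("libSDL3.so.0")  # assumption: Linux library name; differs per platform
sdl.SDL_GetDisplays.argtypes = [ctypes.POINTER(ctypes.c_int)]
sdl.SDL_GetDisplays.restype = ctypes.POINTER(ctypes.c_uint32)  # SDL_DisplayID is a Uint32
sdl.SDL_free.argtypes = [ctypes.c_void_p]
sdl.SDL_free.restype = None

def get_displays() -> list[int] | None:
    count = ctypes.c_int(0)
    pointer = sdl.SDL_GetDisplays(ctypes.byref(count))  # this calls the native function
    if not pointer:  # NULL pointer: the call failed
        return None
    ret = [pointer[i] for i in range(count.value)]  # copy the IDs before freeing
    sdl.SDL_free(pointer)
    return ret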

(Please note that I have no idea how bindings in python work, but the above code should give you the general idea.)