hannorein / rebound

💫 An open-source multi-purpose N-body code.
https://rebound.readthedocs.io/
GNU General Public License v3.0
820 stars 217 forks source link

clang-style C documentation comments would help with generated rust bindings #705

Closed spenczar closed 11 months ago

spenczar commented 1 year ago

I am interesting in calling rebound (and assist) from Rust. To that end, I've generated Rust bindings to access librebound over a C FFI interface: https://github.com/spenczar/rebound-sys

This has gone pretty well!

One thing that hasn't worked is automatic conversion of documentation comments. It happens that the standard Rust tool for this job, bindgen, only is interested in doxygen-style comments: ones that start with /** or ///.

It would be kind of nice if Rebound's headers used that style of comments. It's not a very big deal, though; these bindings are extremely low-level, and I'll certainly be writing a wrapper library that is much more human-friendly, so it's acceptable to just refer to rebound.h.

hannorein commented 12 months ago

Many years ago, we had doxygen documentation for everything. But honestly, it was too much effort to maintain this specific syntax and just felt a bit inflexible. I suspect very few people look at this sort of documentation which is why we're now mostly using short examples to explain stuff. But I get your point. So what specifically does this bindgen tool want to see? Can you provide a few examples? Does it support the /**< style syntax (which I find the least annoying)? Does it need documentation for all the arguments and return values (which I find the most annoying)?

spenczar commented 12 months ago

So what specifically does this bindgen tool want to see?

I'm glad you asked this question, because I agree that Doxygen is obnoxious. Fortunately, bindgen doesn't actually require true Doxygen. It uses clang under the hood, so it wants the format used by clang-doc, which is a much more lightweight system:

A "documentation comment" is just any comment that starts with either ///, or /**, or /*!.

The contents of the "documentation comment" can be anything you like, but plain old text is best. Don't get cute with markdown or structured text inside. There's no special way you annotate that something is a input parameter or whatever, since the signature already has that information.

To annotate a declaration, the comment needs to be on the immediately previous line(s):

/// foo does a thing
int foo(int x, int y);

/// bar does other things
struct bar {
   /// z is a useful field 
   int z;
};

/** this is a complicated function
 * so i'll give it a multiline
 * documentation comment
 */
void baz();

To annotate a struct field, the comment can be on the same line, too:

/// bar does other things
struct bar {
  int z; /// z is a useful field
}

Enums need to be annotated with the comment in front of each value. For whatever reason, same-line annotations don't work:

/// day of the week
enum WEEKDAY {
  /// the first day
  SUNDAY = 1;
  /// the second day
  MONDAY = 2;
};

So I think this issue request is much simpler than doxygenification: all I really want is for doc comments to be triple-slashed, for the most part.

Being comprehensive is not necessary. Something is better than nothing. And I think the main purpose of the comments should obviously be to help anyone working on the C code, not to help code generators for other languages' bindings!

But a naive find-and-replace of // to /// pays some dividends.

Before:

image

After:

image

One additional thing that would slightly help is empty newline comments to delineate "paragraphs".

For example, here's an undelineated comment:

/// Same as reb_binary_diff, but with options.
/// output_option: If set to 0, the differences are written to bufp. If set to 1, printed on the screen. If set to 2, then only the return value indicates any differences.
/// returns: 0 is returned if the simulations do not differ (are equal). 1 is return if they differ.
int reb_binary_diff_with_options(char* buf1, size_t size1, char* buf2, size_t size2, char** bufp, size_t* sizep, int output_option);
image

And here it is with blank lines to separate paragraphs:

/// Same as reb_binary_diff, but with options.
///
/// output_option: If set to 0, the differences are written to bufp. If set to 1, printed on the screen. If set to 2, then only the return value indicates any differences.
///
/// returns: 0 is returned if the simulations do not differ (are equal). 1 is return if they differ.
int reb_binary_diff_with_options(char* buf1, size_t size1, char* buf2, size_t size2, char** bufp, size_t* sizep, int output_option);
image
spenczar commented 12 months ago

I'm happy to submit a PR that does the tedious conversion work, since I have the rust toolchain up to verify the output. Just let me know if you think that you'd be amenable to the change.

hannorein commented 12 months ago

Cool. That's way better than the original oxygen documentation syntax that we had. You still find them scattered around the code. For example

/**
 * @brief Find the nearest neighbour in a cell or its daughters.
 * @details The function only returns a positive result if the particles
 * are overlapping. Thus, the name nearest neighbour is not
 * exactly true.
 * @param r REBOUND simulation to work on.
 * @param gb (Shifted) position and velocity of the particle.
 * @param ri Index of the root box currently being searched in.
 * @param p1_r Radius of the particle (this is not in gb).
 * @param nearest_r2 Pointer to the nearest neighbour found so far.
 * @param collision_nearest Pointer to the nearest collision found so far.
 * @param c Pointer to the cell currently being searched in.
 * @param collisions_N Pointer to current number of collisions
 * @param gbunmod Ghostbox unmodified
 */

That was just not fun to maintain. So yes, please feel free to send me a pull request! Maybe you can start with a teaser with just a few changes before sending one monster pull request?

Thanks!

spenczar commented 12 months ago

I've tried this out in #708, and I'm not super happy with how it turns out.

The generated Rust bindings are way, way nicer, but the C seems a little bit worse, at least to me. I'll let you be the final judge, but I can't honestly say I think that tradeoff is worth it. It's not so bad to refer to the C header when working in Rust, and I'll make a higher-level wrapper package anyway.