jnikula / hawkmoth

Hawkmoth - Sphinx Autodoc for C
https://jnikula.github.io/hawkmoth/
BSD 2-Clause "Simplified" License
74 stars 12 forks source link

Looking to see if this library would be a good fit. #242

Closed kdschlosser closed 2 months ago

kdschlosser commented 3 months ago

I am one of the people that help out with LVGL. You may or may not have heard of it. It is a very popular lightweight GUI framework that is written in C.

The docstrings are located in the header files written for doxygen. As we know doxygen can be difficult to work with when wanting to add anything custom and the output is really geared as being more along of the lines of just documenting the API.... While doxygen is currently used it is only used to generate the XML output so it is able to be read by breathe. I have come across some bugs/limitations in breathe and requested support and the response I got was that unless the maintainer is paid no more work is going to be done with the library. That basically means it is a dead project...

Now I am on the hunt for a replacement....

I am running into some issues getting hawkmoth up and running. It seems like the clang compiler is being run on the files one at a time instead of all of the files at once. There is a config file for our library that needs to populate throughout the entire library as this is what turn on and off various parts and pieces of LVGL. This config file is only included once in the main header file and that main header file then includes all of the rest of the header files. For the API portion of the documentation it is important to separate the API by header file.

Because breathe looked at the xml output of doxygen I was able to run doxygen against the main header files and doxygen would generate all of the xml files for the entire project. I was able to pass the path for the xml files to breathe. Because the entire project was read by doxygen all of the config settings propagated through the code correctly.

Also is there a way to exclude specific C files from being parsed?, Is there a way to specify more than one source root?

Am I able to use sphinx directives in doxygen style docstrings and have those directives properly link to other areas of the documentation?

kdschlosser commented 3 months ago

Here is the documentation that we are currently building.

https://docs.lvgl.io/master/

and here is the code.

https://github.com/lvgl/lvgl

jnikula commented 3 months ago

Hey, thanks for your interest in Hawkmoth! Indeed we aim to be a lean alternative to Doxygen+Breathe if your main goal is to document C APIs in Sphinx, and don't want all the bells and whistles of Doxygen. Sounds like like a good match for your case.

That said, I won't claim the migration is always going to be trivial. I'll look into your questions (and code!) soon, and let's see if we can figure this out.

jnikula commented 3 months ago

It seems like the clang compiler is being run on the files one at a time instead of all of the files at once.

Correct. Source files are parsed as needed, based on the directives in the rst documents. See https://jnikula.github.io/hawkmoth/stable/directives.html

There is a config file for our library that needs to populate throughout the entire library as this is what turn on and off various parts and pieces of LVGL. This config file is only included once in the main header file and that main header file then includes all of the rest of the header files.

I'm not sure what to make of that in terms of documentation.

For the API portion of the documentation it is important to separate the API by header file.

With Hawkmoth, the structuring is really up to you, for better and for worse. The documentation is incorporated to rst documents based on the directives. If you want 1:1 mapping between rst and C header, you need to create the rst file, and include the documentation from the C header. It can be tedious if you need to do that for a lot of files, but it's also flexible in that you can include the documentation from multiple source files any way you wish into a single rst document.

Also is there a way to exclude specific C files from being parsed?

It's really the other way round. You need to explicitly specify which C files to parse. You can use the c:autodoc directive to include a number of source files in one go, with globbing, or you can have more fine grained granularity and use the other directives to incorporate documentation at a symbol granularity.

Is there a way to specify more than one source root?

At the moment, no. Basically you need to choose a source root that is a parent to all the "roots" you want to use, and use the relative paths accordingly in the directives.

Am I able to use sphinx directives in doxygen style docstrings and have those directives properly link to other areas of the documentation?

From Hawkmoth perspective, we do as little as possible with the docstrings, or documentation comments, and pass them unmodified to Sphinx. Including the Sphinx directives and references and formatting. So it should all just work.

The Doxygen/Javadoc support we have is really rather trivial and minimal conversion of the Doxygen formatting to reStructuredText. And then Sphinx sees that rst. The Doxygen support is handled by the javadoc built-in extension https://jnikula.github.io/hawkmoth/stable/built-in-extensions.html#hawkmoth-ext-javadoc - and you may also want to look at the implementation of it to see how simple it really is https://github.com/jnikula/hawkmoth/blob/master/src/hawkmoth/ext/javadoc/__init__.py. If you have rst within your Doxygen-formatted documentation comments, it should work as long as it doesn't conflict with the conversions.

It follows that everything just works better if you use rst directly instead of Doxygen formatting, but that can be a lot of work if you are migrating. But you should be able to start with Doxygen style and convert to reStructuredText gradually, if you choose to convert at all.

jnikula commented 3 months ago

You may also wish to look at the Mesa documentation and source. They have a slightly different approach in that they don't want 1:1 mapping between rst documents and C source, but rather have higher level rst documents that include relevant C symbols. I'm not suggesting you do that, but look at it for examples how Hawkmoth works in that environment.

The ISL stuff for example:

kdschlosser commented 3 months ago

I don't have an issue with creating a build directory and then duplicating the source file/folder structure in ReST files that will align with the C source files that I want to have the documentation generated for. That is actually what we are doing now it is just a mater of changing the directive being used to point to using Hawkmoth.

This is what ends up happening when I do that though...

Temp\tmpw_qkxo6s.lvgl_docs\API\src\misc\lv_anim_timeline.rst:8: ERROR: C:src\misc\lv_anim_timeline.h: Error parsing translation unit.
Temp\tmpw_qkxo6s.lvgl_docs\API\src\misc\lv_anim_timeline.rst:8: WARNING: No documented symbols were found.

and this is what the generated ReST file looks like.

.. _lv_anim_timeline_h:

==================
lv_anim_timeline.h
==================

.. c:autodoc:: src\misc\lv_anim_timeline.h

I have the root pointing to the absolute path where the clone of the repo is stored.

LVGL is a fairly large project. There are 410 header files with 37583 lines of code, 856 C source files with 340605 lines of code, 50 CPP source files with 18981 lines of code. There are 221 ReST files that store the documentation and there is another 331 ReST files that get generated during the documentation build.

There is a single entry point for the user. That is lvgl.h. public API all gets included into that header file. There is a single configuration header file that gets included at the top of lvgl.h. In the config header there there are 346 macros that control what portions of LVGL are to be compiled. Because that config file gets included at the top of lvgl.h the headers that get included after that use the macros that have been defined in the config file. by having clang read the header files one at a time the config settings never get loaded so the macros are not made available.

I know that i can pass command line parameters to clang but seeing as how there are a total of 346 macros that I would have to pass to the compiler through the command line, I don't think that is the best way to go about doing it. If there was a way to force the inclusion of a header file when having clang read the file that would be ideal. I could monkey patch in the code that is needed to make a temporary header file that would programmatically add an inclusion for the config file and the header file that the documentation is needing to be compiled for. I just don't know where in hawkmoth the code is located for launching clang.

I believe the Error parsing translation unit. is being caused by macros that are not being loaded from that config file.

kdschlosser commented 3 months ago

I also wanted to say TY for specifically mentioning how to tell python clang where the shared library is located. That made it a whole lot easier with getting python clang working properly on Windows using the clang in Visual Studio. I wrote a helper library for python some time ago that creates a proper MSVC build environment to compile c extensions on Windows. I use Windows COM objects to access the Visual Studio installer to collect what components are installed and where things are located.

It might be worth mentioning in your docs...

I have this code right at the very top of the sphinx conf.py file

import os
import sys

if sys.platform.startswith('win'):
    import pyMSVC

    env = pyMSVC.setup_environment()
    if not env.visual_c.has_clang:
        raise RuntimeError('clang compiler not installed')

    clang_path = os.path.join(env.visual_c.clang_path, 'libclang.dll')
elif sys.platform.startswith('darwin'):
    clang_path = 'libclang.dylib'
else:
    clang_path = 'libclang.so'

from clang.cindex import Config

Config.set_library_file(clang_path)

just have to make sure that the version of python clang that is installed matches the version of clang being used. Or has the same major version and the next patch version above the clang patch version. My build environment has clang 16.0.5 installed with Visual Studio. There is no python clang version 16.0.5. There is a 16.0.6 and that is what I have installed.

jnikula commented 3 months ago

There is a single entry point for the user. That is lvgl.h. public API all gets included into that header file. There is a single configuration header file that gets included at the top of lvgl.h. In the config header there there are 346 macros that control what portions of LVGL are to be compiled. Because that config file gets included at the top of lvgl.h the headers that get included after that use the macros that have been defined in the config file. by having clang read the header files one at a time the config settings never get loaded so the macros are not made available.

Ah, now your earlier comment makes sense.

If there was a way to force the inclusion of a header file when having clang read the file that would be ideal.

There actually is! The Clang (and GCC) option is -include <filename>. The file is read before the source is preprocessed. Add that to hawkmoth_clang in your conf.py as described at https://jnikula.github.io/hawkmoth/stable/extension.html#hawkmoth_clang

jnikula commented 3 months ago

Looking at the sources at random, some things that might cause you trouble:

There's no support for @file and this might get attached to the somewhat random next symbol (but ignored if there's nothing interesting until the next comment):

/**
 * @file lv_draw.h
 *
 */

I think these will be interpreted as documentation comments and attached to the somewhat random next symbol (but ignored if there's nothing interesting until the next comment):

/*********************
 *      INCLUDES
 *********************/

Hawkmoth won't document these without a documentation comment above the struct:

struct lv_layer_t  {

    /** Target draw buffer of the layer*/
    lv_draw_buf_t * draw_buf;

    /** The absolute coordinates of the buffer */
    lv_area_t buf_area;
    ...
};

It's sufficient to add a minimal "empty" comment such as /***/ above them.

The phrases within single ticks will be interpreted text in rst, not monospace/preformatted, see https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#interpreted-text. It might get rendered in italics, or something else.

The end of the line comments aren't supported, they always need to be before:

    lv_gradient_stop_t   stops[LV_GRADIENT_MAX_STOPS];  /**< A gradient stop array */
    uint16_t             stops_count;                   /**< The number of used stops in the array */
kdschlosser commented 3 months ago

We are actually in the process of overhauling the doxygen docstrings because of the 3000+ warnings that doxygen generates. so making changes for things like that we can do without an issue.

The one thing that we really want to be able to do is use ReST directives in the C docstrings. If it was my decision to make I would convert all of the docstrings in the C source files to use the ReST syntax. While there is a guy at the top he prefers the committee approach when making API type changes. I am not even going to make the suggestion about converting all of the doxygen docstrings to ReST style. I already know how that would turn out. A lot of C programmers know doxygen and getting them to learn ReST... That's not gonna be something that people are doing to like. Last year I spent a large amount of time getting rid of all of the markdown that was used. The entire documentation system was written in markdown and sphinx was used to compile it. A large portion of the "fluff" that is built into sphinx doesn't work with markdown. I have found that every couple of months I have to go over all of the ReST files and fix all of the incorrect directives and the characters used for headings...

If there is a way to have the javadoc extension collect the docstrings and then get it processed by sphinx so the directives can get evaluated.. That would be huge for our project. breathe had a way of doing something similar but it was obnoxious to use and had a lot of additional text that had to be added to a doxygen docstring to make it work. Since your library doesn't use doxygen at all there is possibility of a conflict with doxygen parsing a docstring that contains sphinx/ReST syntax.

No errors occur from these. I believe they are ignored. If they are that is fine by me as they are not something that needs to have anything done with them...

/**
 * @file lv_draw.h
 *
 */

This actually throws errors. I don't remember off the top of my head the exact error but I can get it over to ya. They are not ignored though

/*********************
 *      INCLUDES
 *********************/

There actually is! The Clang (and GCC) option is -include .

This is the meal ticket right here... MSVC doesn't have that ability and MSVC is what I am more familiar with.

kdschlosser commented 3 months ago

also, sorry I didn't explain the whole config header file thing well enough in the first post. I was trying to think of the best way to explain it and I didn't do a good job the first go around. Happy that I did better on the second attempt.

kdschlosser commented 3 months ago

question about naming..

This is legal code in C.

typedef struct some_struct {
    struct some_struct field;
} some_struct;

This didn't work using breathe because of the type name and the struct name being the same. It would pitch a fit about it. Does this work correctly?

Another thing is function pointer prototypes. any problems with what is below?

typedef uint32_t (*lv_tick_get_cb_t)(void);

handling of forward declarations? Does this get evaluated as some_struct a type of _some_struct or does some_struct get evaluated as a struct?

typedef _some_struct some_struct;

struct _some_struct {
    some_struct field;
}

does this get documented as a struct or does it get documented as a type to an anonymous struct?

typedef struct {
    bracket_stack_t br_stack[LV_BIDI_BRACKET_DEPTH];
    uint8_t br_stack_p;
} lv_bidi_ctx_t;
kdschlosser commented 3 months ago

also, I have a complete list of all of the doxygen directives if you want them. It might be useful if you are interested in making the jacadoc extension more complete.

jnikula commented 3 months ago

also, I have a complete list of all of the doxygen directives if you want them. It might be useful if you are interested in making the jacadoc extension more complete.

Thanks, but the extension already contains all the doxygen directives, whether it's implemented or not: https://github.com/jnikula/hawkmoth/blob/master/src/hawkmoth/ext/javadoc/__init__.py#L198

It's more a question of how much effort to put into it. I don't intend this to be anywhere near complete Doxygen replacement, but just enough support to make migration feasible.

jnikula commented 3 months ago

We are actually in the process of overhauling the doxygen docstrings because of the 3000+ warnings that doxygen generates. so making changes for things like that we can do without an issue.

[...] I have found that every couple of months I have to go over all of the ReST files and fix all of the incorrect directives and the characters used for headings...

IMHO the only way to keep this clean is to fix all the warnings and make it a CI requirement to be warning free in all merge requests. But I digress.

If there is a way to have the javadoc extension collect the docstrings and then get it processed by sphinx so the directives can get evaluated.. That would be huge for our project. breathe had a way of doing something similar but it was obnoxious to use and had a lot of additional text that had to be added to a doxygen docstring to make it work. Since your library doesn't use doxygen at all there is possibility of a conflict with doxygen parsing a docstring that contains sphinx/ReST syntax.

I think for the most part it should just work the way you want, also with the Doxygen conversions. Try it out.

jnikula commented 3 months ago

question about naming..

These are tricky questions that we've been through several times, trying to choose the option that leads to least surprises.

This is legal code in C.

typedef struct some_struct {
    struct some_struct field;
} some_struct;

This didn't work using breathe because of the type name and the struct name being the same. It would pitch a fit about it. Does this work correctly?

Not just Breathe, but Sphinx has an issue with the namespaces. So you can only document one or the other. Hawkmoth generates documentation for one only, and should avoid the issue.

Hawkmoth documents that as a struct some_struct. It's a struct instead of a type, because only that allows documenting the members.

This gets documented as struct some_struct:

typedef struct some_struct {
    struct some_struct field;
} some_struct_typedef;

And this as struct some_struct_typedef.

typedef struct {
    struct some_struct field;
} some_struct_typedef;

It's not precisely accurate, but I think it matches intent. If you want to document the type, then you need to separate the struct and the typedef, and document them separately.

Another thing is function pointer prototypes. any problems with what is below?

typedef uint32_t (*lv_tick_get_cb_t)(void);

That gets documented as .. c:type:: lv_tick_get_cb_t. You won't get the function signature, because it's really a type, but I understand that can be a small disappointment.

handling of forward declarations? Does this get evaluated as some_struct a type of _some_struct or does some_struct get evaluated as a struct?

typedef _some_struct some_struct;

struct _some_struct {
    some_struct field;
}

The first is documented as a type, the latter as a struct.

does this get documented as a struct or does it get documented as a type to an anonymous struct?

typedef struct {
    bracket_stack_t br_stack[LV_BIDI_BRACKET_DEPTH];
    uint8_t br_stack_p;
} lv_bidi_ctx_t;

This one gets documented as struct lv_bidi_ctx_t, because, like I said above, that's the only way to document the members.

If you want to do struct typedefs in one statement, I think it works better to typedef anonymous structs. The other option is to separate the typedef and the struct definition.

jnikula commented 2 months ago

@kdschlosser Any further questions? Or should I just close the issue now.

kdschlosser commented 2 months ago

This issue is OK to be closed.