Add support for C++ using type definitions

stephanlachnit commented 11 months ago

This PR adds support for using typeA = typeB via the cpp:type type alias declaration.

Up for discussion: I stored the "underlying type" in the value property of the cursor, since that is what felt most natural to me given the C++ syntax.

I added a test case, however two of the three tests fail (test/test_cautodoc.py::test_directive_text[cpp/using-alias] and test/test_cautodoc.py::test_directive_html[cpp/using-alias]). Since I don't really understand why they fail, I need to help here.

stephanlachnit commented 11 months ago

I added a test case, however two of the three tests fail (test/test_cautodoc.py::test_directive_text[cpp/using-alias] and test/test_cautodoc.py::test_directive_html[cpp/using-alias]). Since I don't really understand why they fail, I need to help here.

Found it, I forgot to register docstring.TypeAliasDocstring as possibility for the :cpp:autotype: directive.

BrunoMSantos commented 11 months ago

Thanks for looking into this!

Looks pretty clean, but I'm not sure this warrants a different behaviour than typedefs. The difference I see here is that you're printing out the aliased type in the documentation, but that could be done with the typedef as well in principle, so I'd rather merge those two paths under TypeDocstring and a single parser if. Any thoughts?

Two things to consider with regards to printing out the aliased type either way:

A type is arguably aliased to hide its implementation, maybe printing the implementation should be an option then?
We need to handle typedefs / aliases of anonymous types. E.g. what should happen here?
```
using new_type = struct {
      int foo;
};
```

By the way, I haven't checked how the = type gets rendered by Sphinx on each domain. But the documentation doesn't show that as a use case for the type directive under the C domain, so we may need to have that in mind.

Pinging @jnikula as he may have some input here as well.

Also, if you don't mind, can you split the test(s) into separate commits.

jnikula commented 11 months ago

Pinging @jnikula as he may have some input here as well.

Besides the nitpicks regarding the boundaries of parser/doccursor/docstring, not really.

I used to be proficient in C++, but I haven't written any in a very long time, and the modern usage seems to have evolved quite a bit. I can't honestly claim to know what C++ users would expect to see. I'm heavily relying on you two to figure this out.

jnikula commented 11 months ago

I added a test case, however two of the three tests fail (test/test_cautodoc.py::test_directive_text[cpp/using-alias] and test/test_cautodoc.py::test_directive_html[cpp/using-alias]). Since I don't really understand why they fail, I need to help here.

Found it, I forgot to register docstring.TypeAliasDocstring as possibility for the :cpp:autotype: directive.

The test suite can be a bit fussy at times, but it's generally been a life saver!

stephanlachnit commented 11 months ago

Looks pretty clean, but I'm not sure this warrants a different behaviour than typedefs. The difference I see here is that you're printing out the aliased type in the documentation, but that could be done with the typedef as well in principle, so I'd rather merge those two paths under TypeDocstring and a single parser if. Any thoughts?

It looks a bit different, that why I went for C++ specific implementation. This is it looks with an equal sign with my current theme:

Screenshot from 2023-11-19 17-02-11

Which is quite nice, since that is exactly what you also see in code. Converting this to a C style typedef feels unnatural to me. You can find more examples in the sphinx docs: https://www.sphinx-doc.org/en/master/usage/domains/cpp.html#directive-cpp-type

A type is arguably aliased to hide its implementation, maybe printing the implementation should be an option then?

That is a good point. I mainly use alias typesdefs for function pointers, and there seeing the implementation in the documentation is definitely the thing you want. But I can see that in other scenarios that might not be the case.

We need to handle typedefs / aliases of anonymous types. E.g. what should happen here?

Haven't tried, it's not something I would ever use since I don't see the benefit compared to just declaring the type normally.

Also thinking about it, this probably doesn't work yet with templated alias definitions. I have to look into that, as that is the most likes the biggest use case / advantages of using typedefs.

BrunoMSantos commented 11 months ago

Looks pretty clean, but I'm not sure this warrants a different behaviour than typedefs. The difference I see here is that you're printing out the aliased type in the documentation, but that could be done with the typedef as well in principle, so I'd rather merge those two paths under TypeDocstring and a single parser if. Any thoughts?

It looks a bit different, that why I went for C++ specific implementation.

So, I understand you want it to look different from the current typedef implementation, and I totally get why, though I think we may want to make it optional somehow. The question is whether they should be different. If you used a typedef to define the same callback, wouldn't you want the = ... then?

The crux for me is: do you know of kind of type alias that can be represented with one of them but not the other? If so, there might be a good case behind splitting the two docstring implementations. Otherwise, they should be handled in the same way.

As anecdotal evidence, Sphinx only needs one directive to cover both cases.

BrunoMSantos commented 11 months ago

Side note: not sure if that would be an issue, but if the DocCursor API is not uniform when resolving a typedef and its equivalent using, then that would be a problem to solve in DocCursor, not in docstring. In other words, TypeDocstring should still provide a single path for both ideally / long term.

jnikula commented 11 months ago

Hey @stephanlachnit & @BrunoMSantos, is it clear to you what to do next with this?

stephanlachnit commented 11 months ago

Hey @stephanlachnit & @BrunoMSantos, is it clear to you what to do next with this?

Yes - I have to add support for templates (when I find the time).

jnikula commented 11 months ago

Yes - I have to add support for templates (when I find the time).

No pressure! I'm just trying to ensure you're not waiting for some input here.

BrunoMSantos commented 11 months ago

My earlier objections to the current implementation still stand, I believe, otherwise it's clear.

That said, I would prefer to do some clean ups on docstring before tackling new functionality. And even then I intend to start with the namespace as it is a much more limiting issue. So frankly this is not really on my radar right now. Trying to muster the courage for another stint of refactoring :sweat_smile:

stephanlachnit commented 11 months ago

So, after some fighting with clang this now also supports templated aliases.

Now, the open question is: do we merge the codepath of docstring.TypeAliasDocstring with docstring.TypeDocstring?

The disadvantage when merging is that we need to keep track how the typedef/alias was made. With a C-style typedef we want to use

.. cpp:type:: std::vector<int> MyIntList

while with a C++ alias we want to use

.. cpp:type:: template<typename T> MyList = std::vector<T>

Currently, the typedef implementation does not show the underlying type, while this C++ alias implementation always shows underlying type. The advantage of a code merge would be, that this could be toggable somewhere and we have the functionality in the same class.

BrunoMSantos commented 11 months ago

So, after some fighting with clang this now also supports templated aliases.

Nice!

Now, the open question is: do we merge the codepath of docstring.TypeAliasDocstring with docstring.TypeDocstring?

The disadvantage when merging is that we need to keep track how the typedef/alias was made. With a C-style typedef we want to use
.. cpp:type:: std::vector<int> MyIntList
while with a C++ alias we want to use
.. cpp:type:: template<typename T> MyList = std::vector<T>

Did you test using the C++ format on C? I was hoping the directive would work the same regardless of the domain, but it's still a hope only.

Even if it doesn't work in the C domain, I think the right way to handle a typedef in C++ is with the .. cpp:type:: T1 = T2 syntax, minus any option to occlude the underlying type perhaps.

See also https://www.sphinx-doc.org/en/master/usage/domains/cpp.html#aliasing-declarations

This is for handling overloads, right? I didn't play around with it, but that was the impression I got.

Currently, the typedef implementation does not show the underlying type, while this C++ alias implementation always shows underlying type. The advantage of a code merge would be, that this could be toggable somewhere and we have the functionality in the same class.

I think the merge should be mostly about whether the code for each path is sufficiently different that it justifies it, not because we want the directives and outputs to be different. I.e. any using statement can be rewritten with a typedef and both should lead to the same documentation for they are functionally equivalent to my knowledge (and in this specific context of course, I know using has other uses).

jnikula commented 11 months ago

The advantage of a code merge would be, that this could be toggable somewhere and we have the functionality in the same class.

Side note, if by "togglable" you mean it could be a user option, my strong preference is for sane defaults instead.

BrunoMSantos commented 11 months ago

The advantage of a code merge would be, that this could be toggable somewhere and we have the functionality in the same class.

Side note, if by "togglable" you mean it could be a user option, my strong preference is for sane defaults instead.

Yeah, the question is whether the user wants a type to be opaque or not, and I think there is a case to be made that both are valid. Not the most important thing probably, but it would be a nice enough option in my opinion.

The one thing I don't want though is for typedef and using to have different ways of rendering equivalent type aliases with one hiding the alias and the other one showing it. If they are consistent, I'd be happy enough and we can postpone discussion about an option to hide it later (showing the alias would be my preferred default by the way).

stephanlachnit commented 11 months ago

Did you test using the C++ format on C? I was hoping the directive would work the same regardless of the domain, but it's still a hope only.

It does not. In C, only the typedef-style definitions work.

Even if it doesn't work in the C domain, I think the right way to handle a typedef in C++ is with the .. cpp:type:: T1 = T2 syntax, minus any option to occlude the underlying type perhaps.

Hm, I'm not sure. In modern C++, you wouldn't use typedefs anymore. So I suspect that most typedef are C style, i.e. typedef-ing anonymous structs (in modern C++ you would use a tuple if you don't want to define a struct directly).

This is for handling overloads, right? I didn't play around with it, but that was the impression I got.

Sorry wrong link - I meant https://www.sphinx-doc.org/en/master/usage/domains/cpp.html#directive-cpp-type

I think the merge should be mostly about whether the code for each path is sufficiently different that it justifies it, not because we want the directives and outputs to be different. I.e. any using statement can be rewritten with a typedef and both should lead to the same documentation for they are functionally equivalent to my knowledge (and in this specific context of course, I know using has other uses).

The problem is that whether the code path is sufficiently different depends on your assumption. In principle, typedefs and aliases are identical (besides for support of templates, which has to be handled specifically anyway). However, in the real world, typedefs are used for different things. I'm not a C developer so I might be wrong, but from what I saw defining typedefs are mostly used for anonymous structs, while in C++ aliases are mostly used as shorthands for long template expressions.

I would not bother to support anonymous struct per alias definitions in C++, it's a code smell anyway and we don't have to support legacy code bases (I'm not even sure if it works). Maybe anonymous structs are something you want to support for C in the future - making this explicitly C where C++ features can be ignored might be useful.

Side note, if by "togglable" you mean it could be a user option, my strong preference is for sane defaults instead.

Agreed. I can only tell for me and how I would use C++, but basically you would almost always want to see the definition of a C++ alias. I don't know it is for a C developer with typedefs.

If in C one would not expect the definition in the documentation, then I would propose to leave the two classes separate.

BrunoMSantos commented 11 months ago

Sorry wrong link - I meant https://www.sphinx-doc.org/en/master/usage/domains/cpp.html#directive-cpp-type

Hmm, I think you're right. Still haven't tested it, but re-reading the documentation it seems like the C domain supports:

.. c:type:: typeA
.. c:type:: typeB typeA

C++ domain on the other hand supports both of the above and, additionally:

.. cpp:type:: typeA = typeC

That being the case, I expect both typedefs and usings should use the respective non-opaque form and, maybe, optionally, the opaque one.

Anyway, I had forgotten that using aliases can be templated, but not typedef ones, so, overall, it seems like the overlap may be smaller than I had imagined. Let's keep to the new docstring type then :)

On another topic, if using can alias an anonymous structure, than it should be supported as such. In fact, I would like to see a test case for that. Not even sure what to expect here honestly, but we'll have to handle it somehow, much like we'll need to handle it in the typedef case once we make it default to the non-opaque version too.

Finally, aliasing an anonymous type is orthogonal to documenting a type as opaque. And defining such opaque types API wise is not legacy, it's actually a very common pattern on libraries (with good reason I think), which is why I think the option should exist in both domains. But we can make that a separate issue and I agree the non-opaque option is the better default for now.

I'll try to review the actual code today / tomorrow ;)

jnikula commented 11 months ago

Other than that, there's an issue of consistency relative to typedefs. That would not relate to this code exactly though, and fixing it according to the typedef would, in my opinion, force us to do the wrong thing again, except it's a bit harder this time due to clang resolving the alias cursor ahead of the aliased one for using, the opposite of a typedef. I.e. no promotion to struct in the current docstring-cursor pairing logic.

I'll repeat this for @stephanlachnit's benefit:

For both the anonymous /** doc */ typedef struct { ... } foo_t; and the named /** doc */ typedef struct _foo { ... } foo_t; the user expectation is to document the struct and its members, not the typedef. So they result in .. c:struct:: Sphinx directives.

If you specifically want to document the typedef, you need to define the struct and the typedef separately. Indeed, if you aim for an opaque type, it's common to have the typedef struct _foo foo_t; in a header and the struct _foo { ... }; in a .c file. For API documentation, maybe the latter would not even be documented.

One might argue that a /** doc */ typedef <anything> should unconditionally document the type, but that's really not a pragmatic solution.

Frankly, I think the using behaviour here is the correct one, but I fought that one with @jnikula already 😅 I'll defer to him on this one.

Auch, do I need to brush up my C++ now? :grimacing:

Gut feeling says it's fine that using behaves somewhat differently from a typedef here. I'm not sure if using foo = struct { ... }; is a common pattern in C++ the way typedef struct { ... } foo; is in C.

stephanlachnit commented 11 months ago

On another topic, if using can alias an anonymous structure, than it should be supported as such. In fact, I would like to see a test case for that. Not even sure what to expect here honestly, but we'll have to handle it somehow, much like we'll need to handle it in the typedef case once we make it default to the non-opaque version too.

Finally, aliasing an anonymous type is orthogonal to documenting a type as opaque. And defining such opaque types API wise is not legacy, it's actually a very common pattern on libraries (with good reason I think), which is why I think the option should exist in both domains. But we can make that a separate issue and I agree the non-opaque option is the better default for now.

I tend to disagree when considering modern C++. AFAICT, unnamed types have two purposes:

Define a new lightweight struct without a new typeid => in modern C++ you would use std::tuple and use using name = std::tuple<T...> to create a short alias if required
Typedefing an unnamed struct to avoid writing struct S all the time => this is not the case in C++ (see also https://stackoverflow.com/a/254250)

This pattern is largely a C pattern you would not use modern C++. That is of course fine, not everyone needs to write modern C++, but I can't see why someone would combine a modern C++ feature (using) with an old C-style pattern (typedefing unnamed structs). Thus, I would explicitly not support this.

BrunoMSantos commented 11 months ago

Gut feeling says it's fine that using behaves somewhat differently from a typedef here. I'm not sure if using foo = struct { ... }; is a common pattern in C++ the way typedef struct { ... } foo; is in C.

Probably not as common no, especially as struct and class have very similar semantics and you can in fact omit struct already (in all cases I believe, but I'm primarily a C dev too :man_shrugging:). -Probably, it would only be used if someone explicitly wanted a shorter alias or a more meaningful name for a generic type, etc.- Edit: actually, those would still make little sense.

So yeah, ignoring the inconsistency it is an option here. Though the issue will probably come up later once we try and make typedef directives not opaque ;)

BrunoMSantos commented 11 months ago

I tend to disagree when considering modern C++. AFAICT, unnamed types have two purposes: [...]

Hmm, I guess you're right. The use cases I was thinking of (dynamically linked libraries) would explicitly export a C compatible API with C headers, so no using there either despite my previous claim... And that's fair enough.

BrunoMSantos commented 11 months ago

And thanks for bearing with me @stephanlachnit! We certainly appreciate the interest and help :partying_face:

stephanlachnit commented 10 months ago

Updated the PR with the suggested changes. ~~Interestingly, cpp/mixed-linkage now starts failing for me, and I'm not quite sure why.~~ Seems to be just for me, maybe I'll need to update my venv.

BrunoMSantos commented 10 months ago

Thanks for the updates, I think it's all good from my side! Really sorry it took me so long to have a look at it. :christmas_tree:

jnikula / hawkmoth

Add support for C++ using type definitions #226