Support pydata style tables (using description lists)

machow commented 1 year ago

Currently, we render parameters, etc. as tables. This causes issues with wrapping, since everything has to be on the same line.

In contrast, themes like the sphinx pydata theme use description lists for the structure of the rendered docstrings (see below).

Let's extend the MdRenderer to use description lists:

- Let's subclass it to start.
- Let's extend _render_table to produce description lists
- Let's extend the section renderers to also produce description lists

One challenge is there's some degree of nesting here (the parameters section itself is a description list, and its table of parameters is also a description list).
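
As a rough illustration of the target output, the sketch below turns a list of parameters into Pandoc's definition-list Markdown (a term line followed by a `:   ` definition). It does not touch quartodoc's `MdRenderer`; the `Param` dataclass and the `render_definition_list` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Param:
    # Hypothetical stand-in for one parsed docstring parameter.
    name: str
    annotation: Optional[str]
    description: str


def render_definition_list(params: list[Param]) -> str:
    """Render parameters as a Pandoc definition list."""
    items = []
    for p in params:
        # The term is the name, plus the type when one is known.
        if p.annotation is None:
            term = f"`{p.name}`"
        else:
            term = f"`{p.name}` : `{p.annotation}`"
        # Indent continuation lines so a multi-line description
        # stays inside the same definition.
        lines = p.description.splitlines() or [""]
        body = "\n".join(
            (":   " if i == 0 else "    ") + line
            for i, line in enumerate(lines)
        )
        items.append(f"{term}\n\n{body}")
    return "\n\n".join(items) + "\n"


print(render_definition_list([
    Param("x", "int", "The value of the first dimension."),
    Param("*args", None, "The values for any other dimensions."),
]))
```

Because the description is an ordinary block, long text can wrap freely, which is the wrapping problem tables have.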

machow commented 1 year ago

cc @has2k1 , who has a prototype in his docs

has2k1 commented 1 year ago

The pydata theme is a customisation of documentation built by sphinx/docutils, and it seems that some of the CSS class names used by sphinx were chosen to match the general output of docutils. Given that our html generator is pandoc, we cannot sensibly match the html output produced by sphinx.

I think the best choice is to take the principle that docstrings define/describe objects and in html this aspect is best represented (semantically) by definition-lists. Then adapt the same semantic structure as the sphinx output, but not necessarily all the html markup details.

That is what I have tried to spec out at:

https://gist.github.com/has2k1/0f55e60029563445536c555fb355cf12

There I turn a sample docstring into html.

I have adapted some choices from py-shiny about the html markup for parameter type information.

CC: @wch

machow commented 1 year ago

> Given that our html generator is pandoc, we cannot sensibly match the html output produced by sphinx.

Just to be sure I understand. Is this saying that,

if we use markdown-specific syntax to generate the qmd for a API doc page (like the cool pandoc definition list syntax you showed me!), then we can't match the exact syntax sphinx outputs.

It seems like we could match the exact sphinx syntax by producing raw html in the qmd, but this would look a lot uglier (and maybe be a bit brittle). So sounds like you're saying that doing roughly the same thing, using definition list syntax works better.

Does that sound right?

edit: the html looks great, and seems like it will be a huge, long-awaited (😅) improvement!

wch commented 1 year ago

Looking at the big picture, I think that we should try to come up with the best docstring syntax we can think of; supporting sphinx-like syntax is a very low priority.

Some thoughts:

I think we should not generate HTML directly, if we can avoid it. Instead, we should generate Markdown content that Quarto can read and render how it wants.
I'm a bit hesitant about using the nonstandard definition list syntax. Before we commit to it, I think we should investigate the Pandoc AST representation from that input. If you look here and here (search for "DefinitionList" in the page), you'll see that Pandoc represents a DefinitionList as
```
-- | Definition list. Each list item is a pair consisting of a
-- term (a list of inlines) and one or more definitions (each a
-- list of blocks)
| DefinitionList [([Inline],[[Block]])]
```
If I understand this correctly, then for things like x : int in your example, that whole string is the term, and so separating the x from the int and rendering the separate HTML for them would have to happen either (A) in quartodoc as it generates the Markdown content, or (B) in a Lua filter that is applied when the Markdown is converted to HTML.
Another way to go is to generate a regular Markdown list, and wrap it in a div with a class. Quartodoc could provide CSS to render this like a definition list. The generated Markdown could look something like this:

::: {.quartodoc-parameters}
- **`x`** : _int_
    - The value of the first dimension.
- **`*args`** : _float_
    - The values for any other dimensions.
:::
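
A minimal sketch of how that Markdown could be generated, assuming parameters arrive as `(name, type, description)` tuples. The function name `render_parameter_list` and the tuple shape are assumptions for illustration, not part of quartodoc.

```python
def render_parameter_list(params, css_class="quartodoc-parameters"):
    """Render (name, type_or_None, description) tuples as a regular
    Markdown list wrapped in a fenced div with the given class."""
    lines = [f"::: {{.{css_class}}}"]
    for name, type_, description in params:
        head = f"- **`{name}`**"
        if type_:
            # Only show the type when one is available.
            head += f" : _{type_}_"
        lines.append(head)
        # The description is a nested list item under the parameter.
        lines.append(f"    - {description}")
    lines.append(":::")
    return "\n".join(lines)


print(render_parameter_list([
    ("x", "int", "The value of the first dimension."),
    ("*args", "float", "The values for any other dimensions."),
]))
```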

Can you provide examples of what the syntax should look like if the types aren't in the docstring, but instead inferred from the type annotations in code? Something like this would be nice:

Parameters
----------
x
    The value of the first dimension.
*args
   The values for any other dimensions.
r
   The radius of the base shape.
**kwargs
   Extra parameters passed onto the core function.

Returns
-------
The returned value of some function.

The syntax of See Also section seems too restrictive to me. If I understand correctly, it expects the first string to be a Python object. But I think it would be useful to be able to write prose there, with function names in line.

machow commented 1 year ago

It might be helpful to restrict this issue to be about what gets generated from a parsed docstring (rather than docstring syntax, etc.). I'd open separate issues for anything related to the docstring syntax you're using on e.g. numpydoc or griffe. If you create a new docstring syntax and get it added to griffe, it should just work in quartodoc (definitely happy to help!).

There are a lot of great thoughts on rendering above, but I want to make sure to provide context on the syntax related bits!

Things that may be worth opening separate issues on

> Can you provide examples of what the syntax should look like if the types aren't in the docstring, but instead inferred from the type annotations in code?

If you are using the numpydoc parser, it would be numpydoc syntax. So, something like this should work (note the colon in x:):

Parameters
----------
x:
    The value of the first dimension.

> The syntax of See Also section seems too restrictive to me.

This sounds like it might be a numpydoc See Also section issue. That said, See Also is handled in a funny way right now, since griffe doesn't attempt to parse it. So you can always override how See Also is handled in the renderer (which I think is what happens in the shiny docs right now; we ~import numpydoc~ use a regex and parse the See Also section in its custom renderer).
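
To make the regex idea concrete, here is a simplified sketch of parsing the raw text of a See Also section. It is not the code used in the shiny docs; `parse_see_also` is a hypothetical helper, and it only handles one object name per entry (numpydoc also allows comma-separated names, which this ignores).

```python
import re

# An unindented entry line: an object name, optionally followed by
# " : description". Only word characters and dots are accepted in names.
_ENTRY = re.compile(r"^(?P<name>[\w.]+)\s*(?::\s*(?P<desc>.*))?$")


def parse_see_also(section: str) -> list[tuple[str, str]]:
    """Return (name, description) pairs from a See Also section body."""
    entries: list[tuple[str, str]] = []
    for line in section.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Indented line: continuation of the previous description.
            if entries:
                name, desc = entries[-1]
                entries[-1] = (name, f"{desc} {line.strip()}".strip())
            continue
        match = _ENTRY.match(line.strip())
        if match:
            entries.append(
                (match.group("name"), (match.group("desc") or "").strip())
            )
        # Lines that do not look like an entry (e.g. free prose) are skipped.
    return entries


text = """\
package.module.submodule.func_a :
    A somewhat long description of the function.
func_b : Short note.
"""
print(parse_see_also(text))
```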

has2k1 commented 1 year ago

> It seems like we could match the exact sphinx syntax by producing raw html in the qmd, but this would look a lot uglier (and maybe be a bit brittle). So sounds like you're saying that doing roughly the same thing, using definition list syntax works better.
>
> Does that sound right?

Yes, that is right. To match sphinx you have to generate everything in HTML.

And Sphinx HTML is tied to docutils and so a quarto variant of the pydata-sphinx-theme has to be for the whole site.

has2k1 commented 1 year ago

> Looking at the big picture, I think that we should try to come up with the best docstring syntax we can think of;

I agree.

The pressing issue is coming up with good HTML markup for docstrings. Expected formats for docstrings are numpydoc and google style. As griffe extracts and parses the docstrings, we don't really have to worry about the exact format on which to base our expected HTML.

numpydoc is used for reference because it is more familiar.

> I think we should not generate HTML directly, if we can avoid it.

100%, that is a goal.

The tricky point is how to handle links within code, e.g.

Given a type annotation

flag: bool = False

Can this be done in markdown only?

<code>a: <a href="https://docs.python.org/3/library/functions.html#bool">bool</a> = True</code>

What about if a, :, = and True are also wrapped in span tags with classes? As it stands, that is where "generate only markdown" breaks down. Here are 4 options:

1. Make an exception and use `code` HTML tags.
```
<code> ... markdown ... </code>
```

2. Generate different HTML from markdown only

[a: [bool](https://docs.python.org/3/library/functions.html#bool) = True]{.code}

which generates

<span class="code">
a: <a href="https://docs.python.org/3/library/functions.html#bool">bool</a> = True
</span>

Then style with CSS as required!

3. Generate different HTML from markdown only, as in 2 above, but use a lua-filter to convert a span tag with class="code" to a code tag.
4. Generate different HTML from markdown, similar to 2 above, but use a different class unrelated to code, and style it appropriately.
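
The Markdown-only span option above could be produced with something like the following sketch, which builds a Pandoc bracketed span with the type name as the link text. `render_annotation_span` and its parameters are hypothetical; escaping of Markdown-special characters in names (for example the `*` in `*args`) is deliberately left out.

```python
def render_annotation_span(name, type_name, type_url=None,
                           default=None, css_class="code"):
    """Build a Pandoc bracketed span such as
    [a: [bool](url) = True]{.code} from the parts of an annotation."""
    # Link the type only when a target URL is known.
    type_md = f"[{type_name}]({type_url})" if type_url else type_name
    text = f"{name}: {type_md}"
    if default is not None:
        text += f" = {default}"
    return f"[{text}]{{.{css_class}}}"


print(render_annotation_span(
    "a",
    "bool",
    "https://docs.python.org/3/library/functions.html#bool",
    default="True",
))
```

Switching to a class unrelated to code only means passing a different `css_class`; the generated Markdown shape stays the same.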

Are there other options for such cases?

> I'm a bit hesitant about using the nonstandard definition list syntax ...

That non-standard markup is numpydoc format. We are only concerned with what it means: a description of a parameter.

> If I understand this correctly, then for things like x : int in your example, that whole string is the term, and so separating the x from the int and rendering the separate HTML for them would have to happen either (A) in quartodoc as it generates the Markdown content, or (B) in a Lua filter that is applied when the Markdown is converted to HTML.

As first suggested, in the final HTML x : int is the term, but griffe provides the name and the type separately. But maybe it makes more semantic sense to narrow down the term to the parameter name only, and have the type and default value (if any) in the description? I do see that, e.g.

<dl>
  <dt><span class="parameter-name">x</span></dt>
  <dd>
    <span class="parameter-annotation">int</span>
    <!-- <span class="parameter-default">1</span> -->
    <p>The value of the first dimension.</p>
  </dd>
</dl>

But it seems like in most documentation variable : type are grouped together, just as they appear in code.

> Another way to go is to generate a regular Markdown list, and wrap it in a div with a class. Quartodoc could provide CSS to render this like a definition list. The generated Markdown could look something like this:

I am kind of hung up on a definition list being more accurate semantically.

> Can you provide examples of what the syntax should look like if the types aren't in the docstring, but instead inferred from the type annotations in code? Something like this would be nice:

It would be the same. Right now griffe handles the type information with the following precedence:

1. Parameter type in the docstring
2. Type annotation in the function signature

quartodoc gets the variable and the type separately.
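
The precedence described above can be illustrated with the standard library alone. This is not how griffe does it (griffe analyses source statically); the sketch uses runtime `inspect.signature` as a stand-in, and `resolve_type` is a hypothetical helper.

```python
import inspect
from typing import Optional


def resolve_type(func, param_name: str,
                 docstring_type: Optional[str] = None) -> Optional[str]:
    """Pick the type to display for a parameter: the docstring type
    wins, otherwise fall back to the signature annotation."""
    if docstring_type:
        return docstring_type
    param = inspect.signature(func).parameters[param_name]
    if param.annotation is inspect.Parameter.empty:
        # No type in the docstring and no annotation in the signature.
        return None
    ann = param.annotation
    # Annotations may already be strings (postponed evaluation).
    return ann if isinstance(ann, str) else getattr(ann, "__name__", str(ann))


def shape(x: int, r: float = 1.0, *args, **kwargs):
    """Example function; only x and r carry annotations."""


print(resolve_type(shape, "x"))             # annotation is used
print(resolve_type(shape, "x", "integer"))  # docstring type takes precedence
print(resolve_type(shape, "args"))          # nothing available
```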

> The syntax of See Also section seems too restrictive to me. If I understand correctly, it expects the first string to be a Python object. But I think it would be useful to be able to write prose there, with function names in line.

That is the spec. You can do long prose, but it still wants that python object first.

See Also
--------
package.module.submodule.func_a :
    A somewhat long description of the function.

For complete freedom, there is the Notes section.

has2k1 commented 1 year ago

For one reason or another, we may have to think about using a lua-filter. But lua filters have to be turned on in the quarto part of the configuration, and it feels like quartodoc leaking into quarto. I think this is a reason to avoid them if possible.

machow commented 1 year ago

Is there any chance you'd be willing to PR a small proof of concept of the approach that seems most promising? It might help surface some of these tricky trade-offs (and it seems like you've worked through things very thoroughly!)

has2k1 commented 1 year ago

> Is there any chance you'd be willing to PR a small proof of concept of the approach that seems most promising? It might help surface some of these tricky trade-offs (and it seems like you've worked through things very thoroughly!)

Yes I will do a PR.

machow commented 1 year ago

If it's working in plotnine right now, we can definitely just move it upstream after everything else is done! I know you've done a ton of lifting on the site, so don't let me slow your roll!

(For context, here is the work in the plotnine renderer: https://github.com/has2k1/plotnine/pull/706/files#diff-094535f9f7581de09d350033913c8c0c4bf3262f35c80ba7a6adf51bc07ba209R128)

has2k1 commented 1 year ago

https://github.com/has2k1/plotnine/pull/706/files#diff-094535f9f7581de09d350033913c8c0c4bf3262f35c80ba7a6adf51bc07ba209R128

That is old.

It is very much a work in progress and it is blocking everything else!

machow / quartodoc

Support pydata style tables (using description lists) #247