Open machow opened 1 year ago
cc @has2k1 , who has a prototype in his docs
The pydata theme is a customisation of documentation built by sphinx/docutils. It seems that the names of some CSS classes used by sphinx are chosen to match the general output from docutils. Given that our html generator is pandoc, we cannot sensibly match the html output produced by sphinx.
I think the best choice is to take the principle that docstrings define/describe objects and in html this aspect is best represented (semantically) by definition-lists. Then adapt the same semantic structure as the sphinx output, but not necessarily all the html markup details.
That is what I have tried to spec out at:
https://gist.github.com/has2k1/0f55e60029563445536c555fb355cf12
There I turn a sample docstring into html.
I have adapted some choices from py-shiny about the html markup for parameter type information.
CC: @wch
Given that our html generator is pandoc, we cannot sensibly match the html output produced by sphinx.
Just to be sure I understand. Is this saying that,
It seems like we could match the exact sphinx syntax by producing raw html in the qmd, but this would look a lot uglier (and maybe be a bit brittle). So sounds like you're saying that doing roughly the same thing, using definition list syntax works better.
Does that sound right?
edit: the html looks great, and seems like it will be a huge, long-awaited (😅) improvement!
Looking at the big picture, I think that we should try to come up with the best docstring syntax we can think of; supporting sphinx-like syntax is a very low priority.
Some thoughts:
I think we should not generate HTML directly, if we can avoid it. Instead, we should generate Markdown content that Quarto can read and render how it wants.
I'm a bit hesitant about using the nonstandard definition list syntax. Before we commit to it, I think we should investigate the Pandoc AST representation from that input. If you look here and here (search for "DefinitionList" in the page), you'll see that Pandoc represents a `DefinitionList` as
-- | Definition list. Each list item is a pair consisting of a
-- term (a list of inlines) and one or more definitions (each a
-- list of blocks)
| DefinitionList [([Inline],[[Block]])]
If I understand this correctly, then for things like `x : int` in your example, that whole string is the term, and so separating the `x` from the `int` and rendering the separate HTML for them would have to happen either (A) in quartodoc as it generates the Markdown content, or (B) in a Lua filter that is applied when the Markdown is converted to HTML.
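To make option (A) concrete, here is a minimal sketch of splitting such a term in Python before any Markdown is emitted. `split_term` and its regex are hypothetical helpers for illustration, not part of quartodoc, and they only cover simple `name : type` terms.

```python
from __future__ import annotations

import re

# Hypothetical pattern: an optional run of "*" (for *args / **kwargs), a name,
# then an optional ": annotation" part.
_TERM = re.compile(r"^\s*(?P<name>\**\w+)\s*(?::\s*(?P<annotation>.*?))?\s*$")


def split_term(term: str) -> tuple[str, str | None]:
    """Split a numpydoc-style term into (name, annotation or None)."""
    match = _TERM.match(term)
    if match is None:
        raise ValueError(f"not a parameter term: {term!r}")
    # An empty annotation (as in "x:") is reported as None.
    return match["name"], match["annotation"] or None


print(split_term("x : int"))
print(split_term("*args : float"))
print(split_term("r"))
```

In practice this step may be unnecessary if the parser already hands over the name and the annotation separately.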
Another way to go is to generate a regular Markdown list, and wrap it in a div with a class. Quartodoc could provide CSS to render this like a definition list. The generated Markdown could look something like this:
::: {.quartodoc-parameters}
- **`x`** : _int_
- The value of the first dimension.
- **`*args`** : _float_
- The values for any other dimensions.
:::
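A rough sketch of how Markdown like the above could be generated. `render_parameter_list` is a hypothetical function, not quartodoc API, and it assumes the description is meant to be nested under its term (the indentation is my assumption).

```python
def render_parameter_list(params):
    """Render (name, annotation, description) tuples as a classed Markdown list."""
    lines = ["::: {.quartodoc-parameters}"]
    for name, annotation, description in params:
        term = f"- **`{name}`**"
        if annotation:
            # The annotation is optional; omit the " : _type_" part without it.
            term += f" : _{annotation}_"
        lines.append(term)
        # Assumption: the description is a nested item under the term.
        lines.append(f"    - {description}")
    lines.append(":::")
    return "\n".join(lines)


print(
    render_parameter_list(
        [
            ("x", "int", "The value of the first dimension."),
            ("*args", "float", "The values for any other dimensions."),
        ]
    )
)
```

The CSS that makes this look like a definition list would have to ship separately and target the `.quartodoc-parameters` class.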
Parameters
----------
x
    The value of the first dimension.
*args
    The values for any other dimensions.
r
    The radius of the base shape.
**kwargs
    Extra parameters passed onto the core function.
Returns
-------
The returned value of some function.
The syntax of the See Also section seems too restrictive to me. If I understand correctly, it expects the first string to be a Python object. But I think it would be useful to be able to write prose there, with function names inline.

It might be helpful to restrict this issue to be about what gets generated from a parsed docstring (rather than docstring syntax, etc.). I'd open separate issues for anything related to the docstring syntax you're using on e.g. numpydoc or griffe. If you create a new docstring syntax and get it added to griffe, it should just work in quartodoc (definitely happy to help!).
There are a lot of great thoughts on rendering above, but I want to make sure to provide context on the syntax related bits!
Can you provide examples of what the syntax should look like if the types aren't in the docstring, but instead inferred from the type annotations in code?
If you are using the numpydoc parser, it would be numpydoc syntax. So, something like this should work (note the colon in `x:`):
Parameters
----------
x:
    The value of the first dimension.
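For illustration only, this is roughly what "types inferred from the annotations in code" means, using nothing but the standard library. `annotations_for` and `area` are hypothetical names; griffe does its own extraction, and this runtime-inspection sketch is not how it works internally.

```python
import inspect


def annotations_for(func):
    """Map each parameter name to its annotation as text, or None if unannotated."""
    result = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            result[name] = None
        elif isinstance(param.annotation, type):
            # Plain classes such as int or float: use the short name.
            result[name] = param.annotation.__name__
        else:
            # Strings or typing constructs: fall back to their text form.
            result[name] = str(param.annotation)
    return result


def area(x: int, *args: float, r=1.0):
    """
    Parameters
    ----------
    x:
        The value of the first dimension.
    """


print(annotations_for(area))
```

The docstring then only needs the name and the description; the type shown in the rendered docs comes from the signature.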
The syntax of See Also section seems too restrictive to me.
This sounds like it might be a numpydoc See Also section issue. That said, See Also is handled in a funny way right now, since griffe doesn't attempt to parse it. So you can always override how See Also is handled in the renderer (which I think is what happens in the shiny docs right now; we use a regex to parse the See Also section in its custom renderer).
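As a minimal sketch of the regex approach (this is not the shiny renderer's actual code; `parse_see_also` and the pattern are hypothetical, and only the `name : description` form with indented continuation lines is handled):

```python
from __future__ import annotations

import re

# Hypothetical pattern: a dotted Python name at the start of the line,
# a colon, then an optional description on the same line.
_ENTRY = re.compile(r"^(?P<name>[\w.]+)\s*:\s*(?P<rest>.*)$")


def parse_see_also(body: str) -> list[tuple[str, str]]:
    """Return (object name, description) pairs from a See Also section body."""
    entries: list[tuple[str, str]] = []
    for line in body.splitlines():
        if not line.strip():
            continue
        match = _ENTRY.match(line)
        if match:
            # An unindented "name :" line starts a new entry.
            entries.append((match["name"], match["rest"].strip()))
        elif entries:
            # Anything else continues the previous entry's description.
            name, description = entries[-1]
            entries[-1] = (name, f"{description} {line.strip()}".strip())
    return entries


section = """\
package.module.submodule.func_a :
    A somewhat long description of the function.
"""
print(parse_see_also(section))
```

A custom renderer could then turn each name into a link and leave the description as prose.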
It seems like we could match the exact sphinx syntax by producing raw html in the qmd, but this would look a lot uglier (and maybe be a bit brittle). So sounds like you're saying that doing roughly the same thing, using definition list syntax works better.
Does that sound right?
Yes, that is right. To match sphinx you have to generate everything in HTML.
And Sphinx HTML is tied to `docutils`, so a `quarto` variant of the `pydata-sphinx-theme` has to be for the whole site.
Looking at the big picture, I think that we should try to come up with the best docstring syntax we can think of;
I agree.
The pressing issue is coming up with good HTML markup for docstrings. Expected formats for docstrings are numpydoc and google style. As griffe extracts and parses the docstrings, we don't really have to worry about the exact format on which to base our expected HTML.
`numpydoc` is used for reference because it is more familiar.
I think we should not generate HTML directly, if we can avoid it.
100%, that is a goal.
The tricky point is how to handle links within code, e.g.
Given a type annotation
flag: bool = False
Can this be done in markdown only?
<code>a: <a href="https://docs.python.org/3/library/functions.html#bool">bool</a> = True</code>
What about if `a`, `:`, `=` and `True` are also wrapped in span tags with classes? As it stands, that is where "generate only markdown" breaks down. Here are 4 options:
1. Make an exception and use `code` html tags.
   <code> ... markdown ... </code>
2. Generate different HTML from markdown only
   [a: [bool](https://docs.python.org/3/library/functions.html#bool) = True]{.code}
   which generates
   <span class="code">
   a: <a href="https://docs.python.org/3/library/functions.html#bool">bool</a> = True
   </span>
   Then style with CSS as required!
3. Generate different HTML from markdown only as in 2 above, but use a lua-filter to convert a `span` tag with `class="code"` to a `code` tag.
4. Generate different HTML from markdown similar to 2 above, but use a different class unrelated to `code`, and style it appropriately.
Are there other options for such cases?
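For reference, a small sketch of how the markdown-only variant (a bracketed span with a class, containing a regular link) could be produced. `annotated_code_span` is a hypothetical helper, and the `.code` class is just the one used in the example above.

```python
def annotated_code_span(name, type_name, type_url, default=None, css_class="code"):
    """Build a Pandoc bracketed span whose type name links to its documentation."""
    text = f"{name}: [{type_name}]({type_url})"
    if default is not None:
        text += f" = {default}"
    # Doubled braces produce the literal "{.class}" attribute block.
    return f"[{text}]{{.{css_class}}}"


print(
    annotated_code_span(
        "a",
        "bool",
        "https://docs.python.org/3/library/functions.html#bool",
        default="True",
    )
)
```

Switching to a class unrelated to `code` only means passing a different `css_class`; no escaping of Markdown special characters is attempted here.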
I'm a bit hesitant about using the nonstandard definition list syntax ...
That non-standard markup is `numpydoc` format. We are only concerned with what it means: a description of a parameter.
If I understand this correctly, then for things like x : int in your example, that whole string is the term, and so separating the x from the int and rendering the separate HTML for them would have to happen either (A) in quartodoc as it generates the Markdown content, or (B) in a Lua filter that is applied when the Markdown is converted to HTML.
As first suggested, in the final HTML `x : int` is the term, but `griffe` provides them as separate. But maybe it makes more semantic sense to narrow down the term to the parameter name only, and have the type and default value (if any) in the description? I do see that, e.g.
<dl>
<dt><span class="parameter-name">x</span></dt>
<dd>
<span class="parameter-annotation">int</span>
<!-- <span class="parameter-default">1</span> -->
<p>The value of the first dimension.</p>
</dd>
</dl>
But it seems like in most documentation `variable : type` are grouped together, just as they appear in code.
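A sketch of what the Markdown for the name-only-term variant might look like, using Pandoc's definition list syntax and bracketed spans. `render_definition` is hypothetical; the class names are copied from the HTML above, and the exact HTML Pandoc produces (for example, whether the annotation ends up wrapped in a paragraph) is not verified here.

```python
def render_definition(name, description, annotation=None):
    """Render one parameter as a Pandoc definition-list item."""
    blocks = []
    if annotation:
        blocks.append(f"[{annotation}]{{.parameter-annotation}}")
    blocks.append(description)

    first, *rest = blocks
    # The term goes on its own line; the first block of the definition
    # follows the ":   " marker.
    lines = [f"[{name}]{{.parameter-name}}", "", f":   {first}"]
    for block in rest:
        # Later blocks of the same definition are indented four spaces.
        lines += ["", f"    {block}"]
    return "\n".join(lines)


print(render_definition("x", "The value of the first dimension.", "int"))
```

Grouping `variable : type` in the term instead would just mean moving the annotation span onto the first line.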
Another way to go is to generate a regular Markdown list, and wrap it in a div with a class. Quartodoc could provide CSS to render this like a definition list. The generated Markdown could look something like this:
I am kind of hung up on a definition list being more accurate semantically.
Can you provide examples of what the syntax should look like if the types aren't in the docstring, but instead inferred from the type annotations in code? Something like this would be nice:
It would be the same. Right now `griffe` handles the type information with the following precedence
`quartodoc` gets the variable and the type as separate.
The syntax of See Also section seems too restrictive to me. If I understand correctly, it expects the first string to be a Python object. But I think it would be useful to be able to write prose there, with function names in line.
That is the spec. You can do long prose, but it still wants that python object first.
See Also
--------
package.module.submodule.func_a :
A somewhat long description of the function.
For complete freedom, there is the Notes section.
For one reason or another, we may have to think about using a lua-filter. But lua filters have to be turned on in the quarto part of the configuration, and it feels like `quartodoc` leaking into `quarto`. I think this is a reason to avoid them if possible.
Is there any chance you'd be willing to PR a small proof of concept of the approach that seems most promising? It might help surface some of these tricky trade-offs (and it seems like you've worked through things very thoroughly!)
Is there any chance you'd be willing to PR a small proof of concept of the approach that seems most promising? It might help surface some of these tricky trade-offs (and it seems like you've worked through things very thoroughly!)
Yes I will do a PR.
If it's working in plotnine right now, we can definitely just move it upstream after everything else is done! I know you've done a ton of lifting on the site, so don't let me slow your roll!
(For context, here is the work in the plotnine renderer: https://github.com/has2k1/plotnine/pull/706/files#diff-094535f9f7581de09d350033913c8c0c4bf3262f35c80ba7a6adf51bc07ba209R128)
That is old.
It is very much a work in progress and it is blocking everything else!
Currently, we render parameters, etc.. as tables. This causes issues with wrapping, since everything has to be on the same line.
In contrast, themes like the sphinx pydata theme use description lists for the structure of the rendered docstrings (see below).
Let's extend the MdRenderer to use description lists:
`_render_table` to produce description lists

One challenge is there's some degree of nesting here (the parameters section itself is a description list, and its table of parameters is also a description list).
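To illustrate the nesting, here is a rough sketch that emits a definition list inside a definition list in Pandoc Markdown. `render_section` is a hypothetical function, not the MdRenderer API, and it assumes Pandoc parses the indented inner list as a nested definition list, which should be checked against real output.

```python
import textwrap


def render_section(title, params):
    """Render a section as an outer definition whose body is an inner definition list."""
    # Inner list: one "term / :   definition" pair per parameter.
    inner = "\n\n".join(f"{name}\n\n:   {desc}" for name, desc in params)
    # Indent the inner list so it belongs to the outer definition.
    indented = textwrap.indent(inner, "    ")
    # On the first line, the outer ":   " marker takes the place of the indent.
    return f"{title}\n\n:   {indented[4:]}"


print(
    render_section(
        "Parameters",
        [
            ("x", "The value of the first dimension."),
            ("r", "The radius of the base shape."),
        ],
    )
)
```

Each extra level of nesting costs four more spaces of indentation, which is one of the trade-offs compared with tables.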