pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License
1.37k stars 171 forks source link

[DOC/INF] Migrate mkdocs documentation style from sphinx to google #1032

Closed thatlittleboy closed 1 year ago

thatlittleboy commented 2 years ago

The current state of affairs

  1. The migration of pyjanitor's documentation from Sphinx to mkdocs happened late last year (#894). And with mkdocstrings, we’re currently using the “restructured-text” docstring-style option (which is not actually reST but rather "sphinx"-style docstrings, with :param :, :returns: etc.).
  2. As of time of writing, there is an ongoing sprint on incorporating MWEs in all of our (public) APIs. (#972). So, having proper and nicely-formatted example blocks in our documentation is becoming more important moving forward.

The issue(s)

Upstream support is poor

From my interactions with the core-dev of mkdocstrings, and having gotten to know the mkdocs ecosystem better over the past week or so, I’ve realized that the restructured-text/sphinx docstring style is not as well-supported as the other styles (namely, google / numpy) by mkdocstrings. For example, both the legacy python handler and new experimental python handler does not properly support the “examples” section in sphinx style at the moment (see here-legacy, and here-new).

And the new option to remove doctest flags (recently added to pytkdocs and griffe) has only been implemented for Google/Numpy styles, but NOT for the reST / sphinx styles. --- as a direct consequence of the docstring collectors not supporting "examples" section mentioned above.

Broken / Incomplete functionality

On the pyjanitor front: The following points could serve as their own individual issues/tickets on their own right, but just as an overview of what I’ve observed:

I’m unsure how many of the above issues are due to mkdocstrings not completely supporting reST. At the very least, I'm sure the first issue about doctest flag can be resolved if we move to Google style.

Using sphinx-style is not aligned with original motivation of migration

And echoing the original motivation of moving away from Sphinx build in the first place, I would argue that the Google style is more readable, easier to maintain, and more in the spirit of the mkdocs ecosystem, than the current sphinx style.

The proposed solution

As per title, I propose shifting over to Google docstring style. Most of the effort towards this should revolve around

If we’re agreeable to this proposal, I think we can draw on the experience from the previous docs migration, so suggestions on how to practically carry this out would be very welcome. It will most likely be the next major step forward on the documentation-front, after #972 is resolved.

Thoughts? Objections?

Addendum

Styles:

  1. Google -- https://google.github.io/styleguide/pyguide.html#383-functions-and-methods
  2. Numpy -- https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard

There seems to be an automated tool for converting reST to Google (pyment), but I've tried it and it's quite buggy. We'll probably have to do the migration by hand.

samukweku commented 2 years ago

@thatlittleboy thanks for the detailed writeup. @ericmjl @pyjanitor-devs/core-devs i am in favour the google/numpy style - looks clearer and more appealing visually.

ericmjl commented 2 years ago

Well-argued, @thatlittleboy! My first instincts were to resist, because I have muscle memory on sphinx-style docstrings, but I’m now convinced!

loganthomas commented 2 years ago

I'm in favor of either Google or Numpy. I'm biased toward Numpy because I use it at work :). Either way I think it's a step in the right direction. Plus darglint supports all 3 so 👍

Darglint expects docstrings to be formatted using the Google Python Style Guide, or Sphinx Style Guide, or Numpy Style Guide.

thatlittleboy commented 2 years ago

Thanks for the inputs so far and great point about darglint / compatibility with the other tools we're using! 👍🏼


I'll personally be more in favor of Google over Numpy, just purely from the observation that: The Numpy style docstring collector is just very slightly lagging in development behind Google in the new handler -- though I'm confident it will catch up with future development. And in the legacy python handler (which we're using in pyjanitor at the moment and for the foreseeable future), the active core dev is a user of Google style rather than Numpy style, so upstream support/bug fixes on mkdocstrings will probably be faster/better (my speculation).

But these reasons are definitely minor and not really a big concern. We can choose between google/numpy the style which we are most comfortable using & writing.


Anyway, this issue will be up for discussion while we finish up #972 , so no rush to come to a conclusion just yet. In particular, if anybody has concerns / potential downsides of performing this switch, I/we would love to hear them as well! 😄

One more thing I forgot to mention is: sticking with sphinx-style in pyjanitor is of course a possibility; then in order to solve the aforementioned issues, we need to EITHER get involved in implementing the changes upstream (i.e., in mkdocstrings) ourselves OR wait for their dev to support sphinx-style properly (highly unlikely).

pawamoy commented 2 years ago

Not able to copy the example code without the chevrons >>> and …. Admittedly this would just be a nice functionality to have, since ipython supports copying and running code with these markers in them regardless.

I'm planning on changing the language of >>> code blocks detected in examples to pycon, which is more correct. Then we can prevent selection of >>> and ... like this: https://mkdocstrings.github.io/recipes/#prevent-selection-of-in-python-code-blocks. Maybe this rule should be included by default.

ericmjl commented 2 years ago

I'm planning on changing the language of >>> code blocks detected in examples to pycon, which is more correct. Then we can prevent selection of >>> and ... like this: https://mkdocstrings.github.io/recipes/#prevent-selection-of-in-python-code-blocks. Maybe this rule should be included by default.

@pawamoy yes, that would be amazing to have as a default! I love the idea. The chevrons are great for doctesting, but are definitely a hassle when copying and pasting the code.

thatlittleboy commented 2 years ago

While we're on this topic,

@ericmjl It seems like the crux of Timothee's solution is that we also need to use pygments for pymdownx.highlight (we are using use_pygments: false at the moment), since the solution relies on modifying the CSS based on the gp (generic prompt) and go (generic output) classes; which are both added by pygments. There is no equivalent when using highlight.js; It doesn't recognize and wrap individual HTML tags around the >>>, ..., which means we can't target them explicitly via CSS.

Would we be open to leaving behind the Nord highlighting theme (and more broadly speaking, the use of highlight.js), and instead using whatever style is available by pygments, in order to benefit from Timothee's proposed solution?

ericmjl commented 2 years ago

Would we be open to leaving behind the Nord highlighting theme (and more broadly speaking, the use of highlight.js), and instead using whatever style is available by pygments, in order to benefit from Timothee's proposed solution?

Yep, definitely, @thatlittleboy! Function, then form :smile:.