scientific-python / summit-2024


alternatives to doccer? #27

Open drammock opened 1 month ago

drammock commented 1 month ago

One thing I hoped to talk about at the summit but didn't manage to was doccer (SciPy's internal tool for docstring deduplication). MNE-Python adopted/adapted doccer many years ago, and it helped us find and fix many outdated/inaccurate docstrings.

The problems we're facing are:

  1. the docstrings can't be easily read in the source code.
  2. the docstrings are only filled in when the package is imported, so static analyzers like Pyright never see the filled-in text. As a result, VS Code's hover, tooltip, and tab-completion features (like the source files themselves) show the cryptic docstring placeholders instead of the filled-in parameter descriptions.

Problem 1 alone wouldn't be so bad (arguably an advantage, as it reduces scrolling past screens and screens of docstring between snatches of actual code), but combined with problem 2 it has left some of our devs in a perpetually frustrated state.
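To make problem 2 concrete, here is a minimal sketch of doccer-style filling with `%(name)s` placeholders. The `fill_doc` decorator and `docdict` below are hypothetical stand-ins, not SciPy's or MNE's actual code. Parsing the source with `ast`, as a static tool would, only ever sees the placeholder; the filled-in text exists only after the code has been executed.

```python
import ast

# Hypothetical shared descriptions, in the spirit of doccer's docdict.
docdict = {"picks": "Which channels to include."}


def fill_doc(func):
    """Fill %(name)s placeholders in func.__doc__ from docdict (runs at import time)."""
    func.__doc__ = func.__doc__ % docdict
    return func


# Source text as a static analyzer would read it from disk.
SOURCE = '''
@fill_doc
def plot(picks=None):
    """Plot some data.

    Parameters
    ----------
    picks : str | list | None
        %(picks)s
    """
'''

# Static view: parse without executing, so the placeholder is still there.
static_doc = ast.get_docstring(ast.parse(SOURCE).body[0])

# Runtime view: executing the code applies the decorator and fills the hole.
namespace = {"fill_doc": fill_doc}
exec(SOURCE, namespace)
runtime_doc = namespace["plot"].__doc__

print("%(picks)s" in static_doc)  # True
print("Which channels to include." in runtime_doc)  # True
```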

My questions are:

  1. Have SciPy devs found good workarounds for the problems I mentioned above?
    • One solution I already know is "have an IPython terminal open in your IDE, and if you need to read a docstring, use ? (like mne.what.ever?)", but I'm interested in other approaches.
  2. How are other packages besides SciPy and MNE dealing with param descriptions (or other aspects of docstrings) that are repeated across many parts of your codebase (i.e., how do you keep them in sync)?
thomasjpfan commented 1 month ago

In scikit-learn, we have tests to enforce constraints on the docstring parameters:

https://github.com/scikit-learn/scikit-learn/blob/ea1e8c4b216d4b1e21b02bafe75ee1713ad21079/sklearn/tests/test_docstring_parameters.py#L79-L80

I never pushed for a dynamic way to fill in the docstrings, because of the issues with static analyzers.

drammock commented 1 month ago

> In scikit-learn, we have tests to enforce constraints on the docstring parameters:

Thanks @thomasjpfan. We have similar tests in MNE-Python (we probably copied ours from sklearn), but like yours, they don't enforce much about the content of the parameter descriptions; they mostly just ensure that the descriptions exist and come in the same order as the function signature.
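For readers unfamiliar with such tests, a minimal stdlib-only sketch of the "exists and in signature order" check might look like this. The helper names and the simplified numpydoc parser are assumptions made for illustration; the actual sklearn and MNE tests are more thorough and are not reproduced here.

```python
import inspect


def documented_params(func):
    """Return the parameter names listed in a numpydoc 'Parameters' section.

    Simplified parser: a section header is a line followed by a line of dashes,
    and a parameter entry is a non-indented line such as 'name : type'.
    """
    lines = (inspect.getdoc(func) or "").splitlines()
    names, in_params = [], False
    for i, line in enumerate(lines):
        next_line = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if next_line and set(next_line) == {"-"}:
            # Entering a new section; only 'Parameters' is of interest.
            in_params = line.strip() == "Parameters"
            continue
        stripped = line.strip()
        if not in_params or not stripped or line.startswith(" ") or set(stripped) == {"-"}:
            continue  # outside the section, blank, description, or underline
        names.append(stripped.split(":")[0].strip())
    return names


def params_match_signature(func):
    """True if every parameter is documented, in signature order."""
    return documented_params(func) == list(inspect.signature(func).parameters)


def scale(data, factor=1.0):
    """Scale data.

    Parameters
    ----------
    data : list of float
        Values to scale.
    factor : float
        Multiplier.

    Returns
    -------
    list of float
        Scaled values.
    """
    return [x * factor for x in data]


def misordered(a, b):
    """Parameters documented in the wrong order.

    Parameters
    ----------
    b : int
        Second value.
    a : int
        First value.
    """
    return a + b


# In a test suite these would be plain asserts:
assert params_match_signature(scale)
assert not params_match_signature(misordered)
```

As the comment above notes, this only checks names and order; nothing here says whether the description text is accurate or consistent across functions.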

> I never pushed for a dynamic way to fill in the docstrings, because of the issues with static analyzers.

It's starting to seem like the only option that both works with static analysis and preserves consistency across the API would be to go back to hard-coding all our docstrings in the source files, maintain a mapping somewhere saying "the param description for picks (or axes or whatever) should be identical across this list of functions", and then assert that in a test.
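A minimal sketch of that mapping-plus-test approach, assuming hard-coded numpydoc docstrings. `SHARED_PARAMS`, `param_description`, and `check_shared_params` are hypothetical names, the example functions are made up, and the parser is deliberately simplistic.

```python
import inspect


def param_description(func, name):
    """Return the dedented description of parameter `name`, or None if absent."""
    lines = (inspect.getdoc(func) or "").splitlines()
    for i, line in enumerate(lines):
        if not line.startswith(" ") and line.split(":")[0].strip() == name:
            desc = []
            for follow in lines[i + 1:]:
                if follow and not follow.startswith(" "):
                    break  # next parameter or section header
                desc.append(follow.strip())
            return "\n".join(desc).strip()
    return None


def plot_raw(raw, picks=None):
    """Plot raw data.

    Parameters
    ----------
    raw : object
        The data to plot.
    picks : str | list | None
        Which channels to include.
    """


def plot_epochs(epochs, picks=None):
    """Plot epoched data.

    Parameters
    ----------
    epochs : object
        The data to plot.
    picks : str | list | None
        Which channels to include.
    """


# Hypothetical registry: parameter name -> functions whose descriptions must agree.
SHARED_PARAMS = {"picks": [plot_raw, plot_epochs]}


def check_shared_params(registry):
    """Return (param, function names) pairs whose descriptions disagree or are missing."""
    mismatches = []
    for name, funcs in registry.items():
        descriptions = {param_description(f, name) for f in funcs}
        if len(descriptions) != 1 or None in descriptions:
            mismatches.append((name, [f.__name__ for f in funcs]))
    return mismatches


assert check_shared_params(SHARED_PARAMS) == []  # the test passes while in sync
```

The docstrings stay fully readable in the source; the registry is the only extra bookkeeping, and the test fails as soon as one copy drifts.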

cbrnr commented 1 month ago

To be honest, I'd prefer almost anything over doccer expansion at this point. Besides static analyzers, the benefit of being able to just read docstrings in the source cannot be overstated.

betatim commented 1 month ago

I am also not a fan of "docstrings with holes in them", i.e., docstrings that aren't fully readable when you open the source file in a simple text editor.

A random idea: maybe generating the docstring once and writing it into the source file is worth investigating? It could be a tool that you run on a new class/function. For example, to generate a docstring for scipy.foo.bar you'd run python generate_docstring.py scipy.foo.bar, and it would spit out a (more or less) ready-to-go string that people can add to the code.

That way the initial docstring would be consistent with the rules.
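One way such a generator could work, sketched under assumptions: build a numpydoc skeleton from the function's signature and fill in canonical descriptions from a shared dictionary, leaving TODOs elsewhere. `generate_docstring.py`, `SHARED`, `resolve`, and `main` are hypothetical; no such tool is confirmed to exist in SciPy.

```python
import argparse
import importlib
import inspect

# Hypothetical single source of truth for shared parameter descriptions.
SHARED = {"axis": "Axis along which to operate."}
PLACEHOLDER = "TODO: describe this parameter."


def generate_docstring(func, shared=SHARED):
    """Build a numpydoc-style docstring skeleton from func's signature."""
    lines = ["TODO: one-line summary.", "", "Parameters", "----------"]
    for name, param in inspect.signature(func).parameters.items():
        optional = ", optional" if param.default is not inspect.Parameter.empty else ""
        lines.append(f"{name} : TODO{optional}")
        # Reuse the canonical description when one exists.
        lines.append("    " + shared.get(name, PLACEHOLDER))
    return "\n".join(lines)


def resolve(dotted):
    """Import 'package.module.func' and return the named object."""
    module_name, _, attr = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def main(argv=None):
    """Command-line entry point: print the skeleton for a dotted path."""
    parser = argparse.ArgumentParser(description="Print a docstring skeleton.")
    parser.add_argument("target", help="dotted path, e.g. package.module.func")
    args = parser.parse_args(argv)
    print(generate_docstring(resolve(args.target)))
```

Saved as `generate_docstring.py` with `main()` called under an `if __name__ == "__main__":` guard, running it on a dotted path would print a skeleton that the author then completes and pastes into the source.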

You could even imagine something like python generate_docstring.py --check scipy.foo.bar, which generates the docstring and diffs it against what is actually present. You could then extend it even further with something like libcst, so that it adds the docstring to the source file automatically.
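The `--check` part could be sketched with the standard library's `difflib`: compare the docstring in the source with freshly generated text, print a unified diff, and return a non-zero status so CI fails on drift. `GENERATED` stands in for the generator's output, and `docstring_diff` and `check` are hypothetical names; rewriting the source file (e.g. with libcst) is not shown.

```python
import difflib
import inspect


def docstring_diff(func, expected):
    """Unified diff between func's docstring and `expected` ('' if they match)."""
    actual = inspect.getdoc(func) or ""
    expected = inspect.cleandoc(expected)
    diff = difflib.unified_diff(
        actual.splitlines(),
        expected.splitlines(),
        fromfile=f"{func.__qualname__} (source)",
        tofile=f"{func.__qualname__} (generated)",
        lineterm="",
    )
    return "\n".join(diff)


def check(pairs):
    """Print diffs for (func, generated) pairs; return 1 if any differ, else 0."""
    status = 0
    for func, generated in pairs:
        diff = docstring_diff(func, generated)
        if diff:
            print(diff)
            status = 1
    return status


def mean(values, axis=0):
    """Compute the mean.

    Parameters
    ----------
    values : array_like
        Input data.
    axis : int, optional
        Axis along which to operate.
    """


# Hypothetical generator output after the shared description was reworded.
GENERATED = """Compute the mean.

Parameters
----------
values : array_like
    Input data.
axis : int, optional
    Axis along which to reduce.
"""

status = check([(mean, GENERATED)])  # prints the diff for the 'axis' line
```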