Open thatlittleboy opened 2 years ago
sniping @ericmjl. notebooks are useful for some scenarios, take for instance pivot_longer/pivot_wider. that tl-dr is true though, and i'd suspect that the api for each function should be a good enough tl-dr. Let's get more input from the rest of the team. @pyjanitor-devs/core-devs
@thatlittleboy thanks for chiming in! I actually agree with your sentiments, and I also think I've not been explicit enough about what I was hoping to accomplish with #957, which has led to a bit of stagnation and confusion.
You're right in observing that the docstrings became way too verbose. Additionally, maintaining the functions became difficult as the docstrings started interfering with the readability of the original source file, functions.py. In our spare time, we did a major refactor of functions.py into a submodule, in which we tried to keep as close as possible to "one function idea per file". That helped a bit, but coverage of the examples in the docstrings was non-uniform. Some were very well fleshed out, while others were not. I think the cause of this was that early on, in the interest of building out the library of functions, I was quick to merge PRs and release new versions without having rigorous checks in place to ensure that all functions were documented to the same degree.
I think the coverage of examples in the library is in big need of a redo, and we can probably do a distributed sprint to make it happen.
As I see it right now, the docs examples should fulfill the following criteria:
mkdocs, mkdocstrings, and mknotebooks. I did a bit of digging, and I'm still a bit unsure how to satisfy all of the conditions above simultaneously. That said, option number 2 that you mentioned above, namely:
Keep the examples as separate files, away from the function sourceCode / function docstring. Host the example notebooks somewhere else, but ensure each function docstring has a link to the corresponding example page (preferably hosted on the same Github pages as the existing API reference)
seems to be the option that makes the most sense in the short term, and we probably could build up towards option 3 later on using that as a base.
@thatlittleboy would you be open to helping out with executing on option 2? I think we'd need to start first by having one minimal example per notebook.
Sure @ericmjl , I think I should be able to help with the minimal examples / tl;dr part of the sprint.
So to be clear, the "example notebooks" that we are talking about here are the ones in here, yeah?
And it is 1 notebook per function? e.g. bin_numeric.ipynb would be one, add_column.ipynb would be one, and add_columns.ipynb would be another?
Yes, that is right!
If you could give me a day or two to template out the workflow, that'd be awesome. It'll give me a chance to work out potential kinks before we go all-in on this way of handling minimal working examples in docstrings.
@thatlittleboy I did a few tests and ultimately found that putting minimal working examples in the docstrings is the best thing to do. We get free integration with doctests & pytest, for example! The examples also render well too.
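To illustrate the doctest integration mentioned above, here is a minimal sketch using only the standard library. The add_column function below is a hypothetical stand-in that works on a plain dict of lists; it is not pyjanitor's implementation, and its signature is an assumption made for the example.

```python
import doctest


def add_column(table, name, value):
    """Return a copy of ``table`` with a new column filled with ``value``.

    Hypothetical stand-in for illustration only; not pyjanitor's code.

    Example:

    >>> table = {"a": [1, 2, 3]}
    >>> add_column(table, "b", 0)
    {'a': [1, 2, 3], 'b': [0, 0, 0]}
    """
    # Number of rows is taken from the first existing column (0 if empty).
    n_rows = len(next(iter(table.values()))) if table else 0
    # Copy every column so the input table is left unmodified.
    new_table = {key: list(col) for key, col in table.items()}
    new_table[name] = [value] * n_rows
    return new_table


# Parse the examples out of the docstring and run them as tests.
test = doctest.DocTestParser().get_doctest(
    add_column.__doc__, {"add_column": add_column}, "add_column", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # TestResults(failed=..., attempted=...)
```

The same docstring examples can also be collected by pytest when it is run with the --doctest-modules flag, so the example doubles as documentation and as a test.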
In my latest PR #971, I made a few infrastructural changes as well to clear up the CI. Once that one gets merged, the other PRs that you've got should merge in latest dev, and the CI issues should go away!
Looks great @ericmjl , thank you. I think this is a good direction forward, especially for offering clear, short examples to new users of pyjanitor. 👍🏻
@thatlittleboy I'd like to invite you onto the dev team. Can you ping me on Shortwhale so I can send you a link to join the Discord server? http://www.shortwhale.com/ericmjl
Yep, pinged!
Hi all, just wanted to open a discussion on the state of documentation of this package.
With the most recently merged pull request (#957) and #906, I'm inferring that a decision has been made to remove all Minimal Working Examples (MWE) from the docstrings and move them instead into Jupyter notebooks -- with 1 notebook for each function (?).
Qn: If this is so, then can I understand what is the recommended way for a user to study these examples / how the pyjanitor functions should be used?
A bit more context on where the question is coming from:
I'm looking to incorporate this package more into my daily workflow, and the existing examples within the API reference have been instrumental to my understanding of what the package offers.
As far as I can tell, there are currently 2 places where examples are located:
After removing the MWE from the function docstrings (and thus, the API reference) as per #957 , is there then a plan to link up the API reference to the notebook examples, in any shape or form? That is, how is the user, coming from the API reference page, to know that there are examples available that show the functionality with sample inputs/outputs?
Take this example from PR #957
The new docstring looks like the one on the right:
I would argue that the docstring on the right is in fact less informative (!!) and the remaining "skeleton" example is essentially useless (sorry for the blunt expression), since that is essentially repeating the function parameters back to the user. (And if the point of the skeleton example is purely to inform the user there are 3 ways pyjanitor functions can be used -- method-chaining, piping, function -- then I think it is redundant since this has already been mentioned in the HomePage and there's no need to repeat this in every subsequent function docstring)
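For readers unfamiliar with the three calling styles mentioned above (method-chaining, piping, function), here is a minimal sketch. The Table class and lowercase_names function are hypothetical stand-ins written for this illustration; the real library operates on pandas DataFrames, whose pipe method follows the same idea.

```python
class Table:
    """Tiny stand-in for a DataFrame (hypothetical, illustration only)."""

    def __init__(self, columns):
        self.columns = list(columns)

    def pipe(self, func, *args, **kwargs):
        # Same idea as pandas' DataFrame.pipe: call func with self first.
        return func(self, *args, **kwargs)


def lowercase_names(table):
    """Return a new Table whose column names are lower-cased."""
    return Table(name.lower() for name in table.columns)


# Register the function as a method so that it can be chained.
Table.lowercase_names = lowercase_names

t = Table(["ID", "Name"])
via_method = t.lowercase_names().columns     # method-chaining
via_pipe = t.pipe(lowercase_names).columns   # piping
via_function = lowercase_names(t).columns    # plain function call
```

All three calls run the same function, which is why repeating the three styles in every docstring adds little information.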
But I digress. My main point is that the docstrings, as they are currently being modified (examples removed, with no link to or mention of the examples), are confusing to the new user who is just looking to understand what each function is meant to do.
On potential solutions
On the note of "linking" each of the function docstrings to their respective notebook examples, I suppose there are a few ways to design it, with considerations of BOTH the organization of the src code AND the eventual user experience of reading the docs:
I'm personally more in favour of 1 myself, but I suppose I'm in the minority. 😆 I genuinely don't think any of the pyjanitor function examples require a notebook to be explained thoroughly -- after all, they are just syntactic sugar for cleaning / manipulating dfs? I often see notebook examples in the context of explaining ML workflows / how to use a certain NN model (think: pytorch/dgl; training & evaluating ModelXXX on the MNIST dataset).
But barring solution 1, solution 3 seems like a nice middle ground (huge fan of pandas' docs), though it is probably more complicated to implement than 2. If we indeed go for 2, I think we also need a tl;dr section for each notebook; but that's a different issue altogether. Thoughts?
ps: Also don't mean to knock on the efforts made in #957 too much, forgive me 😝 Happy new year all 🎉