pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: (Automatically) provide links to examples and the userguide within the documentation #41125

Open ldorigo opened 3 years ago

ldorigo commented 3 years ago

Is your feature request related to a problem?

While pandas' documentation is stellar, it often only answers the "how" of a specific function, not the "when" or "why". This makes it hard to know what a specific function should be used for and what its common use cases are.

Describe the solution you'd like

Matplotlib and Sklearn both have a very nice solution to this problem: at the end of each documentation page they automatically (I think) include links to examples that use that specific function, and to relevant places in the user guide.

rhshadrach commented 3 years ago

Thanks for suggesting the idea. I think links to some examples from Matplotlib/Sklearn would be useful, as well as some suggestions for pandas doc pages that could use this.

KiranHipparagi commented 3 years ago

I am new to open source contributions and would like to work on this feature. Can I contribute to this issue? Also, @ldorigo, could you please give some examples or links showing how the pandas and sklearn docs differ, and describe your expectations in more detail? Thanks!

ldorigo commented 3 years ago

Hi @KiranHipparagi and @rhshadrach, and sorry for ghosting after my initial suggestion.

Here are a few examples and details for what I meant/think would be nice to have:

Sklearn:

Random example: https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer

All reference pages (that I've seen) contain a final section "examples", which I'm quite certain is generated automatically, and which links to any example scripts/notebooks using that function. Additionally, the majority of functions/classes also contain a link to the user guide (at the end of the introductory paragraph). I'm not sure how exactly they maintain it, but that link is probably added manually.

Matplotlib:

Random example: https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html#matplotlib.axes.Axes.legend

Their "examples" sections look eerily similar to those from sklearn, so either they took inspiration from one another, or they are actually generating them in the same way. The latter would be good, as it would mean there is probably an open source tool that would allow doing the same for pandas.
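For reference, as far as I can tell both projects build these sections with Sphinx-Gallery, a Sphinx extension that turns example scripts into gallery pages and records which documented objects each example uses. Below is a minimal sketch of what a `conf.py` fragment could look like if pandas adopted it. The option names are Sphinx-Gallery's, but every path and the `doc_module` value are hypothetical placeholders, not anything pandas currently has.

```python
# conf.py fragment (sketch only). All paths below are hypothetical placeholders.
extensions = [
    "sphinx_gallery.gen_gallery",  # Sphinx-Gallery's Sphinx extension
]

sphinx_gallery_conf = {
    # Where the example scripts live (hypothetical path).
    "examples_dirs": ["../examples"],
    # Where the rendered gallery pages are written (hypothetical path).
    "gallery_dirs": ["auto_examples"],
    # Directory that receives the "examples using X" backreference files.
    "backreferences_dir": "gen_modules/backreferences",
    # Only objects from these modules get backreference files.
    "doc_module": ("pandas",),
}
```

With something like this in place, an API page could pull in the matching examples through Sphinx-Gallery's `minigallery` directive, assuming a set of example scripts exists to draw from.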

Tensorflow:

Wasn't in the initial issue, but it's another very popular data science library that does something similar.

Example: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

As you can see, most classes and functions have a link to a list of notebooks in which that function/class is used.

Why and how?

Which pandas docs would benefit from this? All of them, I think, but to give an example: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc This documentation page is very good, and it even has quite a few examples of what is possible. But as I mention in the issue, it doesn't really answer when this method should be used (instead of pandas' many other indexing methods). The examples are small and illustrative of the various ways it can be used (which is great), but they don't give much context or use cases for why the function would be needed.

rhshadrach commented 3 years ago

Thanks for the detailed examples! The main thing here is that all three packages have notebooks with specific examples/tutorials. I think pandas has something a bit different - the user guide. I'd guess that linking to places in the user guide where loc is used would come up with examples of questionable significance. I want to emphasize I'm guessing here - if one could come up with auto-generated examples and post them here, they'd be worth a look to see if they'd bring value.

The loc documentation you linked to does have a link to the user guide, linking to here, but it is not what I would call prominent. Perhaps that can be improved?
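One cheap way to test the guess above, before wiring up any tooling, would be to scan the user guide sources for uses of an accessor and eyeball the hits. The following is only a rough sketch: `find_mentions` is a hypothetical helper, the page contents are made-up stand-ins for real user guide files, and a plain regex match is an assumption that will miss or over-count some usages.

```python
import re
from typing import Dict, List, Tuple


def find_mentions(pages: Dict[str, str], accessor: str) -> List[Tuple[str, int, str]]:
    """Return (page, line_number, line) for each line that uses `.accessor`."""
    # Match ".loc" as a whole attribute name, so ".iloc" and ".location" are skipped.
    pattern = re.compile(r"\." + re.escape(accessor) + r"\b")
    hits = []
    for page, text in sorted(pages.items()):
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((page, number, line.strip()))
    return hits


# Made-up stand-ins for user guide pages (not real pandas doc content).
sample_pages = {
    "indexing.rst": "Use df.loc[0]\nUse df.iloc[0]\n",
    "basics.rst": "s.loc['a'] = 1\nlocation\n",
}

for page, number, line in find_mentions(sample_pages, "loc"):
    print(f"{page}:{number}: {line}")
```

Running it over the real user guide files and posting the output here would give something concrete to judge the value of such links.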