[Doc]: Improve new user guide sections

matplotlib / matplotlib

matplotlib: plotting with Python

https://matplotlib.org/stable/


[Doc]: Improve new user guide sections #26366

Open jklymak opened 1 year ago

jklymak commented 1 year ago

Documentation Link

https://matplotlib.org/devdocs/users/explain/index.html

Problem

https://matplotlib.org/devdocs/users/explain/index.html has seen significant new material added. In particular

https://matplotlib.org/devdocs/users/explain/axes/axes_intro.html
https://matplotlib.org/devdocs/users/explain/artists/artist_intro.html
the new material in https://github.com/matplotlib/matplotlib/pull/26279 (not on devdocs as of this Issue).

I'd expand this list to the quick-start guide:

https://matplotlib.org/devdocs/users/explain/quick_start.html

which I think could use some revision in light of where it now stands in the docs (some things could be removed and reference other parts of the new User Guide)

I wrote these, and they had minimal input from others. I think they are necessary as placeholders and not embarrassing in their content. However, they could benefit from other input now that they are in place, and follow-on PRs (or issues if they are focused) are most welcome. Indeed I'll try and stay out of reviewing any changes, unless I really feel things are going off the rails. If we are talking further reorganization, I would like to be part of the conversation, and I'd encourage discussion before anyone puts a lot of work into that.

I don't think improving these is urgent, and can happen in point releases.

EDIT: Another important suggestion is more cross-linking to the Examples.

jklymak commented 1 year ago

I will further comment that some of the criticism of the new material is that it is too curt and has hard-to-follow loop constructs, etc. Agreed! Some of this was just the expediency of adapting an existing Example.

However, I think there is a discussion to be had about the level that the docs are pitched at, and what level of assumed knowledge each page can make. We should map that out, but I strongly feel every page should not be re-teaching every concept, or the docs will be completely unreadable.

I also think there should be some consensus about what we think the right mental model for Matplotlib is. People who love the Grammar of Graphics declarative paradigms are never going to be comfortable with Matplotlib's imperative paradigm. Outside of the one page where we discuss this, I don't think we should ever discuss it. We should stick to our lane and the mental model that has served Matlab and Matplotlib well for many decades now.

Maybe for a special dev call.

story645 commented 1 year ago

Top line is my issues are with content more than organization.

I wrote these, and they had minimal input from others.

Every time I left a detailed review, I was told that it was out of scope and I should implement it myself. I don't think #26279 should have been merged b/c I think it makes the documentation problem worse and not better because now we have the same content framed the same way in two places rather than scoping for different audiences. I link new users to the docs all the time and https://matplotlib.org/devdocs/users/explain/axes/axes_scales.html#axis-scales isn't an improvement to linking https://matplotlib.org/stable/gallery/ticks/index.html#ticks

We should map that out, but I strongly feel every page should not be re-teaching every concept, or the docs will be completely unreadable.

The very consistent feedback we've gotten on our docs is that they're not very helpful to folks who don't already know how to use the library, so I think most of the time we're not teaching the concepts in the first place.

Reiterating https://github.com/matplotlib/matplotlib/pull/26279#issuecomment-1634675103, I don't think demos are teaching. I absolutely agree that we should have a one source of truth teaching doc, but a teaching concepts of matplotlib doc steps through and explains what's going on:

We create 3 plots to illustrate the three ways we support setting ticks. 
In this example we use the y axis, but we have equivalent methods for the x and z axis. 

* ax1: automatic 
* ax2: manually using `set_yticks` and `set_yticklabels`
* ax3: using `Locator` and `Formatter` objects:

We plot the same diagonal line going from 0 to 100 on all three::

 import matplotlib.pyplot as plt
 import numpy as np
 from matplotlib import ticker

 fig, (ax1, ax2, ax3) = plt.subplots(3)
 for ax, label in [(ax1, "Automatic"), (ax2, "Manual"), (ax3, "Object Oriented")]:
     ax.plot(np.arange(100), np.arange(100))
     ax.set_title(f'{label} ticks')

On *ax2* we will manually set ticks at every 100/3 mark using the `set_yticks` method::

 ax2.set_yticks(np.arange(0, 100.1, 100/3))

Then we manually add minor ticks at every 100/30 mark by passing ``minor=True`` to `set_yticks`::

 ax2.set_yticks(np.arange(0, 100.1, 100/30), minor=True)

On *ax3* we use `Locator` and `Formatter` objects to customize where ticks are placed. 
We have many; see (ref or doc) for details. Here we show one to illustrate how they are generally used.

On *ax3* we use the `MultipleLocator` to place a major tick at every 100/3 mark::

 ax3.yaxis.set_major_locator(ticker.MultipleLocator(100/3))

On *ax3* we use a string formatter to append the unit to every tick mark::

 ax3.yaxis.set_major_formatter('{x} steps')

while demoing formatters looks like:

https://matplotlib.org/stable/gallery/ticks/index.html#ticks

And we need both. We get a ton of requests for the step through everything please docs, but they want it in the gallery where that's completely unmaintainable. Where it would be appropriate though would be the user guide, where we could then link out to as needed.
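
Assembled into one script, the walkthrough above might look like the following. This is a sketch rather than the text proposed for the docs: the tick positions, the label strings passed to `set_yticklabels`, and the `{x:.0f}` format spec are illustrative assumptions, and the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker

fig, (ax1, ax2, ax3) = plt.subplots(3)
for ax, label in [(ax1, "Automatic"), (ax2, "Manual"), (ax3, "Object Oriented")]:
    ax.plot(np.arange(100), np.arange(100))
    ax.set_title(f"{label} ticks")

# ax1: left untouched, so Matplotlib chooses tick locations and labels.

# ax2: place the major ticks by hand, then label them by hand.
manual_ticks = [0, 25, 50, 75, 100]                      # assumed positions
manual_labels = ["0", "1/4", "1/2", "3/4", "all"]        # assumed labels
ax2.set_yticks(manual_ticks)
ax2.set_yticklabels(manual_labels)

# ax3: describe the ticks with objects instead of fixed lists.
ax3.yaxis.set_major_locator(ticker.MultipleLocator(25))  # a tick every 25
ax3.yaxis.set_major_formatter("{x:.0f} steps")           # str becomes a StrMethodFormatter
```

The difference shows up when the view changes: the lists on `ax2` stay fixed, while the locator and formatter on `ax3` are re-evaluated for whatever range is displayed.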

story645 commented 1 year ago

I also think there should be some consensus about what we think the right mental model for Matplotlib is. People who love the Grammar of Graphics declarative paradigms are never going to be comfortable with Matplotlib's imperative paradigm. Outside of the one page where we discuss this, I don't think we should ever discuss it. We should stick to our lane and the mental model that has served Matlab and Matplotlib well for many decades now.

I think this is a strawman, 'cause yes, Matplotlib isn't Grammar of Graphics or Matlab, and both are the wrong mental model: GoG 'cause it's declarative, Matlab b/c we don't actually want folks using the pyplot API. But the docs make nods to the audience coming from Matlab so that they're oriented to where Matplotlib diverges and what Matplotlib calls things. GoG uses the conventions common in the visualization community, and so there's a generation of folks who are learning visualization that way. I think it serves us well to be inclusive of that audience by saying "hey, this is what we call thing x that you may know as y."
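
For readers coming from Matlab, the divergence mentioned above can be shown in a few lines. A minimal sketch, assuming the point is the contrast between pyplot's implicit, state-based calls and the explicit Axes interface; the data values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs anywhere
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# Matlab-style (implicit) interface: pyplot tracks a "current" Figure and
# Axes, and each call acts on whatever is current.
plt.figure()
plt.plot(x, y)
plt.title("implicit")
implicit_ax = plt.gca()  # fetch the Axes that pyplot was drawing on

# Explicit interface: keep references to the Figure and Axes and call
# methods on them directly.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("explicit")
```

Both halves draw the same line; the second just makes it visible which object each call modifies.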

jklymak commented 1 year ago

I think it makes the documentation problem worse and not better because now we have the same content framed the same way in two places rather than scoping for different audiences. I link new users to the docs all the time and https://matplotlib.org/devdocs/users/explain/axes/axes_scales.html#axis-scales isn't an improvement to linking https://matplotlib.org/stable/gallery/ticks/index.html#ticks

If you meant https://matplotlib.org/devdocs/gallery/scales/index.html (rather than ticks), then agreed that the material is very similar.

The important improvement in https://github.com/matplotlib/matplotlib/pull/26279, which I feel merits it being merged, is that now "scales" appears in the TOC of the User Guide with the rest of the information about Axes, at https://matplotlib.org/devdocs/users/explain/axes/index.html. It would be a strange user guide without this information.

As noted here, the quality of the new material, or how it is pitched, or how much it overlaps with the existing examples, can all be reworked. Please consider what was merged a placeholder.

tacaswell commented 1 year ago

The docs are unequivocally better now with #26279 merged (which I obviously think because I merged it). Our convention on docs PRs is "better than it was" and we do not do copy-editing via PR because that is soul crushing. The response to anyone who says "you need to re-write this section because I do not like the style" (rather than "that text is factually wrong") is going to be a suggestion that they follow up with another PR.

Please do not confuse "not how I would write that" with "wrong".

story645 commented 1 year ago

"you need to re-write this section because I do not like the style"

It's not style, it's content - I get told all the time to rewrite my doc PRs b/c a reviewer feels what I wrote isn't appropriate for that section of the docs.

ETA: to elaborate, we tell contributors all the time that something would be more appropriate as an example or as a section in the tutorials, or that it belongs in the API docs, b/c these sections are written differently and have different criteria. We haven't formally codified that yet for the user guide, but "I think this content isn't appropriate for the user guide b/c it doesn't teach (in a way that's different than how we already teach this elsewhere)" is equivalent to rejecting code because the reviewer believes that the way it's coded doesn't solve the problem, which is what we do all the time.

story645 commented 1 year ago

Please consider what was merged a placeholder.

I really appreciate you saying this, but my concern is that because the documentation is there, folks, especially newer contributors, will feel that it's not their place to rewrite or rework it. Even I kinda feel I'm being disrespectful, and I'm coming at it from being the community manager, having taught Matplotlib and visualization for years to a really wide variety of folks, and mentoring the GSOD.

I think one of the overarching issues with the docs is that "does it make it better" is usually true on each individual page where the information is added, but new material tends to be added without evaluating how it fits into the overall docs, and what we end up with is doc sprawl. I think defining some content guidelines for each section of the docs (Gallery, Tutorials, User Guide) could really help with navigability and also possibly help our documentation PR process by providing guidance on what content we think belongs in each section. @esibinga is doing this work for the gallery GSOD, and I want to propose content guidelines here for the new contributors guide:

Purpose: Teach folks how to use Matplotlib

Fairly consistent feedback we get about the docs is that folks who already know how to use Matplotlib generally like the docs and find them really useful, while folks who do not know how to use the library don't feel like they can learn how to from the docs.

Scope: Matplotlib-specific concepts/constructions

In explaining Matplotlib-specific constructions, we should weave in what they're called in the visualization and stats communities so that contributors who do not know Matplotlib can make the connections/get that grounding. I hear the concern that we don't want to be writing a visualization or statistical methods textbook and I agree - which concepts we introduce should be driven by the Matplotlib object we're explaining.

Approach: scaffolded isolation of concept

Generally the tutorials that got moved into the user guide, like annotation and custom colormaps, follow a similar format of breaking up the material into very small chunks, generally one keyword at a time. This isolation of each concept is intended to make it easier for folks to learn how each works, and then how to combine these parameters together to generate more complex visuals.
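
A minimal sketch of what "one keyword at a time" might look like in practice, using `Axes.annotate` because annotation is one of the pages named above. The particular steps, coordinates, and text are illustrative assumptions, not an excerpt from the existing guide.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs anywhere
import matplotlib.pyplot as plt

fig, (ax_a, ax_b, ax_c) = plt.subplots(1, 3, figsize=(9, 3))
for ax in (ax_a, ax_b, ax_c):
    ax.plot([0, 1, 2], [0, 1, 0])

# Step 1: only `xy`. The text is drawn at the point being annotated.
step1 = ax_a.annotate("peak", xy=(1, 1))

# Step 2: add `xytext`. The text moves; `xy` still marks the point.
step2 = ax_b.annotate("peak", xy=(1, 1), xytext=(1.5, 0.5))

# Step 3: add `arrowprops`. An arrow now connects the text to the point.
step3 = ax_c.annotate("peak", xy=(1, 1), xytext=(1.5, 0.5),
                      arrowprops=dict(arrowstyle="->"))
```

Each step changes exactly one argument, so the reader can attribute the visual difference between neighboring panels to that argument alone.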

jklymak commented 1 year ago

I largely agree with the above from a tactical point of view.

Fairly consistent feedback we get about the docs

How are we gathering this feedback?

while folks who do not know how to use the library don't feel like they can learn how to from the docs.

I strongly feel the biggest problem with the docs was the lack of an actual Users Guide. The collection of examples, tutorials, and the few things that are in the current "explain" part of the "user guide" do nothing to guide new users (e.g., https://matplotlib.org/3.7.2/users/explain/index.html is a mystifying hodgepodge). The best we had were the tutorials, going from Basic to Intermediate to Advanced, but even those were lacking context and a narrative arc. There was the Quick Start Guide, but that was all we had to orient people.

The attempt with the reorganization PRs has been to reorganize much of the material in the tutorials into a narrative arc that we can actually call a Users Guide (compare https://matplotlib.org/devdocs/users/explain/index.html to the above). The goal of these has been to start from major topics and focus on details either deeper in the intros or in subsidiary pages. I think the things 95% of the people do 95% of the time should be covered relatively clearly; I think some of the more advanced things can be covered in subsidiary pages or in the API docs. There should be an attempt for this Guide to be as comprehensive as possible, but it need not cover every option of every feature in the library.

I look forward to seeing how the new User Guide evolves over the next few months and years. I think it will be more helpful to our users than what we have now, and it can continue to evolve as we get feedback.

story645 commented 1 year ago

How are we gathering this feedback

The most recent formal method was the user survey we did for the last GSoD, which @paniterka took another look at for this year's GSoD. I would like it if we could run a new one after the user guide settles to see if changes were successful.

I strongly feel the biggest problem with the docs was the lack of an actual Users Guide

I don't disagree exactly - one of my motivations for the plot types gallery and the cheat sheets was for more entry level heavily curated overview type content. I just would have preferred tackling the user guide after we get some consensus on what belongs where and agreeing on content guidelines.

even those were lacking context and a narrative arc

So I think fundamentally my issue is that I'm not sure we get context and narrative arc just because we have a cleaner outline via TOC. Like if it was just that, the ToC could have been a series of links out to the respective section of the gallery. I worry that going between parts of the guide that are inconsistent in how they present information could disorient the reader once they get to a point where they don't understand the information - I think we want confused readers to suggest ways we can improve our docs, not think they're incapable of using the library. (ETA: basically I worry they'll blame themselves for issues arising due to the docs being incomplete or unclear b/c it's harder to see those issues in the docs when they're a mix of approaches/not cohesive.)

and it can continue to evolve as we get feedback.

My worry is that w/o agreed upon content guidelines, this evolution will land us right back in this disorganization.

jklymak commented 1 year ago

My worry is that w/o agreed upon content guidelines, this evolution will land us right back in this disorganization.

Sure, maybe. On the other hand, we have been talking about doc overhauls for years. I think it's better to start with something imperfect rather than suffer analysis paralysis.

We have made concrete steps to move the docs forward and improve the organization. A couple years ago we could not move pages because links would break, and we had half the docs in tutorials and the other half in rst, disjoint sections of the docs with no way to merge the two.

Now, however, we have made sphinx-redirect (#19456), and we made the changes to sphinx gallery (https://github.com/sphinx-gallery/sphinx-gallery/pull/1071) to remove those restrictions. Now we have the tools to make a re-org possible, and I have made a first cut at a re-organization using those tools.

So I think fundamentally my issue is that I'm not sure we get context and narrative arc just because we have a cleaner outline via TOC.

No, we definitely don't, but it's a first step. This re-org has also added concrete new introductory material to fill in gaps and they attempt to add narrative arc:

To me, the concrete steps forward are to continue improving these pages, making the old pages that were moved more consistent, and adding new pages into the structure.

I don't think the steps taken so far are inconsistent with further writing "agreed upon content guidelines", having a doc summit, hiring an information architect, or any other aspirational way forward. I don't think waiting for those things to happen should block further improvements.

story645 commented 1 year ago

To me, the concrete steps forward are to continue improving these pages, making the old pages that were moved more consistent, and adding new pages into the structure.

How do we make things more consistent without agreeing on what consistent means? There are like 4 different approaches going on in those docs.

"agreed upon content guidelines", having a doc summit, hiring an information architect, or any other aspirational way forward

These are lumping multiple very disparate things together. GSoD was a first pass at information architecture and I've been very clear about that and it wraps up in November. I'm not particularly interested in a doc summit either b/c the agreed upon next step at the last one was that our issue wasn't lack of content, it was organization and we should hire an architect and that went nowhere.

I think we absolutely need content guidelines. In https://github.com/matplotlib/matplotlib/issues/26196#issuecomment-1609934010 you also said that one of the major issues with the doc review process is that everyone has their own idea of what the docs should read like. Content guidelines provide guidance on what's fair game for a review and which reviews need to be addressed, and they even the playing field for newer contributors by letting them know what we want rather than them having to guess the style. I can write the user guide content guidelines as a policy PR that I put into our "writing docs" section and I'm very willing to see that through, but we need consensus that we'd actually follow it and that it's remotely enforceable.

Also frankly, the current system incentivizes me at least to just skip the review and go straight to putting in the competing rewrite PR and like I hate how not collaborative that is.

I think it's better to start with something imperfect rather than suffer analysis paralysis.

ETA: I'm not against starting w/ something imperfect, I'm against starting something imperfect w/o any real agreement on how it takes shape. This could have been discussed in a meeting, gotten consensus, and then implemented.