swcarpentry / python-novice-gapminder

Plotting and Programming in Python
http://swcarpentry.github.io/python-novice-gapminder/

Proposal: Porting R gapminder plotting episode to python with plotnine #405

Open djinnome opened 5 years ago

djinnome commented 5 years ago

Hi folks,

As a software carpentry instructor, I have always been frustrated and disappointed with the current python gapminder plotting episode, especially when compared with the corresponding R gapminder plotting episode. Because Matplotlib is so low-level and clunky, any plotting episode for novices that uses it will always end up with something unsophisticated and ugly.

Nowadays, there are so many high-level plotting libraries for python: bokeh, seaborn, altair, plotly, plotnine, etc. All these libraries use a grammar of graphics to make it easy to quickly obtain sophisticated, publication-quality charts. Teaching with any one of them would be a better learning experience than with matplotlib.

At the same time, there is a desire within the carpentries to align python and R lessons as much as possible. A solution that addresses both criteria is plotnine, which is an actively supported feature-for-feature port of R's ggplot2.

For this reason, the data ecology lesson now uses plotnine to teach students how to plot in python. Motivated by these arguments, I went ahead and ported the R gapminder plotting episode to python for the last software carpentry workshop I taught.

I realize that the current matplotlib plotting episode is fairly mature at this point, and there are some reasonable arguments for keeping it, but if you had been at the workshop, you would have seen how much fun we had learning plotnine. It generated by far the most positive feedback for any workshop I have taught, and I just want other students and instructors to experience the joy of a plotting system that actually isn't frustrating and disappointing.

Therefore, I would like to propose replacing the current matplotlib plotting episode with the plotnine plotting episode.

Sincerely,

Jeremy

davidrpugh commented 5 years ago

I agree that we should consider moving away from Matplotlib for the plotting episode but I am not sure about plotnine. Is there a strong desire within the community to align the R and Python lessons as much as possible? The syntax used by plotnine feels like I am writing R code in Python which I find a bit jarring.

davidrpugh commented 5 years ago

Over the summer I have been tasked with developing a Python-based Introduction to Visualization for Data Scientists. Haven't started the work yet but have been going through

https://pyviz.org/index.html

looking for organizing principles and motivation.

djinnome commented 5 years ago

Hi @davidrpugh

I am glad that you share my desire to move away from the current Matplotlib episode and I cannot deny that I too initially found the syntax of plotnine a bit confusing. But my discomfort did not stem from its similarity to R (plotnine's syntax is actually quite pythonic), it stemmed from my struggle to grok a grammar of graphics. Once I was able to absorb its organizing principles and motivation, other Grammar of Graphics-based tools such as plotly, bokeh, seaborn, altair, became much easier for me to pick up.

"Is there a strong desire within the community to align the R and Python lessons as much as possible?"

Well, to be fair, it has been two and a half years since I ported the data ecology plotting episode from R to python, and the carpentries community has grown a lot. From what I recall of our discussions, the prevailing argument was that aligning the R and python lessons would reduce cognitive load for multi-language learners and would in general improve long-term lesson maintainability. @gvwilson @rgaiacs @maxim-belkin @ntmoore what are your current thoughts on the desirability of porting the gapminder plotting episode from R to python?

Sincerely,

Jeremy

maxim-belkin commented 5 years ago

Unless there is a study on that subject, any answer to this question must begin with "I think". I think it really depends on the format of the lessons we're targeting. If we're targeting self-paced learners then (I think) yes, identical or similar content would make sense -- it will be easier to understand what is going on. If we're targeting workshop-style content delivery, then I think it will be a show-stopper (boring, etc). And, lastly, I don't think the "cognitive load" argument is applicable here because we're talking about different lessons taught on different days. It's not like one has to understand R in order to learn Python (or vice versa).

vahtras commented 5 years ago

Thanks for the initiative. I think :-) that matplotlib is the de facto standard for plotting in Python and, knowing neither R nor plotnine, I do not see any compelling reason for shifting to that framework. Streamlining two lessons on a similar topic will be a maintenance burden unless we use tools for doing that, such as having both lessons in common version control. I do not know of any desire for this direction, nor do I think it is where we should be going.

davidrpugh commented 5 years ago

I am personally moving away from using Matplotlib directly and towards Python plotting tools that support interactive and static visualization. I am also looking for a high level organizing framework for Python visualization that I can use in my teaching at KAUST. Best I have come across is...

https://pyviz.org/index.html

The best summary of the state of the Python visualization ecosystem, as well as near-term trends, is the following three blog posts.

https://www.anaconda.com/python-data-visualization-2018-why-so-many-libraries/

https://www.anaconda.com/python-data-visualization-2018-moving-toward-convergence/

https://www.anaconda.com/python-data-visualization-2018-where-do-we-go-from-here/

djinnome commented 5 years ago

Hi folks,

I agree with @maxim-belkin that the desirability of aligning the gapminder lessons between R and python does depend on the audience we are targeting. In my experience, the target audience can range from complete programming novices to experienced programmers who are coming from another language such as R, Fortran, or Matlab. Showing Matlab programmers the power of DataFrames is usually enough to persuade them. Jupyter is sufficient to persuade Fortran programmers. But R programmers who are used to ggplot2 will rightly be disappointed and frustrated by matplotlib. And for students who are just learning to program? A quick comparison of the sophistication of the plots possible in a single episode of matplotlib vs plotnine should tell you everything you need to know about the suitability of matplotlib for novices.

@vahtras the python gapminder lesson is intended to teach students the power of modern python. This is why we have updated the lesson to use dataframes (instead of numpy) and jupyter lab (instead of notebooks or terminal). I am requesting we do the same for plotting. I agree with @davidrpugh that any grammar of graphics-based plotting system would be superior to matplotlib. Although bokeh or seaborn or altair would all be reasonable choices, I suggest plotnine because both R learners and R instructors are already familiar with this system and it was easy to port the R gapminder plotting episode to python.

Sincerely,

Jeremy

alee commented 1 year ago

I think altair / vega-lite might be a better target at this point:

https://altair-viz.github.io/

https://vega.github.io/vega-lite/