swcarpentry / python-novice-inflammation

Programming with Python
http://swcarpentry.github.io/python-novice-inflammation/

Episode 03 - Visualizing Tabular Data : Suggestions #835

Open xintin opened 4 years ago

xintin commented 4 years ago

Hi, while going through Episode 3, Visualizing Tabular Data, I came across a few things that I feel can be improved or added.

  1. It is a widely accepted convention to alias matplotlib.pyplot as plt. Can we replace import matplotlib.pyplot with import matplotlib.pyplot as plt? Ref: http://google.github.io/styleguide/pyguide.html#22-imports

  2. In the introduction, we are saying that,

    First, we will import the pyplot module from matplotlib and use two of its functions to create and display a heat map of our data

    Can we explain what imshow() and show() do? imshow() is not used anywhere else in the lesson, which leaves me without a clear explanation of why I need imshow(data) along with show().

  3. matplotlib is used to plot bar charts, pie charts, histograms, scatter plots, etc. Shall we make the tutorial more varied with examples apart from line charts alone? We can also include this as an exercise.

  4. In this episode, we are only using matplotlib. Shall we rename the episode to "Visualizing Tabular Data using Matplotlib"? Packages like seaborn, plotly, and ggplot are gaining popularity too.

  5. How about adding a small snippet using set_title() to distinguish sub-plots?

Kindly share your thoughts on the above points.

Thank you.

ldko commented 4 years ago

Hi @xintin , Thank you for providing these carefully considered suggestions to improve the Visualizing Tabular Data episode! Here are my responses to your separate points:

  1. It is a widely accepted convention to alias matplotlib.pyplot as plt. Can we replace import matplotlib.pyplot with import matplotlib.pyplot as plt? Ref: http://google.github.io/styleguide/pyguide.html#22-imports

This is something that has come up multiple times. For the time being we are not making this change. Please see issue #830 for reasoning.
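For readers comparing the two import styles under discussion, here is a minimal sketch (not taken from the lesson) showing that both forms bind the same module object, so the choice affects only how calls are spelled:

```python
# Style currently used in the lesson: the full module path is written out.
import matplotlib.pyplot

# Style proposed in this issue: the same module under the short alias plt.
import matplotlib.pyplot as plt

# Both names refer to one module object, so plt.imshow and
# matplotlib.pyplot.imshow are the same function.
same_module = plt is matplotlib.pyplot
same_function = plt.imshow is matplotlib.pyplot.imshow
```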

  2. In the introduction, we are saying that, "First, we will import the pyplot module from matplotlib and use two of its functions to create and display a heat map of our data." Can we explain what imshow() and show() do? imshow() is not used anywhere else in the lesson, which leaves me without a clear explanation of why I need imshow(data) along with show().

Yes, I think it would be helpful to briefly indicate what imshow and show do in no more than 1-2 sentences. Perhaps it could fit in before the sentence that starts "Blue pixels in this heat map represent". If you are willing to add this, please open a PR to add this text.
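As a starting point for that text, here is a rough sketch of how the two calls divide the work. The `heat_map` helper, its `display` flag, and the small array are hypothetical stand-ins for illustration; the lesson itself loads the inflammation data from a CSV file.

```python
import numpy
import matplotlib.pyplot


def heat_map(data, display=True):
    """Draw `data` as a heat map and optionally display it (hypothetical helper)."""
    # imshow only creates the image: each array element becomes one
    # colored cell on the current axes. Nothing is displayed yet.
    image = matplotlib.pyplot.imshow(data)
    if display:
        # show asks matplotlib to display every figure created so far.
        matplotlib.pyplot.show()
    return image


# Made-up values standing in for the inflammation data (3 patients x 4 days).
data = numpy.array([[0.0, 1.0, 2.0, 1.0],
                    [0.0, 2.0, 4.0, 2.0],
                    [0.0, 1.0, 3.0, 0.0]])
```

Calling `heat_map(data)` would draw the array and then display it; passing `display=False` builds the image without displaying it, which makes the separation between the two steps visible.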

  3. matplotlib is used to plot bar charts, pie charts, histograms, scatter plots, etc. Shall we make the tutorial more varied with examples apart from line charts alone? We can also include this as an exercise.

I think the time it would take to add more types of visualizations to the main episode body is prohibitive. Learners would likely be interested in seeing more of these visualization types, though, so including some of them through exercises that use the inflammation data would be worthwhile. That would let instructors bring in more examples when they want to focus on visualization and skip them when there is greater need to move on to other concepts in the lesson. If you would like to submit such examples, please create one PR per exercise you would like to see included.
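To illustrate the kind of exercise this could be, here is one possible sketch that draws a bar chart of the average inflammation per day. The `average_bar_chart` helper and the small array are made up for this example and are not part of the lesson; a real exercise would use the lesson's inflammation data.

```python
import numpy
import matplotlib.pyplot


def average_bar_chart(data):
    """Draw one bar per day showing the average across patients (hypothetical helper)."""
    # axis=0 averages down the rows, giving one value per column (day).
    average = numpy.mean(data, axis=0)
    bars = matplotlib.pyplot.bar(range(len(average)), average)
    matplotlib.pyplot.xlabel('day')
    matplotlib.pyplot.ylabel('average inflammation')
    return bars


# Made-up values standing in for the inflammation data (3 patients x 4 days).
data = numpy.array([[0.0, 1.0, 2.0, 1.0],
                    [0.0, 2.0, 4.0, 2.0],
                    [0.0, 1.0, 3.0, 0.0]])
```

After `average_bar_chart(data)`, a call to `matplotlib.pyplot.show()` would display the chart, in the same way as the other plots in the episode.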

  4. In this episode, we are only using matplotlib. Shall we rename the episode to "Visualizing Tabular Data using Matplotlib"? Packages like seaborn, plotly, and ggplot are gaining popularity too.

We do not tend to be that specific in the episode titles. I think the idea here may be to focus on the objective of visualizing data, not the specific libraries we will use to get the job done. We do mention matplotlib in Key Points and as the file name for the episode though. I am interested to hear others' opinions on specifying "Matplotlib" in the episode title.

  5. How about adding a small snippet using set_title() to distinguish sub-plots?

I think people would find this useful. It could be added to the existing code under the Grouping plots heading to add titles to each of the three plots. If you would like, please open a PR specifically for adding titles to the plots.
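A sketch of what that could look like, assuming the three side-by-side plots of average, maximum, and minimum from that section. The `grouped_plots` helper, the title strings, and the small array are illustrative choices, not the lesson's actual code.

```python
import numpy
import matplotlib.pyplot


def grouped_plots(data):
    """Plot per-day average, maximum, and minimum side by side, each with a title."""
    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
    summaries = [('Average per day', numpy.mean),
                 ('Maximum per day', numpy.max),
                 ('Minimum per day', numpy.min)]
    for position, (title, summarize) in enumerate(summaries, start=1):
        axes = fig.add_subplot(1, 3, position)
        # set_title labels this subplot only, so the panels can be told apart.
        axes.set_title(title)
        axes.set_ylabel('inflammation')
        axes.plot(summarize(data, axis=0))
    fig.tight_layout()
    return fig


# Made-up values standing in for the inflammation data (3 patients x 4 days).
data = numpy.array([[0.0, 1.0, 2.0, 1.0],
                    [0.0, 2.0, 4.0, 2.0],
                    [0.0, 1.0, 3.0, 0.0]])
```

Calling `grouped_plots(data)` followed by `matplotlib.pyplot.show()` would display the three titled panels in one figure.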

xintin commented 4 years ago

Hi @ldko, Thanks for your thoughts. I agree with the reasoning for point 1 above. For the rest, I will try to submit PRs for the feasible ones over the weekend. In the meantime, if someone else wants to contribute, please feel free to do so.