moderndive / ModernDive_book

Statistical Inference via Data Science: A ModernDive into R and the Tidyverse
https://www.moderndive.com/

Consider elevating the statistical background appendix to a short chapter #80

Closed gdbassett closed 5 years ago

gdbassett commented 5 years ago

Filed per https://twitter.com/ModernDive/status/1073340286091386881. May I suggest elevating the statistical background section to a short chapter?

Currently, there are no tidy, Bayesian, R-based introduction-to-statistics books that I am aware of. As such, initial statistics must be taught from a more traditional book (e.g. https://www.amazon.com/Introductory-Statistics-R-Computing/dp/0387790535/). It is unlikely an instructor would start with one such book for the basic first section and then transition to a more modern approach for later topics (confidence, visualization, modeling, hypothesis testing, etc.).

ModernDive could help solve this by adding an introductory chapter that expands the Statistical Background appendix. It can probably be shorter than the equivalent chapters in other stats books since, as was said in the tweet, the concepts will hopefully be interspersed within other sections to allow learning by doing. (I would recommend reviewing the other sections to ensure they do cover using the mean, median, mode, quantiles, SD, variance, and several common distributions such as normal/Gaussian, Bernoulli, beta, binomial, uniform, geometric, Poisson, gamma, log-normal, exponential, and general power-law distributions.)

I would suggest the goal is not to teach students when and how to use these concepts (as hopefully the rest of the book takes care of that), but to provide context so that when they see them in use they understand how they fit into statistics as a whole. (For example, https://blog.cloudera.com/blog/2015/12/common-probability-distributions-the-data-scientists-crib-sheet/ gives an interesting quick explanation of basic distributions and their relationships.)

To that end, it may even be beneficial to mention common traditional statistics (p-value, t-test, etc.) in this section and then point to the Appendix where they are explained, not necessarily to give students an alternative to the primary approaches taught, but simply so they understand where these things, which they will hear about often, sit in the context of what they have learned.

And thank you for what is ultimately the go-to reference for a tidy approach to statistics. I think it's sorely needed and an excellent book with or without modification. I look forward to buying a hard copy as soon as they come off the presses!

rudeboybert commented 5 years ago

Hi @gdbassett, thanks for the note and the kind words!

I will say, however, that it is highly unlikely we'll be incorporating anything relating to probability distributions other than the normal, as it was an editorial decision on our part when we first started this project to leave out as much probability theory as we could. We're not saying these aren't important tools in a data scientist's toolbox, but rather we're being extra judicious about limiting topic bloat and keeping the scope of the book tight. We truly believe that if we slowly try to include everything, soon we'd end up with nothing. However, your suggestion to make lessons on summary stats (mean/median/mode, SD, variance) and statistical inference definitions more evident, perhaps in the Appendix we've been discussing, is definitely something worth considering.

@ismayc and I will take this all into consideration while we update the book for the next version bump, probably sometime in mid-to-late January. Thanks again!

rudeboybert commented 5 years ago

Hi @gdbassett, here's an update. We're very close to putting out the version of moderndive that will correspond to our print edition with CRC Press. We incorporated some of your suggestions:

While the appendix can definitely still be improved on, as a first pass we think it suffices. Many thanks for your feedback!

gdbassett commented 5 years ago

Thank you for the consideration. I'm looking forward to buying a hard copy when it comes out!