Open waciumawanjohi opened 1 year ago
A big thank you to @reallyyy for reporting that the Descriptive Statistics course was no longer available, prompting this investigation.
For individuals that would like to get a head start on identifying a suitable Introduction to Statistics course, below is a list of resources that you may start with. Remember:
- MIT OCW Statistics For Applications
- OpenStax Introductory Statistics Textbook
- Saylor.org Introduction to Statistics
- Stanford/Coursera Introduction to Statistics
- MIT/edX Fundamentals of Statistics
- Carnegie Mellon/OLI Probability & Statistics
- MIT OCW Introduction To Probability And Statistics
- Numerous YouTube playlists
For individuals that would like to get a head start on identifying a suitable Theory of Statistics course, below is a list of resources that you may start with. The notes about analysis in the comment above apply here as well.
- University of Arizona Theory of Statistics: Includes lectures and assignments, no solutions.
- Stanford Stat 300A – Theory of Statistics: Includes handouts, HW with solutions, exams with solutions, no lectures or lecture notes.
- Berkeley Statistics 210A: Theoretical Statistics (Fall 2021): Lecture notes, HW without solutions. There is a Fall 2023 version underway.
- University of Minnesota Statistics 5101 Theory of Statistics I: Course for students pursuing a BS (4101 is Theory of Stats I for students pursuing a BA). Course slides, HW and exams without solutions, links to past course pages.
- MIT 9.520/6.860: Statistical Learning Theory and Applications: YouTube lectures. No HW or exams. Course page
The assertions put forward conflate descriptive statistics with "basic statistics" and conclude that OSSU shouldn't spend time on it because it's prerequisite material.
In short, no.
In long, noooooooooooooooooo.
Mean, median, and mode, stem & leaf, and scatterplots together represent the entirety of statistics encountered in high school. But this is Day 1 material in a university-level descriptive statistics course (though this is also encountered in probability, and therefore these courses are typically taught jointly as an introductory probability-and-statistics course).
After they spend roughly 60% of their time just cleaning their data, practicing data scientists spend roughly the next 20% of their time doing exploratory data analysis -- which leans heavily on descriptive statistics to characterize a dataset's distribution. The importance of mean, median and mode cannot be overstated -- but other values like variance, IQR, mean absolute deviation, central moments, kurtosis, scedasticity, Kolmogorov–Smirnov test scores, etc. identify key descriptive signatures of a distribution.
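To make a few of those quantities concrete, here is a minimal sketch using only Python's standard `statistics` module. The `describe` helper and the sample data are hypothetical, made up for illustration. It assumes population (not sample) formulas and the "inclusive" quartile method, and it leaves out scedasticity and the Kolmogorov–Smirnov test entirely.

```python
import statistics


def describe(values):
    """Return a few descriptive statistics for a list of numbers.

    Hypothetical helper for illustration; uses population formulas.
    """
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # Population variance, i.e. the second central moment.
    variance = statistics.pvariance(values)
    # Quartiles via the "inclusive" method; IQR is Q3 minus Q1.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Mean absolute deviation around the mean.
    mean_abs_dev = sum(abs(x - mean) for x in values) / n
    # Fourth central moment over variance squared (non-excess kurtosis).
    fourth_moment = sum((x - mean) ** 4 for x in values) / n
    kurtosis = fourth_moment / variance ** 2
    return {
        "mean": mean,
        "median": median,
        "variance": variance,
        "iqr": q3 - q1,
        "mean_abs_dev": mean_abs_dev,
        "kurtosis": kurtosis,
    }


# Made-up sample data, not taken from any course.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
stats = describe(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

None of this goes beyond arithmetic on the data, which is the point: these summaries are cheap to compute, but reading them correctly is what a descriptive statistics course teaches.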
No, we need a descriptive statistics course.
The OSSU data science curriculum goes up through multivariate calculus. I propose as a benchmark course Georgia Tech's ISYE 6739 (co-listed as ISYE 4739 for undergraduates). This combination probability/statistics course builds on a multivariate calculus foundation at a level appropriate for motivated undergraduates without prior exposure to probability or statistics. This is a rigorous yet effective combined probability/statistics course that does a good job of covering the basics to a point sufficient for further study, even graduate study. Prof. Goldsman really hits the Goldilocks Zone here -- none too esoteric, none too powderpuff. This course includes everything you need to set up further study in data analytics or operations research.
The importance of mean, median and mode cannot be overstated -- but other values like variance, IQR, mean absolute deviation, central moments, kurtosis, scedasticity, Kolmogorov–Smirnov test scores, etc. identify key descriptive signatures of a distribution.
To be clear, the descriptive stats course did not cover the advanced topics you list. But you are correct that I conflated all descriptive stats with basic stats.
Assertion: OSSU Data Science curriculum should not recommend a basic stats course. This is prerequisite material; OSSU's focus is requisite material for undergraduate learners.
Candidate courses for an intro to stats RFC are now:
Edited for clarity
Hello Everyone,
I'd like to recommend two courses for candidates in our Data Science Statistics program:
- Statistical Learning with Python by Stanford University on EdX
- Statistical Learning by Stanford University on EdX
Both courses are based on the same content, differing only in the programming language used. They're aligned with a free book available at www.statlearning.com.
These courses offer an extensive introduction to statistical learning methods, crucial for anyone pursuing a career in data science. The authors are renowned figures in the data science community, and this book is frequently recommended on various Data Science, Machine Learning, and AI subreddits.
Why These Courses Are Beneficial:
Relevance to Data Science: These courses emphasize statistical learning, an essential skill for data analysis and interpretation. They serve as an excellent bridge from basic programming and statistics to advanced model building.
Curriculum Integration: They address gaps in the current curriculum with a focused approach to statistical learning techniques.
Expert Instruction: Taught by leading experts, these courses are acclaimed for their clarity and depth. Larry Wasserman, a respected Professor in Statistics and Machine Learning, endorses the course book.
Accessibility: Both courses are available for free on the EdX platform, and the book can be downloaded from the course website. Python labs can be found at this GitHub repository, and the course website provides direct files for both R and Python.
Framework Flexibility: The program offers a choice of frameworks including PyTorch, TensorFlow, Keras, etc.
Practical Application: The courses include hands-on exercises and real-world examples, ensuring practical understanding and application.
These courses are an invaluable resource for anyone aspiring to deeply understand and apply data science principles.
@Smcgb The course describes itself as an "introductory-level course in supervised learning", so would follow an introduction to statistics.
Can you open a separate RFC to recommend the addition of this course to the curriculum? We'll leave the RFC open for 1 month for others to comment. The change looks like a positive one to me. After the one-month comment period we can include the course in the curriculum.
One optional edit that you can make to the RFC is to link to some of the recommendations for the book that you mention.
Thanks for looking for ways to improve the curriculum!
Summary
OSSU should undertake a search for a number of new courses in statistics.
Background
OSSU currently recommends 2 courses on statistics:
The first of these is no longer offered.
Guidelines
OSSU Data Science uses the report Curriculum Guidelines for Undergraduate Programs in Data Science as our guide for course recommendation.
Section 6 "Transitioning To A Data Science Major Using Typical Existing Courses" states:
Subsection 6.3 "Courses in Statistics" states:
The GAISE College Report includes goals, recommendations, and suggestions for topics that might be omitted.
Goals (summarized)
Recommendations
These are largely recommendations for how statistics courses should be taught.
Suggestions for Topics that Might be Omitted from Introductory Statistics Courses
Of note, the basic statistics section reads:
There will be other RFCs for carrying out the individual steps (e.g. there will be a separate RFC for Identify an Introduction to Statistics course).