ossu / data-science

:bar_chart: Path to a free self-taught education in Data Science!

RFC Add ISL as the cap of the statistics section of OSSU Data Science: #116

Closed Smcgb closed 4 months ago

Smcgb commented 6 months ago

Problem: The OSSU DS curriculum does not currently have a comprehensive course that carries the on-paper methods students have learned into programmatic assessment using well-known programming languages and frameworks.

Background:

The absence of focused statistical learning courses in the OSSU Data Science curriculum results in an incomplete educational journey for students. Later courses are either highly specialized (such as Andrew Ng's) or do not carry the statistical concepts already learned into programmatic assessment.

Adding such courses would make the curriculum more well-rounded and bridge critical gaps between basic and advanced concepts. Potential disadvantages include the need for curriculum restructuring, the additional time commitment for students, and some overlap with the highly specialized courses in the Data Mining section.

Proposal:

I'd like to recommend two courses for the Statistics portion of the Data Science curriculum. They could either close out the Statistics section or go in a new section called Advanced Statistics, with the current set renamed to Core Statistics:

Statistical Learning with Python by Stanford University on EdX
Statistical Learning by Stanford University on EdX

Both courses are based on the same content, differing only in the programming language used. They're aligned with a free book available at www.statlearning.com.

These courses offer an extensive introduction to statistical learning methods, crucial for anyone pursuing a career in data science. The authors are renowned figures in the data science community, and this book is frequently recommended on various Data Science, Machine Learning, and AI subreddits.

Why These Courses Are Beneficial:

Relevance to Data Science: These courses emphasize statistical learning, an essential skill for data analysis and interpretation. They serve as an excellent bridge from basic programming and statistics to advanced model building.

Curriculum Integration: They address gaps in the current curriculum with a focused approach to statistical learning techniques. They also bridge the gap from Statistics to Data Mining by building up the fundamental approaches programmatically before Andrew Ng's courses give them specialized attention.

Expert Instruction: Taught by leading experts, these courses are acclaimed for their clarity and depth. Larry Wasserman, a respected Professor in Statistics and Machine Learning, endorses the course book.

Accessibility: Both courses are available for free on the EdX platform, and the book can be downloaded from the course website. Python labs can be found at this GitHub repository, and the course website provides direct files for both R and Python.

Framework Flexibility: The program offers a choice of frameworks including PyTorch, TensorFlow, Keras, etc.

Practical Application: The courses include hands-on exercises and real-world examples, ensuring practical understanding and application.

These courses are an invaluable resource for anyone aspiring to deeply understand and apply data science principles.

Alternatives: This set of courses could be an alternative to the Supervised Learning courses by Andrew Ng as well.

Pull Request: https://github.com/ossu/data-science/pull/115/commits

waciumawanjohi commented 4 months ago

Seeing no objections, and with some support, marking this RFC as accepted.