aayushsinha0706 commented 1 year ago

Problem: OSSU lacks other ML classes

Duration: 3 Feb 2023

Background: The earlier machine learning class by Prof. Andrew Ng and Stanford, which used Octave on Coursera, was recreated by DeepLearning.AI. The class now uses Python, which is great. The only problem is that it is now a Coursera specialisation, and the old link points to just one course of the whole specialisation.

Proposal: Just add the other two courses to the curriculum:

Advanced Learning Algorithms

Unsupervised Learning, Recommenders, Reinforcement Learning

waciumawanjohi commented 1 year ago

We have other courses that are part of specializations where we don't recommend the entire specialization. For example:

How to Code is part of UBCx's Software Development MicroMasters.
We recommend 3 of 4 courses in U Alberta's Software Design and Architecture Specialization.
We recommend 3 of 5 courses in Stanford's Database series.

We should include these courses if they cover material necessary in our curricular guidelines.

If they do not, it would be very appropriate to design an advanced track that included them. (We've talked about something similar in this RFC)

aayushsinha0706 commented 1 year ago

With reference to CS2013, Intelligent Systems, pages 123 and 124:

IS/Basic Machine Learning [2 Core-Tier2 hours]

Topics:
• Definition and examples of broad variety of machine learning tasks, including classification
• Inductive learning
• Simple statistical-based learning, such as Naive Bayesian Classifier, decision trees
• The over-fitting problem
• Measuring classifier accuracy

Learning Outcomes:

List the differences among the three main styles of learning: supervised, reinforcement, and unsupervised. [Familiarity]
Identify examples of classification tasks, including the available input features and output to be predicted. [Familiarity]
Explain the difference between inductive and deductive learning. [Familiarity]
Describe over-fitting in the context of a problem. [Familiarity]
Apply the simple statistical learning algorithm such as Naive Bayesian Classifier to a classification task and measure the classifier's accuracy. [Usage]

In the second course of the Machine Learning Specialization, we will learn to:
• Build and train a neural network with TensorFlow to perform multi-class classification
• Apply best practices for machine learning development so that your models generalize to data and tasks in the real world
• Build and use decision trees and tree ensemble methods, including random forests and boosted trees

In the third course of the Machine Learning Specialization, we will learn to:
• Use unsupervised learning techniques, including clustering and anomaly detection.
• Build recommender systems with a collaborative filtering approach and a content-based deep learning method.
• Build a deep reinforcement learning model.

The current offering only covers supervised learning algorithms and not others such as decision trees (taught in the second course) or the algorithms under unsupervised learning and reinforcement learning (taught in the third).

> We have other courses that are part of specializations where we don't recommend the entire specialization. For example: How to Code is part of UBCx's Software Development MicroMasters. We recommend 3 of 4 courses in U Alberta's Software Design and Architecture Specialization. We recommend 3 of 5 courses in Stanford's Database series.

Also, the reason I am recommending all three courses from the specialisation is that the earlier version included all the material in a single 10-week class, but it is now broken down into three courses of 3, 4, and 3 weeks respectively.

Also, the material in the two courses above is necessary if we ever include the Advanced AI RFC in our curriculum.

waciumawanjohi commented 1 year ago

> the reason I am recommending all three courses from the specialisation is that the earlier version included all the material in a single 10-week class, but it is now broken down into three courses of 3, 4, and 3 weeks respectively.

A fair point.

It's not clear that the guidelines suggest including the two later courses. Of the 5 learning outcomes, only 1 is about being able to implement. Four are about being familiar with some of the big ideas in the field.

So presumably we could meet students' needs and reduce the length of the curriculum by identifying a resource that is shorter and less involved and pairing it with the current recommendation.

To be clear, my concern is driven by one of the most consistent critiques of OSSU: that the recommended path is too long. The counterbalance to that critique in this case is that Ng's Machine Learning course is one of the classic MOOCs, with high ratings going back a decade.

Should we recommend a deeper and longer than necessary dive with a set of better-than-most courses? Or identify a resource that gives an appropriate overview and just the recommended deep dive into one aspect of ML?

Alaharon123 commented 1 year ago

just wanna link the alpha version of CS2023 on this topic. As one would expect in a world where machine learning is more important than it was a decade ago, it has more core hours than CS2013 does, and different focus and such https://csed.acm.org/wp-content/uploads/2022/07/AI_Version_Alpha.pdf

waciumawanjohi commented 1 year ago

Good context. Pulling out the most relevant sections:

AI/Basic Machine Learning

Topics

CS Core

Definition and examples of a broad variety of machine learning tasks
- Supervised learning
  - Classification
  - Regression
- Reinforcement learning
- Unsupervised learning
  - Clustering
Simple statistical-based supervised learning such as Naive Bayes, Decision trees
The overfitting problem and controlling solution complexity (regularization, pruning)
- The bias (underfitting) - variance (overfitting) tradeoff
Working with Data
- Data preprocessing
  - Importance and pitfalls of
- Handling missing values (imputing, flag-as-missing)
  - Implications of imputing vs flag-as-missing
- Encoding categorical variables, encoding real-valued data
- Normalization/standardization
- Emphasis on real data, not textbook examples
Representations
- Simple basis feature expansion, such as squaring univariate features
- Learned feature representations
Machine learning evaluation
- Measuring classifier accuracy
- Separation of train, validation, and test sets
- Estimation of test performance, using held-out data
- Tuning the parameters of a machine learning model on held-out validation data
- Importance of understanding what your model is actually doing, where its pitfalls/shortcomings are, and the implications of its decisions
Basic neural networks
- Fundamentals of understanding how neural networks work and their training process, without details of the calculations

KA Core

Formulation of simple machine learning as an optimization problem, such as least squares linear regression or logistic regression
- Objective function
- Gradient descent
- Regularization to avoid overfitting
Ensembles of models
- Simple weighted majority combination
Deep learning
- Deep feed-forward networks (intuition only, no math)
- Convolutional neural networks (intuition only, no math)
- Visualization of learned feature representations from deep nets
Performance evaluation
- Other metrics (e.g., error, precision, recall)
- Confusion matrix
- Cross-validation
  - Parameter tuning (grid/random search, via cross-validation)
Overview of reinforcement learning
Two or more applications of machine learning algorithms
- E.g., medicine and health, economics, vision, natural language, robotics, game play
Ethics for Machine Learning

Learning Outcomes
1. Describe the differences among the three main styles of learning: supervised, reinforcement, and unsupervised.
2. Differentiate the terms of AI, machine learning, and deep learning.
3. Frame an application as a classification problem, including the available input features and output to be predicted (e.g., identifying alphabetic characters from pixel grid input).
4. Apply two or more simple statistical learning algorithms (such as k-nearest-neighbors and logistic regression) to a classification task and measure the classifiers’ accuracy.
5. Identify over-fitting in the context of a problem and learning curves and describe solutions to overfitting.
6. Explain how machine learning works as an optimization/search process.
7. Describe the neural network training process and resulting learned representations.
8. Explain proper ML evaluation procedures, including the differences between training and testing performance, and what can go wrong with the evaluation process leading to inaccurate reporting of ML performance.
9. Implement and compare two machine learning algorithms on a dataset, preprocessing it from scratch.

Context

Elsewhere it is explained that there will be two types of core concepts:

CS Core: concepts that every Computer Science graduate must know.
KA Core: concepts that any coverage of this KA must include.

waciumawanjohi commented 1 year ago

I still think we are presented with the question: should we recommend a course that dives deeper than necessary because the course is very well reviewed?

An example of that 'diving deeper than necessary': the standards say students should know "Fundamentals of understanding how neural networks work and their training process, without details of the calculations". But in Advanced Learning Algorithms students "dive deeper by learning how to code up your own neural network in Python, 'from scratch'."

aayushsinha0706 commented 1 year ago

As stated in the previous RFC #1111, basing 100% of the curriculum on CS2013 is not possible unless we start creating our own material.

But we cannot simply cut material either, as students would often come out with less maturity in a particular subject. We can avoid that shortcoming by sometimes offering a course that dives deeper than necessary.

(Note: I know it's an open source computer science program and not an actual degree, and many people, like me, use it as a secondary or supplemental source of learning. But there are also people who use it as their primary source of learning to get into industry and avoid paying hefty fees to universities or bootcamps.)

There is also the particular case of mathematics in AI/ML classes. OSSU uses mathematics very lightly compared to universities. The hardest class is Math for CS (Discrete Math), which is in core CS. Actual CS programs have even harder math classes, like differential equations and numerical analysis, not to mention statistics.

There are also courses like Intro to AI by Berkeley that I think cover the material better, with a more appropriate ratio of breadth to depth, than Andrew Ng's course. I didn't recommend it because of its mathematics requirements. I would also like to quote spamegg1 from the linear algebra Discord channel:

"Self-learning has some very hard limitations, we often don't think about this. Just because some university dumped a course (which is normally taught in school) and its content online doesn't mean it's "doable" all on your own."

And I think that applies here: many ML/AI classes available online use mathematics heavily, but Andrew Ng's course is the only ML course I can think of where the instructor says you don't need heavy knowledge of mathematics. The course bridges the gap between limited math maturity and machine learning knowledge.

waciumawanjohi commented 1 year ago

@aayushsinha0706, would a fair summary of your previous post be:

If we don't add the two additional Ng courses, we will need to identify a course to offer. When searching for such a course, we should take special note of the prerequisites. Finding a course that does not rely on courses beyond Core Math would be necessary and difficult.

If so, fair points!

aayushsinha0706 commented 1 year ago

To that summary I would also like to add (this is purely my view):

Cutting down the material is not the solution.

Also, CS2013 has other AI topics, like IS/Basic Search Strategies and IS/Basic Knowledge Representation and Reasoning, that are not covered by any of our included electives. We don't have these courses because of the math requirement (see CS2013, page 121).

Hence, if we cut material in one place, we need to provide it somewhere else as well. Just cutting material across the board will only hamper the subject maturity of students who rely purely on OSSU for their CS education.

ghost commented 1 year ago

Perhaps some of these things can be considered electives, and the limitation should be a number of classes or "credit hours" or whatever. Then the student would have to choose which electives to add to keep it within a certain number of credit hours for their electives. Or maybe alternative paths could be given that still hit all of the necessary areas, but have certain areas highlighted more. These would have the area addressed more from the beginning to allow for this to be the case.

dvirberlo commented 1 year ago

I don't know if I have the broad perspective of the others discussing this issue, but I did finish the 3 courses. I used the app sometimes, and it was obvious to me that I should do all 3 parts of the specialization.

I did feel that I could have had a deeper understanding if I knew more about matrices, statistics, etc. But I did it while I was in the middle of Single Variable Calculus (OCW), and I think I understood the concepts well.

Obviously, if advanced ML (#1013) is added, I would not think of going down that path without going through the required math courses first.

Anyway, I think it is not good to leave the curriculum docs in this ambiguous state. With the justification that "this is how it was", I would suggest adding to the "Notes" section that the curriculum's intention is for students to take all 3 courses.

bradleygrant commented 1 year ago

Worth noting that, prior to the 2022 refresh of this class, it was a single course. Now, the course is broken into three "parts". These are not three discrete courses. They're chapters -- supervised methods, unsupervised methods, and [another one].

I suspect this decision was made to gamify the course and make people feel good about making it to the 1/3rd point of the course.

We have made the mistake of only allowing one of the "parts" into the curriculum. Just to maintain parity with where the curriculum was a year ago, all three parts should be there.

waciumawanjohi commented 1 year ago

We've gone over the comment period, and I would characterize most comments as supporting the RFC. Adopting the RFC and closing the issue. Thanks to all who have participated!

ossu / computer-science

RFC: Add rest of Andrew Ng new ML Class #1118
