Open coderschoolreview opened 6 years ago
A couple of extra comments for assignment 1:
k_range = range(1, 20, 1)
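A range like this is commonly used to sweep the number of neighbors for a K Nearest Neighbors classifier. A minimal sketch of such a sweep, assuming the iris dataset as stand-in data and 5-fold cross-validation (neither is taken from the submission):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): the assignment's own dataset is not shown here.
X, y = load_iris(return_X_y=True)

k_range = range(1, 20, 1)  # k = 1, 2, ..., 19
k_scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean 5-fold cross-validated accuracy for this value of k
    scores = cross_val_score(knn, X, y, cv=5, scoring="accuracy")
    k_scores.append(scores.mean())

# Pick the k with the highest mean accuracy (first one wins on ties)
best_k = k_range[k_scores.index(max(k_scores))]
print(f"best k = {best_k}, mean CV accuracy = {max(k_scores):.3f}")
```

Plotting k_scores against k_range is one way to see how sensitive the classifier is to the choice of k.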
Assignment 2
The goal of this assignment was to introduce you to three new classification techniques and to understand how to select the best parameters and features for them. You learned how to use scikit-learn utilities (GridSearchCV, SelectKBest, RFE, SelectFromModel) to try out new models (Support Vector Machines, Random Forests, and Logistic Regression), test different combinations of parameter values and features, and analyze your results to help build better machine learning models.
EXCELLENT job! You are clearly very comfortable with programming and Python.
Great work on the following:
moods as features in addition to just audio, and you saw that your scores really increased. Well done!
Pipeline object! Great work combining it to vary both SelectKBest and LogisticRegression values.
Some suggestions:
Try Pipeline on even more combinations, e.g. to vary RandomForestClassifier parameter values along with SelectFromModel values. Here are some examples:
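One such combination might look like the following sketch. The synthetic data, step names ("select", "clf"), and grid values are all illustrative assumptions, not taken from the submission:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption): replace with your own features and labels.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Step 1 selects features using forest importances; step 2 classifies.
pipe = Pipeline([
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))),
    ("clf", RandomForestClassifier(random_state=0)),
])

# "<step>__<param>" addresses a parameter of a named pipeline step.
param_grid = {
    "select__threshold": ["mean", "median"],  # importance cutoff for kept features
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [None, 5],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

Because the selector sits inside the pipeline, feature selection is refit on each training fold, which avoids leaking information from the validation fold into the selection step.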
ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
You can try fixing this by increasing the max_iter parameter of the estimator you pass to GridSearchCV (the default for LogisticRegression is 100).
GridSearchCV?
Overall, amazing work. Keep it up!!!
Assignment 3
The goal of this assignment was to introduce you to three new Natural Language Processing techniques, and to understand how to perform some basic sentiment analysis on song lyrics using these methods. You learned how to clean and prepare textual information for NLP, and then apply the following approaches: Bag Of Words, TF-IDF, and Doc2Vec. You used your prior knowledge of Python estimators, feature selection, and parameter optimization techniques to produce feature vectors from these NLP methods to make predictions on the moods of songs using their lyrics.
Amazing work. You do a great job on all the assignments and your submissions are always impressive.
What you did well:
Some notes:
You plotted accuracy for the unbalanced dataset as well, and you used accuracy_score to measure this. However, this is a bit misleading -- you generally don't want to use accuracy_score when your data is unbalanced. It's better to use an f1-score or the AUC. Check the following links for explanations on this:
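To see why accuracy misleads here, consider a small sketch with hypothetical labels (95 negatives, 5 positives) and a "classifier" that always predicts the majority class. The numbers are made up for illustration only:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical unbalanced labels: 95 negatives, 5 positives.
y_true = np.array([0] * 95 + [1] * 5)

# A degenerate model that always predicts the majority class.
y_pred = np.zeros(100, dtype=int)
y_score = np.zeros(100)  # constant score, so no ranking ability at all

acc = accuracy_score(y_true, y_pred)            # 0.95, looks great
f1 = f1_score(y_true, y_pred, zero_division=0)  # 0.0, no positives found
auc = roc_auc_score(y_true, y_score)            # 0.5, same as random ranking

print(f"accuracy={acc:.2f}  f1={f1:.2f}  auc={auc:.2f}")
```

The model never finds a single positive example, yet accuracy reports 95%. The F1 score and the AUC both expose the problem.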
Looks like you got a ConvergenceWarning for GridSearchCV on your LogisticRegression classifier; try bumping up max_iter even higher than 1000 and check if it works. However, note that in some cases, it is also possible that your data simply can't be fit by a logistic model.
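A minimal sketch of raising max_iter and checking whether the warning goes away. Note that max_iter is set on the LogisticRegression estimator rather than on GridSearchCV itself; the synthetic data, the value 5000, and the C grid are assumptions for illustration:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): replace with your lyric feature vectors.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# max_iter belongs to the estimator, which is then handed to GridSearchCV.
clf = LogisticRegression(max_iter=5000)
param_grid = {"C": [0.1, 1.0, 10.0]}

# Record warnings raised during the search so they can be counted afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    search = GridSearchCV(clf, param_grid, cv=3)
    search.fit(X, y)

n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print(search.best_params_, "convergence warnings:", n_convergence)
```

If warnings persist even with a large max_iter, standardizing the features (for example with StandardScaler in a Pipeline) often helps the solver converge.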
Overall, excellent work!!
The goal of this assignment was to introduce you to 2 main concepts in Machine Learning: Data Pre-processing, and Classification. You learned how to query and clean data using the pandas library in Python, and built a simple Machine Learning Classifier based on the K Nearest Neighbors algorithm.
Looks like you know Python quite well - your code could be cleaner, though. Your final submission has lots of commented-out code (and no actual comments), so it's a little difficult to follow.
Good job on completing the main requirements of the assignment. In future weeks, adding comments or documentation sections to your code will help the grader understand it better. But great job with the plot and overall results.