Open coderschoolreview opened 5 years ago
The goal of this assignment is for you to learn to build and evaluate a model using the scikit-learn library in Python. You learn how to pre-process data, split it into training and test sets, train your model on the training set, and evaluate its performance on the test set.
Things you did well:
Some minor tips:
Hi Hung Tuan,
Excellent work on your Assignment 3. You seem confident with performing tasks that include sentiment analysis.
Keep it up!
The goal of this assignment was to introduce you to the following concepts in Machine Learning:
In a notebook, install missingno with !pip install missingno (more on magic commands here)
missingno to visualize missing data
pandas.get_dummies
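As a quick illustration of pandas.get_dummies, here is a minimal sketch on a made-up table. The column names and values are hypothetical and are not taken from the assignment data.

```python
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "age": [22, 35, 58],
    "embarked": ["S", "C", "S"],
})

# One-hot encode the categorical column; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["embarked"])
print(encoded.columns.tolist())
```

Each distinct category becomes its own indicator column, which is the form most scikit-learn models expect.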
To find the best combination of parameters:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
# Use GridSearchCV with a random forest model and this parameter grid
# to find the best combination of parameters.
# Hint: example
# https://stackoverflow.com/questions/30102973/how-to-get-best-estimator-on-gridsearchcv-random-forest-classifier-scikit
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'bootstrap': [True],
    'max_depth': [80, 90, 100, 110],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4, 5],
    'min_samples_split': [8, 10, 12],
    'n_estimators': [100, 200, 300, 1000]
}

# X and y are the feature matrix and target prepared earlier in the notebook
gcv = GridSearchCV(RandomForestClassifier(), param_grid=param_grid)
gcv.fit(X, y)
gcv.best_params_
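The full grid above has 288 combinations, so it can take a long time to run. The following is a self-contained sketch of the same idea on a much smaller grid. The synthetic data, the reduced grid, and the names small_grid, search and best_model are assumptions for illustration, not part of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A deliberately small grid so the search finishes quickly.
small_grid = {
    "max_depth": [3, 5],
    "n_estimators": [10, 20],
}

# cv=3 runs 3-fold cross-validation for every parameter combination.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=small_grid,
    cv=3,
)
search.fit(X_train, y_train)

# best_params_ holds the winning combination; best_estimator_ is the model
# refitted on the whole training set with those parameters.
print(search.best_params_)
best_model = search.best_estimator_
print(best_model.score(X_test, y_test))
```

Fitting the search on the training split only, and scoring best_estimator_ on the held-out split, keeps the test set out of the parameter selection.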
Installing missingno (or any other package) on Win10: open Anaconda Prompt and run:
conda install -c conda-forge missingno
Filtering series:
Your code: train_copy.isnull()
You could write it like this to get only the columns with null values:
ncols = train_copy.isnull().sum()
ncols[ncols != 0]
Check whether the whole data frame has any null values:
train.isnull().any().any()
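Both null checks can be tried on a small frame. This is a minimal sketch; the toy columns and the names null_counts, cols_with_nulls and has_any_null are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with missing values in two of three columns.
train_copy = pd.DataFrame({
    "age": [22.0, np.nan, 58.0],
    "fare": [7.25, 71.28, 8.05],
    "cabin": [None, "C85", None],
})

# Count nulls per column, then keep only the columns that have at least one.
null_counts = train_copy.isnull().sum()
cols_with_nulls = null_counts[null_counts != 0]
print(cols_with_nulls)

# First .any() reduces each column to True/False, second .any() reduces
# those per-column flags to a single boolean for the whole frame.
has_any_null = train_copy.isnull().any().any()
print(has_any_null)
```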
The evaluation function should print its result instead of returning it, so that when you loop through a list of models and evaluate them, the result for each iteration is printed to the output.
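One possible shape for such a function is sketched below. The evaluate helper, the synthetic data, and the choice of models are hypothetical. Returning the score in addition to printing it is an extra convenience here, not something the assignment requires.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; substitute your own split).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit the model, print its test accuracy, and also return it."""
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    # Printing inside the function makes every loop iteration visible.
    print(f"{type(model).__name__}: accuracy = {accuracy:.3f}")
    return accuracy


models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)]
scores = []
for model in models:
    scores.append(evaluate(model, X_train, y_train, X_test, y_test))
```

In a notebook, only the last expression of a cell is displayed automatically, which is why a function that only returns its result shows nothing when called inside a loop.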
The goal of this assignment was to introduce you to the following concepts:
You learn how to use PCA for dimension reduction, KMeans, and Hierarchical Clustering. You also learn to visualize the results of both techniques.
Overall, excellent work! You are demonstrating that you understand the material and are doing a great job of applying it. Keep it up!
For this question, there's a shorter version. (What if there are a hundred features?)
total_purchases = data.sum(axis=1)
purchase_percent = data.div(total_purchases, axis=0) * 100
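To see what the two lines do, here is a minimal sketch on a made-up table. The column names and numbers are hypothetical and only stand in for the assignment's data.

```python
import pandas as pd

# Hypothetical purchases: one row per customer, one column per category.
data = pd.DataFrame({
    "fresh": [10, 50],
    "milk": [30, 50],
    "grocery": [60, 100],
})

# Row-wise total, then divide every row by its own total (axis=0 aligns
# the totals with the row index) and scale to percent.
total_purchases = data.sum(axis=1)
purchase_percent = data.div(total_purchases, axis=0) * 100
print(purchase_percent)
```

Because the division is applied to the whole frame at once, the same two lines work regardless of how many feature columns there are.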
For this question, you're plotting the first 2 features of the original dataset, while the question asks for a plot of the first 2 PCA components. Try this:
plt.figure(figsize=(10, 7))
plt.scatter(pca_data['PC1'], pca_data['PC2'], c=cluster.labels_, cmap='rainbow');
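The plotting call assumes a pca_data frame with PC1 and PC2 columns and a fitted cluster object. One way these could be produced is sketched below. The synthetic data, the scaling step, the choice of AgglomerativeClustering with 3 clusters, and fitting it on the PCA components rather than the original features are all assumptions, since your notebook's setup is not shown here.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the assignment data (an assumption).
features, _ = make_blobs(n_samples=150, n_features=5, centers=3, random_state=0)

# Standardize, then project onto the first two principal components.
scaled = StandardScaler().fit_transform(features)
components = PCA(n_components=2).fit_transform(scaled)
pca_data = pd.DataFrame(components, columns=["PC1", "PC2"])

# Hierarchical clustering; labels_ has one cluster id per row of pca_data.
cluster = AgglomerativeClustering(n_clusters=3)
cluster.fit(pca_data)

print(pca_data.head())
print(cluster.labels_[:10])
```

With these two objects in place, the plt.scatter call colours each point in PCA space by its cluster label.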
Goal of this Assignment
The goal of this assignment was to introduce you to 2 main concepts in Machine Learning:
You learn how to query and clean data using the pandas library in Python, and how to make plots with the Seaborn library that help you understand the data better.
Things you did well:
To sum up: