coderschoolreview commented 5 years ago

Awesome good work :D

coderschoolreview commented 5 years ago

This could be written in a shorter form: sum(ecom['Job']=='Lawyer')
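To show why the shorter form works, here is a small sketch. The `ecom` frame below is a made-up stand-in for the assignment data, not the real file; only the `Job` column name comes from the comment above.

```python
import pandas as pd

# Hypothetical stand-in for the `ecom` DataFrame from the assignment.
ecom = pd.DataFrame({"Job": ["Lawyer", "Engineer", "Lawyer", "Doctor"]})

# The comparison yields a boolean Series; each True counts as 1 when summed.
is_lawyer = ecom["Job"] == "Lawyer"

n_builtin = sum(is_lawyer)   # built-in sum, as suggested above
n_method = is_lawyer.sum()   # pandas method, usually faster on large data

print(n_builtin, n_method)
```

Both forms give the same count; the `.sum()` method is the one I would reach for on a large frame.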

coderschoolreview commented 5 years ago

Overall, all answers are correct. Great work :D

coderschoolreview commented 5 years ago

Assignment 4:

Visualization:

Each feature should have been plotted in its own subplot. Sample code:

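# Distribution of each measurement across the 3 species, one subplot per feature
import matplotlib.pyplot as plt
import seaborn as sns

## list of species (data is the DataFrame from your notebook)
ls = list(data.Species.unique())

plt.rcParams['figure.figsize'] = (20, 5)
f, axes = plt.subplots(1, 4)

features = ('SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm')
for ax, v in zip(axes, features):
    ax.set_title(v)

    for s in ls:
        # distplot is deprecated since seaborn 0.11;
        # kdeplot draws the same density curve as distplot(hist=False)
        sns.kdeplot(data.loc[data.Species == s, v], label=s, ax=ax)
    ax.legend()  # one legend per subplot, not only on the last one

plt.show()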

coderschoolreview commented 5 years ago

Model:

Logistic Regression in its basic form only works for binary classification (only 2 classes). In this case you have 3, so a multiclass extension such as one-vs-rest or multinomial (softmax) is needed.

Ref: https://www.quora.com/Can-you-do-multiclass-classification-with-logistic-regression
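A minimal sketch of the multiclass case, assuming the assignment data is close to the Iris dataset bundled with scikit-learn (that is my assumption, based on the column names in the plotting code). scikit-learn's `LogisticRegression` accepts a 3-class target directly and applies a multiclass scheme internally.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for the assignment's CSV.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# LogisticRegression handles the 3-class target without extra setup;
# max_iter is raised so the solver converges on unscaled features.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(clf.classes_, accuracy)
```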

coderschoolreview commented 5 years ago

hyl0428: Assignment 5

The goal of this assignment was to introduce you to the following concepts in Machine Learning:

Data quality check: missing values, anomalies, ...
Exploratory Data Analysis: class ratios, relationships between features, relationships between features and the target, and data visualization.
Preprocessing data: handling imbalanced datasets, dealing with categorical variables.
Modeling & evaluation: building and evaluating classifiers, hyperparameter tuning.

Things you did well

Awesome, you knew the !pip install missingno notebook command (the ! prefix runs a shell command; %pip is the matching magic command). More on magic commands here
Checking data quality: checking for missing values and using missingno to visualize them
EDA: ratio of classes, finding out that the data is imbalanced, visualizing relationships between features
Preprocessing: undersampling / oversampling, train test split
Modeling & evaluation: training multiple models and comparing their classification reports against each other

Things to work on

Preprocessing categorical features. Try pandas.get_dummies.
GridSearchCV and RandomizedSearchCV: common practice is to use RandomizedSearchCV to narrow down the search range, then fine-tune with GridSearchCV. You can read this article
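For the categorical features, a minimal sketch of `pandas.get_dummies`. The frame and its column names are invented for illustration and are not from your notebook.

```python
import pandas as pd

# Hypothetical data: one numeric and one categorical column.
df = pd.DataFrame({
    "age": [25, 40, 31],
    "city": ["Hanoi", "Saigon", "Hanoi"],
})

# One-hot encode only the listed columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["city"])

print(encoded.columns.tolist())  # ['age', 'city_Hanoi', 'city_Saigon']
```

Each category becomes its own indicator column, which is the form most scikit-learn classifiers expect.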

Minor tips

To find the best combination of parameters:

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

# Use GridSearchCV with a random forest model and this param grid to find the best combination of parameters
# Hint: example
# https://stackoverflow.com/questions/30102973/how-to-get-best-estimator-on-gridsearchcv-random-forest-classifier-scikit

param_grid = {
  'bootstrap': [True],
  'max_depth': [80, 90, 100, 110],
  'max_features': [2, 3],
  'min_samples_leaf': [3, 4, 5],
  'min_samples_split': [8, 10, 12],
  'n_estimators': [100, 200, 300, 1000]
}

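from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# X, y are the feature matrix and target from your notebook
gcv = GridSearchCV(RandomForestClassifier(), param_grid=param_grid)
gcv.fit(X, y)
gcv.best_params_  # best combination found by the search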
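The coarse-then-fine search mentioned under "Things to work on" could look roughly like the sketch below. The synthetic data, the parameter ranges, and the way the fine grid is built around the coarse winner are all my own illustrative choices, not a prescribed recipe.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in for the assignment data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Step 1: a cheap random search over a wide range.
coarse = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_distributions={
        "max_depth": [2, 4, 8, 16, 32],
        "min_samples_leaf": [1, 2, 4, 8],
    },
    n_iter=5,
    cv=3,
    random_state=0,
)
coarse.fit(X, y)
best_depth = coarse.best_params_["max_depth"]

# Step 2: an exhaustive grid in a narrow window around the coarse winner.
fine_grid = {
    "max_depth": sorted({max(1, best_depth - 1), best_depth, best_depth + 1}),
    "min_samples_leaf": [coarse.best_params_["min_samples_leaf"]],
}
fine = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0), fine_grid, cv=3
)
fine.fit(X, y)

print(fine.best_params_, round(fine.best_score_, 3))
```

The random search samples only `n_iter` combinations, so it stays cheap even when the ranges are wide; the grid search then covers the small neighbourhood exhaustively.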

Installing the missingno package (or any other package) on Win10:
- Go to Anaconda Prompt
- Type conda install -c conda-forge missingno
Filtering series:
- Your code: train_copy.isnull()
- You could write it like this to get only the columns with null values:
```
ncols = train_copy.isnull().sum()
ncols[ncols!=0]
```
To check whether the whole data frame has any null value:

train.isnull().any().any()
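Both null checks can be tried on a toy frame. The data below is made up; only the name `train_copy` follows your notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with nulls in two of its three columns.
train_copy = pd.DataFrame({
    "age": [22, np.nan, 35],
    "fare": [7.25, 8.05, 53.1],
    "cabin": [None, "C85", None],
})

# Null count per column, then keep only columns that have at least one null.
ncols = train_copy.isnull().sum()
null_only = ncols[ncols != 0]

# First any() reduces over rows (one flag per column), second over columns.
has_null = train_copy.isnull().any().any()

print(null_only)
print(has_null)
```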
The evaluation function should print instead of return, so that when you loop through a list of models and evaluate them, the result of each iteration is printed to the output.
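A minimal sketch of such a printing evaluation function. The helper name `evaluate`, the two models, and the synthetic data are placeholders of mine, not your original code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate(model, X_test, y_test):
    """Print (rather than return) the report so every loop iteration shows output."""
    print(type(model).__name__)
    print(classification_report(y_test, model.predict(X_test)))


# Synthetic stand-in for the assignment data.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)]
for model in models:
    model.fit(X_train, y_train)
    evaluate(model, X_test, y_test)
```

In a notebook only the last expression of a cell is displayed automatically, which is why a returned value inside a loop never shows up.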

coderschoolreview commented 5 years ago

Assignment 6

Goal of this Assignment

The goal of this assignment was to introduce you to the following concepts:

Unsupervised Learning (KMeans, Hierarchical Clustering)
PCA

You learned how to use PCA for dimension reduction, KMeans, and Hierarchical Clustering. You also learned to visualize the results of both clustering techniques.
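As a rough sketch of the workflow described above: reduce to two principal components, then fit KMeans for a range of k and compare inertia. The Iris data is only a stand-in, since I am assuming nothing about the actual assignment dataset, and hierarchical clustering is left out to keep it short.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Stand-in data; the assignment dataset may differ.
X = load_iris().data

# Project onto the first two principal components.
pca_data = pd.DataFrame(
    PCA(n_components=2).fit_transform(X), columns=["PC1", "PC2"]
)

# Fit KMeans for each k and record the within-cluster sum of squares.
k_values = range(1, 10)
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(pca_data).inertia_
    for k in k_values
]

for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 1))
```

Plotting `inertias` against `k_values` gives the usual elbow curve for choosing k.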

Things you did well:

Utilizing Pandas to read, summarize and visualize data.
Almost got everything right.

To sum up:

Great work! This shows you understood the concepts and are able to apply them. Keep it up!

A few minor tips

To create a range of integers, use range(low,high,step):
k_values = range(1,20)
To access the first column of pca_data, use pca_data.iloc[:,0] or pca_data['PC1']. pca_data[:,0] is NumPy-style slicing, which does not work on a DataFrame, hence the error.
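A small sketch of the two working ways to pick the first column, next to the failing one. The values in `pca_data` are dummy numbers; only the name and the `PC1` label come from the tip above. The exact exception type depends on the pandas and Python versions, so the sketch catches a generic `Exception`.

```python
import numpy as np
import pandas as pd

# Dummy frame shaped like a two-component PCA result.
pca_data = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["PC1", "PC2"])

first_by_position = pca_data.iloc[:, 0]  # positional: all rows, column 0
first_by_label = pca_data["PC1"]         # label-based: column named PC1

# NumPy-style slicing on the DataFrame itself raises an error.
try:
    pca_data[:, 0]
    failed = False
except Exception as exc:
    failed = True
    print(type(exc).__name__)

# range(low, high) stops before high, so this holds 1..19.
k_values = list(range(1, 20))
print(k_values[0], k_values[-1])
```

If you do want NumPy-style indexing, `pca_data.to_numpy()[:, 0]` works because the result is a plain array.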

hyl0428 / ML-Assignments

Thanks! #1
