Open coderschoolreview opened 5 years ago
This could be written in a shorter form: sum(ecom['Job']=='Lawyer')
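A minimal sketch of why the shorter form works. The small `ecom` frame below is a hypothetical stand-in for the assignment's data, not the real dataset:

```python
import pandas as pd

# Hypothetical stand-in for the `ecom` DataFrame from the assignment.
ecom = pd.DataFrame({"Job": ["Lawyer", "Engineer", "Lawyer", "Teacher"]})

# Comparing a column to a value yields a boolean Series;
# True counts as 1, so summing it counts the matching rows.
lawyer_count = (ecom["Job"] == "Lawyer").sum()
print(lawyer_count)
```

The built-in `sum(ecom['Job']=='Lawyer')` gives the same count; the `.sum()` method is usually faster on large frames.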
Overall, all answers are correct. Great work :D
You should have plotted each feature in its own subplot. Sample code:
# Distribution of each measurement across the 3 species:
import matplotlib.pyplot as plt
import seaborn as sns

## list of species
ls = list(data.Species.unique())
plt.rcParams['figure.figsize'] = (20, 5)
f, axes = plt.subplots(1, 4)
for i, v in enumerate(('SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm')):
    axes[i].set_title(v)
    for s in ls:
        # one density curve per species; kdeplot replaces the deprecated distplot(hist=False)
        sns.kdeplot(data[data.Species == s][v], label=s, ax=axes[i])
    # give every subplot its own legend
    axes[i].legend()
plt.show()
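Logistic Regression in its basic form only works for binary classification (2 classes). In this case you have 3, so it needs a multiclass extension such as one-vs-rest or multinomial (softmax) regression.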
Ref: https://www.quora.com/Can-you-do-multiclass-classification-with-logistic-regression
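As a rough illustration, scikit-learn's LogisticRegression accepts a target with more than two classes directly. The sketch below uses the bundled iris dataset as a stand-in for the assignment's data; `max_iter=1000` is an arbitrary choice to help the solver converge:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris has 3 species, so this is a 3-class problem.
X, y = load_iris(return_X_y=True)

# scikit-learn applies a multiclass scheme automatically
# when the target has more than two classes.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# One probability per class for each sample; each row sums to 1.
proba = clf.predict_proba(X[:5])
print(clf.classes_)
print(proba.shape)
```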
The goal of this assignment was to introduce you to the following concepts in Machine Learning:
- !pip install missingno to install a package from inside a notebook (more on magic commands)
- missingno to visualize missing data
- pandas.get_dummies
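For reference, a small sketch of what pandas.get_dummies does. The frame and its column names are made up for illustration and are not the assignment's data:

```python
import pandas as pd

# Hypothetical toy frame with one categorical column.
df = pd.DataFrame({"Sex": ["male", "female", "female"], "Age": [22, 38, 26]})

# Replace the categorical column with one indicator column per category.
encoded = pd.get_dummies(df, columns=["Sex"])
print(encoded.columns.tolist())
```

The indicator columns are named `<column>_<category>`; their dtype (bool or integer) depends on the pandas version.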
To find out which combination of parameters is best:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
# Use GridSearchCV with a random forest model and this param grid to find the best combination of parameters
# Hint: example
# https://stackoverflow.com/questions/30102973/how-to-get-best-estimator-on-gridsearchcv-random-forest-classifier-scikit
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'bootstrap': [True],
    'max_depth': [80, 90, 100, 110],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4, 5],
    'min_samples_split': [8, 10, 12],
    'n_estimators': [100, 200, 300, 1000]
}
# X, y: the feature matrix and target from your notebook
gcv = GridSearchCV(RandomForestClassifier(), param_grid=param_grid)
gcv.fit(X, y)
gcv.best_params_
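The full grid above covers 288 parameter combinations, each fitted once per cross-validation fold, so it can take a while. A quicker sketch of the same workflow follows; the smaller grid, the iris data, `cv=3` and `random_state=0` are all assumptions made here to keep it fast and repeatable:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in data; in the assignment X and y come from your own notebook.
X, y = load_iris(return_X_y=True)

# Deliberately tiny grid (4 combinations) so the search finishes quickly.
small_grid = {"max_depth": [3, 5], "n_estimators": [10, 50]}

gcv = GridSearchCV(RandomForestClassifier(random_state=0),
                   param_grid=small_grid, cv=3)
gcv.fit(X, y)

print(gcv.best_params_)            # best parameter combination found
print(round(gcv.best_score_, 3))   # its mean cross-validated accuracy
best_model = gcv.best_estimator_   # model refitted on all data with those parameters
```

Passing `n_jobs=-1` to GridSearchCV runs the fits in parallel, which helps with the larger grid.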
Installing missingno (or any other package) on Win10: open Anaconda Prompt and run
conda install -c conda-forge missingno
Filtering a Series:
Your code: train_copy.isnull()
You could write it like this to get only the columns that contain null values:
ncols = train_copy.isnull().sum()
ncols[ncols!=0]
To check whether the whole data frame has any null value:
train.isnull().any().any()
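A small sketch of both null checks on a made-up frame. The column names and values are hypothetical; only the pattern matters:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_copy with some missing entries.
train_copy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0],
    "Fare": [7.25, 71.28, 7.92],
    "Cabin": [None, "C85", None],
})

# Count nulls per column (note the parentheses on sum()),
# then keep only the columns whose count is non-zero.
ncols = train_copy.isnull().sum()
null_cols = ncols[ncols != 0]
print(null_cols)

# First any() reduces each column to a bool, the second reduces those to one bool.
has_null = train_copy.isnull().any().any()
print(has_null)
```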
For the evaluation function, you should print the result instead of returning it, so that when you loop through the list of models and evaluate them, the result of each iteration is printed to the output.
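One possible shape for such a function, as a sketch only. The name `evaluate`, the two models and the iris data are illustrative assumptions. It prints the score so every loop iteration shows up in the output, and also returns it in case you want to keep the numbers:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model, print its test accuracy, and return the accuracy."""
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    # Printing makes the result visible on every loop iteration;
    # a bare return value inside a loop is not displayed by a notebook.
    print(f"{type(model).__name__}: accuracy = {score:.3f}")
    return score


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)]
scores = []
for m in models:
    scores.append(evaluate(m, X_train, X_test, y_train, y_test))
```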
The goal of this assignment was to introduce you to the following concepts:
You learned how to use PCA for dimensionality reduction, KMeans, and Hierarchical Clustering. You also learned to visualize the results of both techniques.
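A compact sketch of how these pieces fit together in scikit-learn. It assumes the iris data, 2 principal components and 3 clusters; none of these choices come from the assignment itself:

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Reduce the 4 original features to 2 principal components.
reduced = PCA(n_components=2).fit_transform(X)

# Cluster the reduced data with KMeans ...
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

# ... and with agglomerative (hierarchical) clustering.
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(reduced)

print(reduced.shape)
```

To visualize either result, a scatter plot of the two components colored by the label array is usually enough, for example `plt.scatter(reduced[:, 0], reduced[:, 1], c=kmeans_labels)`.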
To create a range of integers, use range(low, high, step). The high value itself is not included:
k_values = range(1,20)
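A quick check of what range produces. The step of 5 in the second call is just an example value:

```python
# range stops before `high`, so this covers 1 through 19.
k_values = range(1, 20)
print(list(k_values))

# With a step, every 5th value starting from 1.
stepped = list(range(1, 20, 5))
print(stepped)  # [1, 6, 11, 16]
```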
To access the first column of pca_data, use pca_data.iloc[:,0] or pca_data['PC1']. pca_data[:,0] is the wrong way to slice a DataFrame, hence the error.
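A short sketch of the difference, using a made-up `pca_data` frame filled with random numbers in place of the real PCA output. The exact exception type raised by the NumPy-style slice depends on the pandas version, so the sketch catches it broadly:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the PCA result stored as a DataFrame.
rng = np.random.default_rng(0)
pca_data = pd.DataFrame(rng.normal(size=(5, 2)), columns=["PC1", "PC2"])

# Two valid ways to get the first column.
first_by_position = pca_data.iloc[:, 0]
first_by_name = pca_data["PC1"]

# NumPy-style slicing is not supported directly on a DataFrame.
try:
    pca_data[:, 0]
    numpy_style_failed = False
except Exception as err:  # exception type varies across pandas versions
    numpy_style_failed = True
    print(type(err).__name__)

# It does work once the frame is converted to a NumPy array.
first_as_array = pca_data.to_numpy()[:, 0]
```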
Awesome good work :D