Open joseph-g25 opened 2 years ago
Key objectives:
* Experiment with KMeans clustering and with dropping various features.
* Create a final pipeline that:
  * includes data preparation steps
  * includes a clustering algorithm (use ColumnTransformer to replace selected features)
  * fits the transformer to the training data
  * tests the model on the test data
* Compare to a model without the clustering algorithm.
From Dr. Yarnall's Canvas assignment:
Do the following:
In an initial script, you should pick your features to replace and then run the code needed to figure out how many clusters to use.
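A rough sketch of what that investigation script might look like. The dataset is not named in this issue, so the synthetic data from `make_classification`, the column indices in `cluster_cols`, and the candidate range of k are all placeholder assumptions. Inertia (for an elbow plot) and the silhouette score are two common heuristics for choosing the number of clusters; the assignment does not say which one to use.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data; swap in the real assignment dataset.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_redundant=1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# ASSUMPTION: hypothetical choice of features to replace with cluster distances.
cluster_cols = [0, 1, 2]

# Scale only the selected training columns before clustering.
X_sel = StandardScaler().fit_transform(X_train[:, cluster_cols])

# Record inertia and silhouette score for each candidate number of clusters.
results = {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_sel)
    results[k] = (km.inertia_, silhouette_score(X_sel, km.labels_))
    print(f"k={k:2d}  inertia={results[k][0]:10.1f}  silhouette={results[k][1]:.3f}")

# One possible rule: take the k with the highest silhouette score.
best_k = max(results, key=lambda k: results[k][1])
print("k with highest silhouette:", best_k)
```

The highest silhouette score is only one way to pick; plotting inertia against k and looking for an elbow is another, and the two do not always agree.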
Your final script should:
* Load the data and split off test and train sets.
* Create a pipeline that:
  * does any preliminary data preparation (don't forget to scale);
  * clusters (using the number of clusters you determined during your investigation stage). Note: you'll want to use a ColumnTransformer so that the output of kmeans (the distances) replaces only the selected features;
  * does your classification.
* Fit your transformer to the training data.
* Test your model on the test data.

Remember to compare to a classifier built without the clustering.

If you want, you can try another experiment: first, apply PCA and select a handful of principal components. Then do clustering in THAT space. This is a common technique when the number of dimensions is quite high.
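One way the final-script steps could fit together, as a minimal sketch rather than the expected solution. The synthetic data, the column split (`cluster_cols`, `other_cols`), `n_clusters = 4`, and the use of `LogisticRegression` as the classifier are all assumptions made for illustration; none of them come from the assignment. The sketch relies on `KMeans.transform`, which returns each sample's distance to every cluster centre, so placing KMeans inside a `ColumnTransformer` branch replaces the selected columns with those distances.

```python
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data; swap in the real assignment dataset.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_redundant=1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# ASSUMPTION: hypothetical feature split and cluster count from the investigation stage.
cluster_cols = [0, 1, 2]
other_cols = [3, 4, 5, 6, 7]
n_clusters = 4

# Branch for the selected columns: scale, then output distances to each centroid.
cluster_branch = Pipeline([
    ("scale", StandardScaler()),
    ("kmeans", KMeans(n_clusters=n_clusters, n_init=10, random_state=0)),
])

# Selected columns are replaced by cluster distances; the rest are only scaled.
prep = ColumnTransformer([
    ("clusters", cluster_branch, cluster_cols),
    ("rest", StandardScaler(), other_cols),
])

clustered_model = Pipeline([
    ("prep", prep),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Baseline for comparison: same classifier, no clustering step.
baseline_model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Fit on training data only, then evaluate on the held-out test set.
clustered_model.fit(X_train, y_train)
baseline_model.fit(X_train, y_train)

clustered_acc = clustered_model.score(X_test, y_test)
baseline_acc = baseline_model.score(X_test, y_test)
print(f"with clustering:    {clustered_acc:.3f}")
print(f"without clustering: {baseline_acc:.3f}")
```

Because everything sits inside one `Pipeline`, the scaler and KMeans are fit on the training data only and then reused unchanged on the test set. For the optional PCA experiment, one possibility would be to add a `sklearn.decomposition.PCA` step before `kmeans` in the clustering branch, applied to whichever columns you want to reduce.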