add k-means clustering

From Udacity -> Data Scientist Nanodegree -> Part 3 - Unsupervised Learning -> Clustering

[x] add scree-plot below
[x] add heat map similar to https://github.com/shane-kercheval/r-tools/blob/master/readme/kmeans_5_clusters_means.png
[x] validate that there are NO NAs (I doubt sklearn allows NAs; the R package does not)
[x] allow for dynamic scaling (e.g. center-scale or normalization); it should be a parameter of the class
[x] for the heatmap, allow colors to be based on either the median or the mean (in r-tools, red means above the mean and blue means below)
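
The NA check, the scaling parameter, and the mean/median reference for the heatmap could look roughly like the sketch below. Everything named here (`KMeansSketch`, `num_clusters`, `scaling`, `cluster_deviations`) is hypothetical and is not the oo-learning API; it only assumes pandas and scikit-learn are installed. As far as I know, scikit-learn's `KMeans` already rejects NaN input with a `ValueError`, so the explicit check mainly gives a clearer message.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler, StandardScaler


class KMeansSketch:
    """Hypothetical wrapper for illustration; not the oo-learning class."""

    # Scaling is a constructor parameter: center-scale, normalization, or none.
    _SCALERS = {"center_scale": StandardScaler, "normalize": MinMaxScaler, None: None}

    def __init__(self, num_clusters, scaling="center_scale", seed=42):
        if scaling not in self._SCALERS:
            raise ValueError(f"unknown scaling option: {scaling!r}")
        self.num_clusters = num_clusters
        self.scaling = scaling
        self.seed = seed

    def fit_predict(self, data: pd.DataFrame) -> np.ndarray:
        # Fail fast on missing values with an explicit message.
        if data.isna().any().any():
            raise ValueError("data contains missing values (NAs)")
        values = data.to_numpy(dtype=float)
        scaler_class = self._SCALERS[self.scaling]
        if scaler_class is not None:
            values = scaler_class().fit_transform(values)
        # Keep the scaled data so cluster summaries use the same units.
        self.scaled_ = pd.DataFrame(values, columns=data.columns, index=data.index)
        model = KMeans(n_clusters=self.num_clusters, n_init=10, random_state=self.seed)
        self.labels_ = model.fit_predict(values)
        return self.labels_

    def cluster_deviations(self, reference="mean") -> pd.DataFrame:
        # Per-cluster feature means minus the overall mean or median.
        # Positive cells would be drawn red and negative cells blue in a heatmap.
        if reference not in ("mean", "median"):
            raise ValueError("reference must be 'mean' or 'median'")
        overall = self.scaled_.mean() if reference == "mean" else self.scaled_.median()
        cluster_means = self.scaled_.groupby(self.labels_).mean()
        return cluster_means - overall
```

The table returned by `cluster_deviations` has one row per cluster and one column per feature, so it can be passed to any heatmap routine with a diverging colormap centered at zero.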

# A place for your work: create a scree plot. You will need to:
# 1. Fit a k-means model for each k from 1 to 10
# 2. Obtain the score for each model (take the absolute value)
# 3. Plot the score against k

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

def get_kmeans_score(data, center):
    '''
    Returns the k-means score (SSE of points to their assigned centers).
    INPUT:
        data - the dataset you want to fit kmeans to
        center - the number of centers you want (the k value)
    OUTPUT:
        score - the SSE score for the kmeans model fit to the data
    '''
    # Instantiate kmeans with k clusters
    kmeans = KMeans(n_clusters=center)

    # Then fit the model to your data using the fit method
    model = kmeans.fit(data)

    # score() returns the negative SSE, so take the absolute value
    score = np.abs(model.score(data))

    return score

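# `data` is assumed to be defined already (the dataset to cluster)
scores = []
centers = list(range(1, 11))

# Fit one model per k and record its SSE
for center in centers:
    scores.append(get_kmeans_score(data, center))

# Scree plot: look for the "elbow" where SSE stops dropping sharply
plt.plot(centers, scores, linestyle='--', marker='o', color='b')
plt.xlabel('K')
plt.ylabel('SSE')
plt.title('SSE vs. K')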
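
To try the scree plot without the course dataset, here is a minimal sketch on synthetic data. The use of `make_blobs`, the fixed `random_state`, and `n_init=10` are my assumptions and are not part of the snippet above. It reads the SSE from the fitted model's `inertia_` attribute, which should match `abs(model.score(data))` on the training data up to floating-point error.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real dataset (assumption: four blobs in 2-D).
data, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

centers = list(range(1, 11))
scores = []
for k in centers:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    # inertia_ is the sum of squared distances to the closest center (SSE).
    scores.append(model.inertia_)

# Same plot as above, built on an explicit figure and axes.
fig, ax = plt.subplots()
ax.plot(centers, scores, linestyle="--", marker="o", color="b")
ax.set_xlabel("K")
ax.set_ylabel("SSE")
ax.set_title("SSE vs. K")
```

Fixing `random_state` makes the curve reproducible between runs, which helps when comparing scaling options.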

shane-kercheval / oo-learning

add k-means clustering #43