shane-kercheval / oo-learning

Python machine learning library based on Object Oriented design principles; the goal is to allow users to quickly explore data and search for top machine learning algorithm candidates for a given dataset
MIT License
1 stars 0 forks source link

add k-means clustering #43

Closed shane-kercheval closed 6 years ago

shane-kercheval commented 6 years ago

fromudacity -> Data Scientist NanoDegree -> Part 3 - Unupervised Learning -> Clustering

# A place for your work - create a scree plot - you will need to
# Fit a kmeans model with changing k from 1-10
# Obtain the score for each model (take the absolute value)
# Plot the score against k

def get_kmeans_score(data, center):
    '''
    returns the kmeans score regarding SSE for points to centers
    INPUT:
        data - the dataset you want to fit kmeans to
        center - the number of centers you want (the k value)
    OUTPUT:
        score - the SSE score for the kmeans model fit to the data
    '''
    #instantiate kmeans
    kmeans = KMeans(n_clusters=center)

    # Then fit the model to your data using the fit method
    model = kmeans.fit(data)

    # Obtain a score related to the model fit
    score = np.abs(model.score(data))

    return score

scores = []
centers = list(range(1,11))

for center in centers:
    scores.append(get_kmeans_score(data, center))

plt.plot(centers, scores, linestyle='--', marker='o', color='b');
plt.xlabel('K');
plt.ylabel('SSE');
plt.title('SSE vs. K');