rileypsmith / sklearn-som

A simple, rectangular self-organizing map with methods similar to clustering methods in Scikit Learn.
MIT License

ValueError: operands could not be broadcast together with shapes (3,7) (3,3) #3

Closed: hendrywijaya98 closed this issue 3 years ago

hendrywijaya98 commented 3 years ago

Excuse me, rileypsmith. I'd like to ask for your help with an error I ran into while working on my final thesis project.

Basically, this function finds the optimal k for KMeans; it is based on the optimalK function from this project.

I then replaced KMeans with SOM, which gave the code below:

```python
import numpy as np
import pandas as pd
from sklearn_som.som import SOM


def optimalK(data, nrefs=3, maxClusters=15):
    """
    Calculates optimal K using the Gap Statistic from Tibshirani, Walther, Hastie
    (originally written for KMeans; here each model is a SOM)
    Params:
        data: ndarray of shape (n_samples, n_features)
        nrefs: number of sample reference datasets to create
        maxClusters: Maximum number of clusters to test for
    Returns: (optimalK, resultsdf)
    """
    gaps = np.zeros((len(range(1, maxClusters)),))
    rows = []  # one dict per k; DataFrame.append was removed in pandas 2.0
    for gap_index, k in enumerate(range(1, maxClusters)):

        # Holder for reference dispersion results
        refDisps = np.zeros(nrefs)

        # For n references, generate a random sample, fit a SOM and record its dispersion
        for i in range(nrefs):

            # Create new random reference set
            randomReference = np.random.random_sample(size=data.shape)

            # Fit to it
            som = SOM(k)
            som.fit(randomReference)

            refDisp = som.inertia_
            refDisps[i] = refDisp

        # Fit a SOM to the original data
        som = SOM(k)
        som.fit(data)

        # Dispersion (inertia) on the original data
        origDisp = som.inertia_

        # Calculate gap statistic
        gap = np.log(np.mean(refDisps)) - np.log(origDisp)

        # Assign this loop's gap statistic to gaps
        gaps[gap_index] = gap

        rows.append({'clusterCount': k, 'gap': gap})

    resultsdf = pd.DataFrame(rows, columns=['clusterCount', 'gap'])
    return (gaps.argmax() + 1, resultsdf)  # Plus 1 because index of 0 means 1 cluster is optimal, index 2 = 3 clusters are optimal
```
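For reference, the loop computes the following value for each k, where $W_k$ is `som.inertia_` on `data` and $W^{*}_{kb}$ is the inertia on the $b$-th of $B$ = `nrefs` random reference sets:

$$
\mathrm{Gap}(k) = \log\left(\frac{1}{B}\sum_{b=1}^{B} W^{*}_{kb}\right) - \log W_k
$$

Tibshirani, Walther and Hastie define the reference term as the mean of the logs, $\frac{1}{B}\sum_{b=1}^{B} \log W^{*}_{kb}$, and choose the smallest $k$ with $\mathrm{Gap}(k) \ge \mathrm{Gap}(k+1) - s_{k+1}$. The log-of-mean form above is never smaller (by Jensen's inequality), and the code picks $k$ by `argmax` instead.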

I then applied the function the same way that project does, by calling optimalK:

```python
k, gapdf = optimalK(X, nrefs=5, maxClusters=10)
print(f'Optimal k from X is {k}')
```

[Screenshot of the error]

Unfortunately, I'm getting the error shown above.

This error is the first issue I'd like to discuss. I also have another issue about the SOM's data input; if you don't mind, may I bring it up in a separate issue?

Best regards, and thank you.

hendrywijaya98 commented 3 years ago

Sorry if my grammar is poor; my mind is still a bit scattered.

rileypsmith commented 3 years ago

Hi Hendry, what is the shape of your dataset? Does it have 7 features per observation? This looks like an issue with how you are building the self-organizing map. Unlike a KMeans clustering algorithm, where you only need to specify the number of clusters, sklearn-som has 3 things to keep track of: the desired shape of the map in the x and y directions (m and n) and the dimensionality of the data space (dim). You are constructing it with a single positional argument, which sets only m (the vertical dimension) of the SOM. n (the horizontal dimension) and dim (the dimensionality of your data) both keep their default value of 3. So if your data does not have exactly 3 features, you will get this error.
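A minimal NumPy sketch of how these shapes collide. It assumes, as the shapes in the error suggest, that each sample is compared against an `(m * n, dim)` weight matrix; `bmu_distances` and `som_kwargs` are hypothetical helpers for illustration, not sklearn-som internals or API.

```python
import numpy as np


def bmu_distances(x, weights):
    """Hypothetical helper: distance from one sample to every map unit.

    Assumes weights has shape (m * n, dim) and x has shape (n_features,).
    """
    # Repeat the sample once per unit: shape (m * n, n_features)
    x_stack = np.stack([x] * weights.shape[0], axis=0)
    # The subtraction only broadcasts when n_features == dim
    return np.linalg.norm(x_stack - weights, axis=1)


def som_kwargs(k, data):
    """Hypothetical helper: a k x 1 map (k units) whose dim matches the data."""
    return {"m": k, "n": 1, "dim": data.shape[1]}


rng = np.random.default_rng(0)
X = rng.random((10, 7))  # toy data: 10 observations, 7 features

# SOM(1) leaves n=3 and dim=3: 1 * 3 units, each a 3-vector
try:
    bmu_distances(X[0], rng.random((1 * 3, 3)))
except ValueError as err:
    print(err)  # operands could not be broadcast together with shapes (3,7) (3,3)

# With dim matching the 7 features, the shapes line up
params = som_kwargs(1, X)
weights = rng.random((params["m"] * params["n"], params["dim"]))
print(bmu_distances(X[0], weights).shape)  # (1,)
```

Under that assumption, replacing `SOM(k)` with `SOM(m=k, n=1, dim=data.shape[1])` should give a k-unit map that matches the data's feature count; the same change applies to the fits on `randomReference`, which has the same shape as `data`. Using n=1 is just one way to get k units.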

For more information, please check out the docs at ReadTheDocs. Marking this as closed for now.