jeffheaton / encog-dotnet-core

http://www.heatonresearch.com/encog

K-Means Clustering does not cluster #100

Open ctasoluk opened 8 years ago

ctasoluk commented 8 years ago

I have been trying simple K-Means clustering, and it always clusters the data into 1 cluster. Here is the data set to be clustered:

/**
 * The data to be clustered.
 */
public static final double[][] DATA = { {2617.83}, {5885.6}, {1690.71}, {3162.3}, {2180.97},
        {1913.49},{2493.73},{1341.28},{4972.91},{2098.54},{3645.07},{1554.69},{1483.03},{339.25},
        {12153.81},{1082.09},{1266.5}
};

Note that when you remove the last element {1266.5}, or place it in a different position in the DATA array, you get 2 clusters:

*** Cluster 1 ***
[2617.83]
[1690.71]
[3162.3]
[2180.97]
[1913.49]
[2493.73]
[1341.28]
[2098.54]
[3645.07]
[1554.69]
[1483.03]
[339.25]
[1266.5]
[1082.09]
*** Cluster 2 ***
[5885.6]
[4972.91]
[12153.81]

Here is the SimpleKMeans example with the problematic data set, so that you can easily replicate the problem.

import java.util.Arrays;

import org.encog.ml.MLCluster;
import org.encog.ml.data.MLDataPair;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLData;
import org.encog.ml.data.basic.BasicMLDataPair;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.ml.kmeans.KMeansClustering;

public class SimpleKMeans {

/**
 * The data to be clustered.
 */
public static final double[][] DATA = { {2617.83}, {5885.6}, {1690.71}, {3162.3}, {2180.97},
        {1913.49},{2493.73},{1341.28},{4972.91},{2098.54},{3645.07},{1554.69},{1483.03},{339.25},
        {12153.81},{1082.09},{1266.5}
};

/**
 * The main method.
 * @param args Arguments are not used.
 */
public static void main(final String args[]) {

    final BasicMLDataSet set = new BasicMLDataSet();

    for (final double[] element : SimpleKMeans.DATA) {
        set.add(new BasicMLData(element));
    }

    final KMeansClustering kmeans = new KMeansClustering(2, set);

    kmeans.iteration(100);
    //System.out.println("Final WCSS: " + kmeans.getWCSS());

    // Display the cluster
    int i = 1;
    for (final MLCluster cluster : kmeans.getClusters()) {
        System.out.println("*** Cluster " + (i++) + " ***");
        final MLDataSet ds = cluster.createDataSet();
        final MLDataPair pair = BasicMLDataPair.createPair(
                ds.getInputSize(), ds.getIdealSize());
        for (int j = 0; j < ds.getRecordCount(); j++) {
            ds.getRecord(j, pair);
            System.out.println(Arrays.toString(pair.getInputArray()));

        }
    }
}

}
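As a cross-check that does not depend on Encog at all, here is a minimal sketch of plain k-means (Lloyd's algorithm) on the same values. The class name `PlainKMeans1D` and the `cluster` helper are made up for illustration, and starting the centroids at the first k data points is an assumption chosen only to make the run deterministic; it is not a claim about how Encog's `KMeansClustering` initializes or iterates.

```java
import java.util.Arrays;

public class PlainKMeans1D {

    // Same values as DATA in the report, flattened to one dimension.
    static final double[] DATA = {
        2617.83, 5885.6, 1690.71, 3162.3, 2180.97, 1913.49, 2493.73, 1341.28,
        4972.91, 2098.54, 3645.07, 1554.69, 1483.03, 339.25, 12153.81, 1082.09, 1266.5
    };

    /**
     * Lloyd's algorithm on 1-D data. Returns the cluster index of each value.
     * Assumption: centroids start at the first k values (requires k <= data.length).
     */
    static int[] cluster(final double[] data, final int k, final int maxIterations) {
        final double[] centroids = Arrays.copyOf(data, k);
        final int[] assignment = new int[data.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every value to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (Math.abs(data[i] - centroids[c]) < Math.abs(data[i] - centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // No value moved, so the partition has converged.
            }

            // Update step: move each centroid to the mean of its members.
            final double[] sum = new double[k];
            final int[] count = new int[k];
            for (int i = 0; i < data.length; i++) {
                sum[assignment[i]] += data[i];
                count[assignment[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (count[c] > 0) { // Leave an empty cluster's centroid where it is.
                    centroids[c] = sum[c] / count[c];
                }
            }
        }
        return assignment;
    }

    public static void main(final String[] args) {
        final int[] assignment = cluster(DATA, 2, 100);
        for (int c = 0; c < 2; c++) {
            System.out.println("*** Cluster " + (c + 1) + " ***");
            for (int i = 0; i < DATA.length; i++) {
                if (assignment[i] == c) {
                    System.out.println("[" + DATA[i] + "]");
                }
            }
        }
    }
}
```

With this particular initialization the sketch settles on the same 14/3 split quoted above, with {1266.5} left in its original last position. K-means is sensitive to its starting centroids, so a different initialization can converge to a different partition.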

jentfoo commented 7 years ago

I had this issue in 3.3.0 as well, but it seems to be fixed in 3.4. I'm not sure what specifically changed to fix it.