Closed ioanna-ki closed 6 years ago
Hi @ioanna-ki, I believe it would be easier for others to test this using the FileReader option, something like:
./spark.sh "ClusteringTrainEvaluate -c (StreamKM) -s (FileReader -f ../data/iris.arff -k 10 -d 10 -i 150)" 1> result_iris_streamKM.txt 2> log_iris_streamKM.log
Best Regards, Heitor
Issue addressed by #99
Thanks Ioanna 👍
Bug Report StreamKM Clusterer
Expected behavior
StreamKM should maintain an up-to-date coreset tree while performing k-means clustering and assigning each input element to its nearest center.
Observed behavior
All input is assigned to a single cluster. The instance counter, the updated bucketmanager, and the clusters variable keep their values only inside the foreachRDD action, so when the assign function is called, there is no data for it to work with.
Steps to reproduce the issue
Used iris.arff. Command line:
./spark.sh "ClusteringTrainEvaluate -c (StreamKM) -s (SocketTextStreamReader)" 1> result_iris_streamKM.txt 2> log_irisstreamKM.log
No error message is produced, but if you print the output of the assign function (clpairs), every element is assigned to cluster index 0.
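The symptom can be illustrated outside streamDM with a minimal sketch of nearest-center assignment. This is not the project's code: the function `assign`, the sample points, and both center lists are hypothetical. It assumes the assignment step only ever sees an initial placeholder set of centers instead of the ones learned during training. Under that assumption every element maps to index 0, which matches the clpairs output described above.

```python
from math import dist  # Euclidean distance, Python 3.8+

def assign(points, centers):
    """Return (point, index of nearest center) pairs.

    Hypothetical stand-in for the clusterer's assign step.
    """
    pairs = []
    for p in points:
        # Pick the index of the center with the smallest distance to p.
        idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        pairs.append((p, idx))
    return pairs

# Made-up two-dimensional sample points (not taken from iris.arff).
points = [(5.1, 3.5), (7.0, 3.2), (6.3, 3.3)]

# Assumption: state was never updated, so only one placeholder center exists.
stale_centers = [(0.0, 0.0)]
# Assumption: what the centers might look like if training state were visible.
fresh_centers = [(5.0, 3.4), (6.6, 3.1)]

stale = assign(points, stale_centers)
fresh = assign(points, fresh_centers)

print([idx for _, idx in stale])
print([idx for _, idx in fresh])
```

Comparing the two printed index lists is a quick way to tell whether the centers used at assignment time are the trained ones or only an initial value.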
Infrastructure details