Hi
Nice piece of code, thanks :)
However, I hit a speed limit: it was really an order of magnitude slower than R.
I'm not catching up with R, but I improved the run time by roughly a factor of 100 for clusters of size 500.
This came mainly from using a map to eliminate the loop in findByCluster.
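The map change replaces a linear scan over all pairs with a hash lookup. Below is a minimal sketch of the idea, assuming a simplified model where clusters are identified by name; `PairIndex`, `ClusterPair`, `findByClusterLinear` and this `findByCluster` are hypothetical names for illustration, not the library's actual classes or signatures (the record needs Java 16+).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these are illustrative names, not the library's real classes.
public class PairIndex {

    /** A pair of cluster names with the linkage distance between them. */
    public record ClusterPair(String left, String right, double distance) {}

    // Flat list: a lookup has to scan every stored pair, O(n) per call.
    private final List<ClusterPair> pairs = new ArrayList<>();

    // Index: each cluster name maps to the pairs that contain it,
    // so a lookup is a single hash probe instead of a scan.
    private final Map<String, List<ClusterPair>> byCluster = new HashMap<>();

    public void add(ClusterPair p) {
        pairs.add(p);
        byCluster.computeIfAbsent(p.left(), k -> new ArrayList<>()).add(p);
        byCluster.computeIfAbsent(p.right(), k -> new ArrayList<>()).add(p);
    }

    /** Loop-based lookup: the kind of scan the map replaces. */
    public List<ClusterPair> findByClusterLinear(String name) {
        List<ClusterPair> result = new ArrayList<>();
        for (ClusterPair p : pairs) {
            if (p.left().equals(name) || p.right().equals(name)) {
                result.add(p);
            }
        }
        return result;
    }

    /** Map-based lookup: returns a read-only view of the indexed pairs. */
    public List<ClusterPair> findByCluster(String name) {
        List<ClusterPair> hit = byCluster.get(name);
        return hit == null ? List.of() : Collections.unmodifiableList(hit);
    }

    public static void main(String[] args) {
        PairIndex index = new PairIndex();
        index.add(new ClusterPair("a", "b", 1.0));
        index.add(new ClusterPair("a", "c", 2.0));
        index.add(new ClusterPair("b", "c", 3.0));
        System.out.println(index.findByCluster("a").size()); // prints 2
    }
}
```

In a real agglomerative loop the map would also have to be updated whenever clusters merge and their pairs are removed; otherwise lookups return stale pairs.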
I've added a test, ClusterPerfTest, which is not really a test but prints timings.
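Here is a hypothetical sketch of a timing harness in the spirit of ClusterPerfTest (its actual contents are not shown here). It doubles the cluster size from 2 to 512 and prints the same `cluster.size time.ms` layout as the results. `PerfHarness` and the placeholder workload are assumptions: in practice the workload would call the clustering code. The heap line reports used heap, which may not be what the original "Heap size" figure measures.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.IntConsumer;

// Hypothetical harness; the workload stands in for the real clustering call.
public class PerfHarness {

    /** Runs the workload for sizes 2, 4, ..., maxSize and records wall-clock ms. */
    public static Map<Integer, Long> time(IntConsumer workload, int maxSize) {
        Map<Integer, Long> timings = new LinkedHashMap<>(); // keeps size order
        for (int size = 2; size <= maxSize; size *= 2) {
            long start = System.nanoTime();
            workload.accept(size);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            timings.put(size, elapsedMs);
        }
        return timings;
    }

    /** Prints the timings plus the currently used heap, in megabytes. */
    public static void print(Map<Integer, Long> timings) {
        System.out.println("cluster.size time.ms");
        timings.forEach((size, ms) -> System.out.println(size + " " + ms));
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("Heap used " + usedMb + "M");
    }

    public static void main(String[] args) {
        // Placeholder workload: fills a symmetric n x n matrix with random distances.
        Random rnd = new Random(42);
        print(time(n -> {
            double[][] d = new double[n][n];
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    d[i][j] = d[j][i] = rnd.nextDouble();
                }
            }
        }, 512));
    }
}
```

Single-shot wall-clock timings at millisecond resolution include JIT warm-up, so the smallest sizes are mostly noise; a benchmark harness such as JMH would give more reliable numbers.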
If I ever find the performance on larger clusters unbearable, I guess the solution will be to move all of this to Scala and parallelize it.
Thanks again for the contribution,
Alex
ORIG
Running com.apporiented.algorithm.clustering.ClusterPerfTest
cluster.size  time.ms
2             3
4             0
8             5
16            25
32            56
64            240
128           3253
256           51018
512           969193
Heap size 760M
NEW
cluster.size  time.ms
2             3
4             1
8             2
16            11
32            36
64            56
128           131
256           726
512           10167
Heap size 270M
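To check the factor-of-100 claim against the data, here is a small sketch that divides the ORIG timings by the NEW ones for each size. `Speedup` is a hypothetical helper, and the numbers are copied from the two tables. It gives roughly 95x at size 512 and about 70x at 256. Below 64 the timings are only a few milliseconds, so those ratios carry little meaning.

```java
import java.util.Locale;

// Per-size speedup (ORIG time / NEW time), using the timings reported above.
public class Speedup {
    static final int[] SIZES = {2, 4, 8, 16, 32, 64, 128, 256, 512};
    static final long[] ORIG_MS = {3, 0, 5, 25, 56, 240, 3253, 51018, 969193};
    static final long[] NEW_MS = {3, 1, 2, 11, 36, 56, 131, 726, 10167};

    /** Returns ORIG/NEW for index i, or NaN if the NEW timing is 0 ms. */
    static double speedup(int i) {
        return NEW_MS[i] == 0 ? Double.NaN : (double) ORIG_MS[i] / NEW_MS[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            // Locale.ROOT keeps '.' as the decimal separator everywhere.
            System.out.printf(Locale.ROOT, "%d %.1fx%n", SIZES[i], speedup(i));
        }
        // The last line printed is "512 95.3x".
    }
}
```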