Closed barrybecker4 closed 8 years ago
I've just reviewed all the commits and I have to say it's a really nice work. Thanks for your support.
About this topic:
"There was no performance (or correctness) difference between using if (k1 > 0) leftFreqs.sum else 0 and just leftFreqs.sum."
I just remember that it was motivated by a performance matter. But it seems useless, and actually I didn't even check the impact of this line.
Thanks. I have another question. There were two places that used ".clone". I removed the ".clone" because I don't think it was needed. The tests passed without it. Can you recall why you added it? I don't want to introduce a bug. Used to be
entropyFreqs = (cand, freq, leftAccum.clone(), rightTotal) +: entropyFreqs
now is
entropyFreqs = (cand, freq, leftAccum, rightTotal) +: entropyFreqs
This PR adds the stoppingCriteria param. Other changes: Added ThresholdFinder classes to make it clear that there are two alternative strategies for finding thresholds. Since one used RDDs and the other Arrays, I could not get the api to be the same or make as much code as I wanted in common, but hopefully its an improvement. I set the version to 0.2-SNAPSHOT in build.scala to prepare for the next release (which I assume is 0.2). Package is now org.apaphe.spark instead of just org.apache.