Fix clust_boxplot handling of boolean data

nsh87 / receptormarker

Source for 'receptormarker' package for R: antibody receptor and phenotypic marker analysis

http://receptormarker.com

BSD 2-Clause "Simplified" License

4 stars 9 forks source link

Fix clust_boxplot handling of boolean data #69

Closed catterbu closed 8 years ago

catterbu commented 8 years ago

This pull request reverts multi_clust to use average silhouette score instead of the NbClust function to determine the best number of clusters and modifies the clust_boxplot function to plot boolean data properly.

catterbu commented 8 years ago

@nsh87 I think that this takes care of everything. I will check my email tomorrow morning if there is an error and try to address it tomorrow night or Sunday.

nsh87 commented 8 years ago

@catterbu: ah, dude, did you run the unit tests on your computer? this doesn't pass.

catterbu commented 8 years ago

I just pulled off of dev and worked. I wanted to get a solution last night since I am away from my computer all day today. Guess I will have to look at it tonight.

On Jul 9, 2016, at 11:51 AM, Nikhil Haas notifications@github.com wrote:

@catterbu: ah, dude, did you run the unit tests on your computer? this doesn't pass.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

nsh87 commented 8 years ago

@catterbu: if you have time, i think you should add an option when creating the multi_clust object method='fast' or method='exhaustive', and make the default 'fast'. looking at PR #66, i think it makes sense to incorporate the current NbClust stuff somehow. with the method parameter available, i can choose to use 'fast' on the site for now (which would use silhouette score). i'm racking my brain trying to keep track of all these issues and PRs related simultaneously to NbClust and the multiClust S4 class. i want to start using the class, so I want to merge in PR #66, and then post merging that add the option for fast or exhaustive.

nsh87 commented 8 years ago

i can't merge PR #66 to use the S4 class b/c that PR also incorporates the buggy NbClust stuff.

nsh87 commented 8 years ago

@catterbu: the unit tests were failing because switching from NbClust to KMeans for estimating optimal k in multi_clust() changes the optimal k. I've updated the relevant unit tests (which were specifically checking NbClust's returned # of clusters) to use the new method='exhaustive' option, which will use NbClust instead of KMeans. The default is method='kmeans', which obviously uses KMeans' average silhouette score for estimating k. You'll see this in dev in a bit. I think this and your fix of the box plot will allow us to merge into master and push to production.