greenelab / hgsc_subtypes

Two or three subtypes of high grade serous ovarian cancer subtypes fit data from different populations better than four
BSD 3-Clause "New" or "Revised" License

k=2 clustering results #30

Open changt34x opened 7 years ago

changt34x commented 7 years ago

Clustering results for k=2, for both kmeans and NMF, do not carry over into the clustering results for k=3 or k=4. Comparing k=2 to k=3, very few samples keep their k=2 cluster assignment at k=3.
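One way to make this comparison concrete is to cross-tabulate the two assignments and check how much of each k=3 cluster comes from a single k=2 cluster. The following is a minimal sketch, not code from this repository: the function names, the dict-of-sample-ID input format, and the toy data are all assumptions for illustration.

```python
from collections import Counter


def crosstab(labels_a, labels_b):
    """Count samples per (cluster in a, cluster in b) pair over shared sample IDs."""
    shared = set(labels_a) & set(labels_b)
    return Counter((labels_a[s], labels_b[s]) for s in shared)


def retained_fraction(labels_a, labels_b):
    """Fraction of samples lying in the dominant 'a' cluster of their 'b' cluster.

    A value near 1.0 means the 'b' clusters are roughly nested inside the
    'a' clusters; a low value means samples were reshuffled.
    """
    table = crosstab(labels_a, labels_b)
    total = sum(table.values())
    if total == 0:
        raise ValueError("no shared samples to compare")
    dominant = {}
    for (_, b), n in table.items():
        dominant[b] = max(dominant.get(b, 0), n)
    return sum(dominant.values()) / total


# Hypothetical toy assignments keyed by sample ID (not real data).
k2 = {"s1": 1, "s2": 1, "s3": 1, "s4": 2, "s5": 2, "s6": 2}
k3 = {"s1": 1, "s2": 1, "s3": 3, "s4": 2, "s5": 2, "s6": 3}

for pair, count in sorted(crosstab(k2, k3).items()):
    print(pair, count)
print(retained_fraction(k2, k3))
```

Because the measure only looks at which samples are grouped together, it does not depend on how the clusters happen to be numbered at either k.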

Additionally, compared with Supplementary Table S3 of Way et al. 2016, the k=2 cluster labels appear reversed: samples placed in cluster 1 in this analysis appear in cluster 2 in the published paper.
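Cluster numbers from kmeans and NMF are arbitrary, so a pure label swap can be separated from a real change in membership by searching for the relabeling that best matches the published table. This is a rough sketch under assumptions: the function name and input format are hypothetical, both assignments are assumed to use the same number of clusters, and the brute-force search over permutations is only practical for small k such as 2 to 4.

```python
from itertools import permutations


def best_relabeling(current, published):
    """Find the label mapping current -> published that maximizes agreement.

    Both arguments are dicts of sample ID -> cluster label. Only samples
    present in both are compared, so added or removed samples are ignored.
    Returns (mapping, fraction of shared samples that agree under it).
    """
    shared = sorted(set(current) & set(published))
    if not shared:
        raise ValueError("no shared samples to compare")
    cur_labels = sorted({current[s] for s in shared})
    pub_labels = sorted({published[s] for s in shared})
    if len(cur_labels) > len(pub_labels):
        raise ValueError("current run has more clusters than published run")

    best_map, best_hits = None, -1
    for perm in permutations(pub_labels, len(cur_labels)):
        mapping = dict(zip(cur_labels, perm))
        hits = sum(mapping[current[s]] == published[s] for s in shared)
        if hits > best_hits:
            best_map, best_hits = mapping, hits
    return best_map, best_hits / len(shared)


# Hypothetical toy data: same grouping, labels 1 and 2 swapped.
current = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 1}
published = {"a": 2, "b": 2, "c": 1, "d": 1, "e": 2, "f": 1}

mapping, agreement = best_relabeling(current, published)
print(mapping, agreement)
```

If the agreement after relabeling is close to 1.0, the difference is only in the cluster numbering; a low value would point to samples actually moving between clusters.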

gwaybio commented 7 years ago

Is k = 2 the only difference, @changt34x? Are they like this for all datasets? It's possible that the different subsetting steps (new common_genes) performed by @amyecampbell altered the cluster assignment numbers.

changt34x commented 7 years ago

I haven't compared all of the samples (the ordering changed and samples were added or removed between the current version and the paper), but from a random set of comparisons it only affects k=2, for both kmeans and NMF, and not any other k.