Unsupervised classifications

renozao / NMF

NMF: A Flexible R package for Nonnegative Matrix Factorization


Unsupervised classifications #11

Open sarahmanderni opened 10 years ago

sarahmanderni commented 10 years ago

Hi,

I have a matrix (mat) of gene expression data with patients (417 patients) as columns and genes (180 genes) as rows. I want to classify the patients (not the gene expression patterns) into four classes based on their gene expression, using the following command:

res <- nmf(mat, 4, nrun = 200, seed = 123456)

Do you think this is a correct way of classifying the patients? Using the aheatmap command I can see that there are four separate bases, but I do not know how to get the barcodes of the patients for each basis. I tried the "basisnames" command, basisnames(res), but I got NULL.

How can I know which patients are grouped together? Thanks for the help.

renozao commented 10 years ago

Yes, NMF actually gives you a biclustering model: it groups the patients (columns) together with the genes (rows) whose expression patterns are characteristic of each patient group. The cluster memberships are returned by

predict(res)

, which assigns each patient to its most contributing basis component. The contributions of each basis component are given by the matrix H (X = W * H), which is returned by

coef(res)

You can see the contribution patterns with

# default is to scale the contributions so that they sum up to one
coefmap(res)
# consensus matrix
consensusmap(res)
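To make the assignment rule above concrete, here is a minimal base R sketch that reproduces it on a toy coefficient matrix rather than on a real fit. The matrix H, its values, and the names basis1..3 and patient1..4 are made up for illustration; the sketch only assumes that membership is decided by the largest coefficient in each column, as described above.

```r
# Toy coefficient matrix H (basis components x patients); all values are made up.
H <- matrix(
  c(0.9, 0.1, 0.2, 0.7,
    0.1, 0.8, 0.1, 0.2,
    0.0, 0.1, 0.7, 0.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("basis", 1:3), paste0("patient", 1:4))
)

# Assign each patient (column) to the basis component with the largest coefficient.
cluster <- apply(H, 2, which.max)

# List the patient identifiers that fall into each cluster.
groups <- split(colnames(H), cluster)
print(groups)
```

With a real fit, something like split(colnames(mat), predict(res)) should give the same kind of listing, assuming mat carries the patient barcodes as column names.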

sarahmanderni commented 10 years ago

Thanks for the response. I tried to estimate the rank to find the best number of clusters. You can see the results for ranks 3:6 in the figure. Clustering the data into 3 clusters gives the highest cophenetic correlation coefficient, but rank 3 also has the highest dispersion. Do you think I can support my idea of clustering the samples into 3 groups in this situation?

(figure: outcome)

renozao commented 10 years ago

The higher the dispersion the better, as it measures how distinct the consensus clusters are: it reaches 1 when every entry of the consensus matrix is 0 or 1. So these two measures are actually consistent.
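As a rough illustration of why the two measures tend to move together, the base R sketch below computes both on two toy consensus matrices. It uses the usual definitions as I understand them (dispersion as the mean of 4 * (C_ij - 1/2)^2, cophenetic correlation from an average-linkage clustering of 1 - C); the helper names dispersion_coef and cophenetic_coef and the matrices crisp and fuzzy are hypothetical, and the package's own implementation may differ in detail.

```r
# Dispersion coefficient of a consensus matrix C (entries in [0, 1]):
# the mean of 4 * (C_ij - 1/2)^2 over all entries.
dispersion_coef <- function(C) mean(4 * (C - 0.5)^2)

# Cophenetic correlation: correlation between the distances 1 - C and the
# cophenetic distances of an average-linkage hierarchical clustering of them.
cophenetic_coef <- function(C) {
  d <- as.dist(1 - C)
  cor(d, cophenetic(hclust(d, method = "average")))
}

# Toy consensus matrices for 4 samples (made-up values).
crisp <- rbind(c(1, 1, 0, 0),
               c(1, 1, 0, 0),
               c(0, 0, 1, 1),
               c(0, 0, 1, 1))
fuzzy <- rbind(c(1.0, 0.8, 0.3, 0.4),
               c(0.8, 1.0, 0.5, 0.2),
               c(0.3, 0.5, 1.0, 0.7),
               c(0.4, 0.2, 0.7, 1.0))

print(dispersion_coef(crisp))  # every entry is 0 or 1, so this is 1
print(dispersion_coef(fuzzy))  # ambiguous co-clustering lowers the value
print(cophenetic_coef(crisp))
print(cophenetic_coef(fuzzy))
```

In this toy case the clear-cut consensus matrix scores highest on both measures, which matches the point that a high dispersion and a high cophenetic coefficient at the same rank agree rather than conflict.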