Variable Selection for model based clustering

stephenslab / count-clustering

Code and data reproducing results from our PLOS Genetics paper "Visualizing the structure of RNA-seq expression data using grade of membership models"

http://stephenslab.github.io/count-clustering


Variable Selection for model based clustering #2

Open kkdey opened 9 years ago

kkdey commented 9 years ago

From the analysis of Yoav's single-cell data, it seemed that the cell cycle genes were more informative for clustering single cells by cell cycle phase than fitting the Structure model to all the genes, which did not yield any meaningful patterns after adjusting for batch effects. Raftery and Dean have considered variable selection for clustering, but their approach is not practical when the number of features is large, as in our case. So we are on the lookout for a simple but novel preprocessing step that identifies which variables to use to drive biologically meaningful clusters.
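One simple preprocessing filter often used for count data is to rank genes by their variance-to-mean ratio and keep only the top-ranked ones. It is swapped in here purely for illustration: it is not Raftery and Dean's method and not something proposed in this thread. A minimal Python sketch, assuming a dense cells × genes count matrix stored as nested lists; `dispersion_index` and `select_informative_genes` are hypothetical names and the counts are made up:

```python
def dispersion_index(values):
    """Variance-to-mean ratio of one gene's counts across cells.

    Under pure Poisson sampling the ratio is about 1; values well above 1
    indicate extra variation, which may (or may not) reflect cluster structure.
    """
    n = len(values)
    m = sum(values) / n
    if m == 0:
        return 0.0  # gene never detected: no signal for clustering
    var = sum((v - m) ** 2 for v in values) / n  # population variance
    return var / m


def select_informative_genes(counts, gene_names, top_k):
    """Return the names of the top_k genes ranked by dispersion index.

    counts: list of rows, one row per cell, one column per gene (assumed layout).
    """
    scored = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in counts]
        scored.append((dispersion_index(column), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest dispersion first
    return [name for _, name in scored[:top_k]]


# Toy example: 4 cells x 3 genes (made-up counts)
counts = [
    [5, 0, 2],
    [5, 10, 3],
    [5, 0, 2],
    [5, 10, 3],
]
genes = ["geneA", "geneB", "geneC"]
print(select_informative_genes(counts, genes, top_k=2))  # ['geneB', 'geneC']
```

Batch effects also inflate dispersion, so a filter like this would presumably need to be applied after batch correction, and a high score alone does not show that a gene separates biologically meaningful groups.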

stephens999 commented 9 years ago

I think this is Raftery and Dean (also not the only reference; there must be many others).

On Tue, Sep 15, 2015 at 2:59 AM, Kushal K Dey notifications@github.com wrote:

It seemed from the Yoav single cell data analysis that the cell cycle genes were more informative about the clustering of single cells as per the cell cycle phases, compared to using Structure model on all the genes, which did not yield any meaningful patterns after adjusting for the batch effects. Some work on suitable variable selection for clustering has been considered by Irizzary and Dean, but it is not practically useful for cases with large number of features, as in our case. So, we are in lookout for a simple but novel preprocessing step for identifying variables to be used to drive biologically meaningful clusters.

— Reply to this email directly or view it on GitHub https://github.com/stephenslab/count-clustering/issues/2.

kkdey commented 9 years ago

Yep, sorry for that. Corrected now; trying to figure out other references.

kkdey commented 9 years ago

http://arxiv.org/pdf/1205.1053v1.pdf looks interesting.

stephens999 commented 9 years ago

wonder if they have software


jhsiao999 commented 9 years ago

Just to note on Kushal's original comment: we fitted the topic model to the single-cell data using the entire set of genes and also using just the cell cycle genes. We also computed a cell cycle indicator using an algorithm established for bulk RNA data in previous research. When using the cell cycle genes, we were able to cluster the samples and find meaningful patterns corresponding to the cell cycle indicator, but we did not find the same results using the entire set of genes.
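One way to make "patterns corresponding to the cell cycle indicator" quantitative is to turn each cell's grade-of-membership vector into a hard label by taking its largest component and then compute the Rand index against the indicator's phase calls. This is an assumption for illustration, not the comparison described above. The memberships and phase labels below are made-up toy values, not results from the data; `hard_labels` and `rand_index` are hypothetical helpers:

```python
from itertools import combinations


def hard_labels(memberships):
    """Assign each cell to the cluster with its largest membership proportion."""
    return [max(range(len(w)), key=w.__getitem__) for w in memberships]


def rand_index(labels_a, labels_b):
    """Fraction of cell pairs on which two labelings agree, i.e. the pair is
    grouped together in both or separated in both. Label names do not matter."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Made-up toy inputs: phase calls from a cell cycle indicator, and
# grade-of-membership vectors from two hypothetical model fits.
phases = ["G1", "G1", "S", "S", "G2M", "G2M"]
fit_cell_cycle_genes = [
    [0.9, 0.05, 0.05], [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8], [0.0, 0.2, 0.8],
]
fit_all_genes = [
    [0.6, 0.4], [0.4, 0.6],
    [0.7, 0.3], [0.3, 0.7],
    [0.55, 0.45], [0.45, 0.55],
]

print(rand_index(hard_labels(fit_cell_cycle_genes), phases))  # 1.0
print(rand_index(hard_labels(fit_all_genes), phases))  # 0.4
```

Taking the argmax discards the partial memberships that a grade of membership model provides, and the plain Rand index is not corrected for chance; the adjusted Rand index is the more common choice when comparing clusterings.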