Open kkdey opened 9 years ago
I think this is Raftery and Dean (also not the only reference; there must be many others).
On Tue, Sep 15, 2015 at 2:59 AM, Kushal K Dey notifications@github.com wrote:
From the analysis of the Yoav single-cell data, the cell cycle genes seemed more informative for clustering single cells by cell cycle phase than fitting the Structure model to all the genes, which did not yield any meaningful patterns after adjusting for batch effects. Irizzary and Dean have considered variable selection for clustering, but their approach is not practical with a large number of features, as in our case. So we are on the lookout for a simple but novel preprocessing step that identifies the variables to use to drive biologically meaningful clusters.
Yep, sorry for that. Corrected now. Trying to figure out other references.
http://arxiv.org/pdf/1205.1053v1.pdf looks interesting.
wonder if they have software
Just a note on Kushal's original comment. We fitted the topic model to the single-cell data using the entire set of genes and, separately, using only the cell cycle genes. We also computed a cell cycle indicator using an algorithm established for bulk RNA data in previous research. With the cell cycle genes alone, we were able to cluster the samples and find meaningful patterns corresponding to the cell cycle indicator. We did not find the same patterns using the entire set of genes.
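One possible way to put a number on "meaningful patterns corresponding to the cell cycle indicator" is to compare each clustering against the indicator with the adjusted Rand index (ARI). The sketch below is only an illustration and not the analysis from this thread. It assumes each cell has been given a hard cluster label (for a topic model, for example the cluster with the highest membership proportion). The function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two hard labelings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivially identical

    # Count pairs of cells that share a label in both, in A only, in B only.
    sum_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_joint - expected) / (max_index - expected)


# Toy labels (hypothetical, not taken from the single-cell data).
phase_indicator = ["G1", "G1", "S", "S", "G2M", "G2M"]
clusters_cell_cycle_genes = [2, 2, 0, 0, 1, 1]  # same grouping, different names
clusters_all_genes = [0, 1, 0, 1, 0, 1]         # grouping unrelated to phase

ari_cell_cycle = adjusted_rand_index(phase_indicator, clusters_cell_cycle_genes)
ari_all = adjusted_rand_index(phase_indicator, clusters_all_genes)
print(ari_cell_cycle, ari_all)
```

The ARI is 1 when two labelings agree up to renaming of clusters and is close to 0 (or negative) when they agree no better than chance, so it does not depend on how the clusters happen to be numbered.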
From the analysis of the Yoav single-cell data, the cell cycle genes seemed more informative for clustering single cells by cell cycle phase than fitting the Structure model to all the genes, which did not yield any meaningful patterns after adjusting for batch effects. Raftery and Dean have considered variable selection for clustering, but their approach is not practical with a large number of features, as in our case. So we are on the lookout for a simple but novel preprocessing step that identifies the variables to use to drive biologically meaningful clusters.