sandhya212 / BISCUIT_SingleCell_IMM_ICML_2016

R Codebase for BISCUIT: Infinite Mixture Model to cluster and impute single cells.

Batch effect remains. #22

Closed nakabayashihub closed 5 years ago

nakabayashihub commented 5 years ago

Dear Developer,

I am trying to run BISCUIT on scRNA-seq data of tumor-infiltrating immune cells from solid tumors of two patients: 1303 cells from patient #1 and 637 cells from patient #2.

BISCUIT divides the cells into 8 clusters. One cluster consists only of cells from patient #2, so I think a batch effect remains despite BISCUIT's normalization.

I set the parameter alpha = 1.0 and selected 600 variable genes for the analysis. Could you tell me how to remove the batch effect, if you have any information?
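One way to make the "cluster from a single patient" observation concrete is to cross-tabulate cluster assignments against patient labels. The following is a minimal sketch in Python rather than BISCUIT's R code; the function names, the 0.95 threshold, and the toy labels are all hypothetical and are not taken from this thread.

```python
from collections import Counter, defaultdict


def cluster_composition(cluster_labels, patient_labels):
    """Count cells per patient within each cluster."""
    if len(cluster_labels) != len(patient_labels):
        raise ValueError("cluster and patient labels must have the same length")
    table = defaultdict(Counter)
    for cluster, patient in zip(cluster_labels, patient_labels):
        table[cluster][patient] += 1
    return dict(table)


def single_patient_clusters(table, threshold=0.95):
    """Return (cluster, patient, fraction) for clusters dominated by one patient.

    A cluster is flagged when one patient contributes at least `threshold`
    of its cells. The threshold is an illustrative assumption.
    """
    flagged = []
    for cluster, counts in sorted(table.items()):
        total = sum(counts.values())
        patient, top = counts.most_common(1)[0]
        fraction = top / total
        if fraction >= threshold:
            flagged.append((cluster, patient, fraction))
    return flagged


# Toy labels (hypothetical, not the data discussed in this issue).
clusters = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
patients = ["P1", "P2", "P1", "P2", "P2", "P2", "P2", "P1", "P2", "P1"]

table = cluster_composition(clusters, patients)
flagged = single_patient_clusters(table)
print(flagged)
```

When patients contribute very different numbers of cells, as here, it may be more informative to compare each cluster's patient fractions with the overall patient proportions than to use a fixed threshold.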

nakabayashihub commented 5 years ago

Dear Developer,

I increased the number of variable genes from 600 to 1500 and decreased the parameter alpha from 1.0 to 0.01. A cluster composed of cells from a single patient is no longer detected.

I set alpha by trial and error. I would like to know how to choose an appropriate alpha.

Thanks.

sandhya212 commented 5 years ago

Hi, inferring alpha is an open problem in Bayesian nonparametrics. Decreasing alpha leads to fewer, tighter clusters. How many clusters do you get when setting alpha = 0.01? Does the downstream analysis make sense now?
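To illustrate why a smaller alpha favors fewer clusters, here is a rough sketch that assumes alpha acts as the concentration parameter of a Dirichlet process (Chinese restaurant process) prior. Under that assumption, the prior expected number of clusters for n cells is the sum over i = 1..n of alpha / (alpha + i - 1). The function name is hypothetical, and the code is Python rather than BISCUIT's R implementation.

```python
def expected_prior_clusters(alpha, n_cells):
    """Prior expected number of clusters under a Chinese restaurant process.

    Computes sum_{i=1}^{n} alpha / (alpha + i - 1). This reflects the prior
    only; the number of clusters actually inferred also depends on the data.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if n_cells < 1:
        raise ValueError("n_cells must be at least 1")
    return sum(alpha / (alpha + i) for i in range(n_cells))


# Cell count from the original post: 1303 + 637 cells from two patients.
n_cells = 1303 + 637
for alpha in (1.0, 0.01, 0.001):
    print(alpha, round(expected_prior_clusters(alpha, n_cells), 2))
```

The prior expectation grows roughly like alpha * log(n), so changing alpha by orders of magnitude shifts the prior strongly. The posterior cluster count can still differ substantially from this value because the likelihood also contributes.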

nakabayashihub commented 5 years ago

I am now clustering 3633 cells from 3 patients. I get 22, 8 and 6 clusters when alpha = 1.0, 0.01 and 0.001, respectively. When alpha = 1.0, some clusters are composed of cells from only one patient. When clusters are too small, differences between samples are detected. In this case, alpha = 0.01 seems appropriate. The number of clusters is more sensitive to alpha than I had imagined. Thanks.

nakabayashihub commented 5 years ago

Hi, when alpha = 0.01, BISCUIT yields clusters composed of cells from all three patients. I checked whether these clusters match conventional hematopoietic cell types by examining the expression of signature genes, such as LEF1 for T cells and CD1D for macrophages. Many cells in a given cluster specifically express such signature genes. BISCUIT clusters these cells appropriately despite the differences between patients. Thanks.
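A signature-gene check of this kind can be sketched as the fraction of cells in each cluster that express a given gene. This is only an illustration in Python: the function name, the expression cutoff, and the toy values are hypothetical and do not come from the analysis described here.

```python
def marker_fraction_by_cluster(expression, cluster_labels, min_expr=0.0):
    """Fraction of cells per cluster whose expression of one gene exceeds min_expr."""
    if len(expression) != len(cluster_labels):
        raise ValueError("expression and cluster labels must have the same length")
    totals, positives = {}, {}
    for value, cluster in zip(expression, cluster_labels):
        totals[cluster] = totals.get(cluster, 0) + 1
        if value > min_expr:
            positives[cluster] = positives.get(cluster, 0) + 1
    return {c: positives.get(c, 0) / totals[c] for c in sorted(totals)}


# Toy expression values for one signature gene (hypothetical numbers).
lef1 = [2.1, 0.0, 1.7, 0.0, 0.0, 0.3]
clusters = [1, 1, 1, 2, 2, 2]

fractions = marker_fraction_by_cluster(lef1, clusters)
print(fractions)
```

A cluster in which most cells express the marker while other clusters show low fractions is consistent with that cluster corresponding to the expected cell type.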

nakabayashihub commented 5 years ago

Hi, in this analysis, 1757 variable genes were selected before running BISCUIT. These 1757 genes are listed in the Genes_selected.csv file in the inferred mean folder, but the matrix in Imputed_Y_logspace.txt contains only 1750 genes, so 7 genes are missing. I commented out choose_genes in start_file.R. Could you tell me how I can find out which genes were selected for the BISCUIT analysis? Thanks.

sandhya212 commented 5 years ago

Glad to hear that BISCUIT is giving you meaningful results. Regarding the 7 genes that were dropped: the parallel code implementation, as it currently stands, requires the same number of genes per parallel block. Therefore the overall number of genes used will be a multiple of the gene_batch variable, which is why 1750 of the 1757 selected genes appear in the output.
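The arithmetic behind the dropped genes can be sketched as follows. This is an illustration in Python, not BISCUIT's R code. The value gene_batch = 50 is an assumption chosen only because it reproduces the 1757 to 1750 reduction; the value actually used is not stated in this thread. The gene names are placeholders, and the comparison assumes gene identifiers are available for both the selected list and the imputed matrix.

```python
def genes_kept(n_genes, gene_batch):
    """Largest multiple of gene_batch that does not exceed n_genes."""
    if gene_batch < 1:
        raise ValueError("gene_batch must be at least 1")
    return (n_genes // gene_batch) * gene_batch


def dropped_genes(selected, imputed):
    """Genes in the selected list that are absent from the imputed matrix."""
    imputed_set = set(imputed)
    return [gene for gene in selected if gene not in imputed_set]


gene_batch = 50  # hypothetical value, not confirmed by the thread
kept = genes_kept(1757, gene_batch)
dropped_count = 1757 - kept
print(kept, dropped_count)

# Placeholder gene names (hypothetical).
selected = ["G1", "G2", "G3", "G4", "G5"]
imputed = ["G1", "G2", "G3", "G4"]
missing = dropped_genes(selected, imputed)
print(missing)
```

Comparing identifiers directly avoids having to assume which genes the parallel blocks leave out.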

nakabayashihub commented 5 years ago

Happy New Year. Thank you for your kind reply. I now understand why the 7 genes were dropped. I would like to analyze our data further. Thanks again. Sincerely yours.