wguo-research / scCancer

A package for automated processing of single cell RNA-seq data in cancer

Cell QC suggestion #12

Closed: Puriney closed this issue 4 years ago

Puriney commented 4 years ago

Hi, for cell quality filtering, I noticed that scCancer uses a 'both' (two-sided) strategy. At the upper boundary, cells with high nUMIs or nGenes are removed. The thresholds are determined by the boxplot.stats function.
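To make the upper-boundary idea concrete, here is a minimal sketch of a Tukey-style upper fence (Q3 + 1.5 × IQR), the rule that boxplot whiskers are based on. It is written in Python for illustration only and is not scCancer's code: the function name `upper_fence` and the toy counts are made up, and it uses interpolated quartiles, whereas R's boxplot.stats uses hinges and reports the most extreme data point inside the fence, so exact values can differ slightly.

```python
import statistics


def upper_fence(values, coef=1.5):
    """Return a Tukey-style upper fence: Q3 + coef * IQR.

    Assumption: quartiles are interpolated ("inclusive" method), which only
    approximates the hinge-based rule used for boxplot whiskers in R.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + coef * (q3 - q1)


# Toy nUMI counts, invented for illustration; the last value mimics
# a cell with an unusually large library.
numi = [1200, 1500, 1800, 2100, 2400, 2700, 3000, 3300, 3600, 20000]

thres = upper_fence(numi)
kept = [x for x in numi if x <= thres]      # cells at or below the fence
removed = [x for x in numi if x > thres]    # cells treated as outliers
print(f"upper threshold: {thres}, removed: {removed}")
```

With a rule like this, any cell whose nUMI sits far above the bulk of the distribution is dropped, whether it is a doublet or a genuinely large cell.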

However, in the cancer context, cancer cells tend to have higher nUMIs. This cell QC strategy might therefore be a problem for an scRNA-seq dataset containing both cancer and non-malignant cells.

How about following the paper "A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor"? It filters out cells with abnormally low nUMIs/nGenes or high %mito, using thresholds based on the median absolute deviation (MAD).
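As a rough sketch of what such a MAD-based filter could look like (Python for illustration only, not code from the paper or from scCancer): cells are flagged only when log-transformed nUMI is far below the median or %mito is far above it, so cells with high nUMIs are kept. The cutoff of 3 MADs, the log10 transform, the 1.4826 scaling constant, the helper name `mad_bounds`, and the toy values are all assumptions made here.

```python
import math
import statistics


def mad_bounds(values, nmads=3.0):
    """Return (lower, upper) bounds at median -/+ nmads * scaled MAD."""
    xs = list(values)
    med = statistics.median(xs)
    # 1.4826 scales the MAD so it is comparable to a standard deviation
    # under normality (assumed here; it is also the default in R's mad()).
    mad = 1.4826 * statistics.median([abs(x - med) for x in xs])
    return med - nmads * mad, med + nmads * mad


# Toy per-cell metrics, invented for illustration. Cell 8 has a very
# large library; cell 9 has a tiny library and high %mito.
numi = [2000, 2200, 2500, 2700, 3000, 3300, 3600, 4000, 15000, 150]
pct_mito = [3, 4, 5, 4, 6, 5, 4, 3, 5, 40]

log_numi = [math.log10(x) for x in numi]
low_numi, _ = mad_bounds(log_numi)     # only the lower bound is used
_, high_mito = mad_bounds(pct_mito)    # only the upper bound is used

# Keep cells that are neither low in nUMI nor high in %mito.
kept = [
    i for i in range(len(numi))
    if log_numi[i] >= low_numi and pct_mito[i] <= high_mito
]
print("kept cell indices:", kept)
```

In this toy example the cell with 15000 UMIs survives, because no upper bound is applied to nUMI; only the low-count, high-%mito cell is removed.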

wguo-research commented 4 years ago

Yes, tumor cells have higher nUMIs. But in the distribution of nUMI or nGene, we generally find some outliers that exceed the typical range for tumor cells and are more likely to be doublets/multiplets. That is why we set this QC metric.

If you don't want to perform this QC step, you can change the threshold in the file cell.QC.thres.txt generated by scStatistics to a larger value, so that scAnnotation filters out fewer outliers or none at all.