zhanghao-njmu / SCP

An end-to-end Single-Cell Pipeline designed to facilitate comprehensive analysis and exploration of single-cell data.
https://zhanghao-njmu.github.io/SCP/
GNU General Public License v3.0

QC #167

Open Andymare1994 opened 11 months ago

Andymare1994 commented 11 months ago

Is the default UMI threshold of 3000 too high for QC?

zhanghao-njmu commented 11 months ago

Whether the default UMI threshold of 3000 is too high for quality control (QC) depends on two things.

First, it depends on the type of data you are working with. The UMI threshold should match what is expected from the specific technology used. For single-cell data generated with technologies such as 10x Genomics or Smart-seq, I would not consider this threshold high. Data from these technologies typically show higher UMI counts per cell.

Second, it depends on the data itself. The UMI threshold should be seen as a filter for removing low-quality outliers. If cells with 3000 UMIs are already among the higher-quality cells in your data, then this threshold is probably too high and could be lowered. On the other hand, if filtering out cells with fewer than 3000 UMIs only removes cells of little value, then the threshold is appropriate.
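One way to judge this on your own data is to check what fraction of cells each candidate threshold would remove. The following is only a minimal sketch in plain Python, not SCP's implementation: the function name `fraction_removed`, the example counts, and the assumption that cells strictly below the threshold are dropped are all hypothetical.

```python
def fraction_removed(umi_counts, threshold=3000):
    """Return the fraction of cells whose total UMI count is below threshold.

    Assumption: a cell is removed when its UMI total is strictly below
    the threshold. Check how your QC function treats the boundary.
    """
    if not umi_counts:
        raise ValueError("umi_counts must not be empty")
    removed = sum(1 for n in umi_counts if n < threshold)
    return removed / len(umi_counts)


# Hypothetical per-cell UMI totals, for illustration only.
example_counts = [500, 1200, 2800, 3500, 4200, 6000, 9000, 15000]

for t in (1000, 2000, 3000):
    print(t, fraction_removed(example_counts, t))
```

If a threshold removes a large share of cells that otherwise look healthy, that suggests it is too strict for that dataset.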

It's also important to note that cells with high UMI counts cannot be directly compared to cells with low UMI counts:

  1. We use Seurat::NormalizeData for normalization, so the resulting values can be interpreted as each gene's share of the cell's total UMIs, much like a percentage.
  2. Most detected genes in a cell have a UMI count of 1.
  3. For a given gene A, a cell with 1000 UMIs may have a count of 1, and a cell with 10000 UMIs may also have a count of 1, because the additional UMIs mostly fill in more genes, shifting their counts from 0 to 1.
  4. As a result, the normalized expression value for gene A is higher in the 1000-UMI cell than in the 10000-UMI cell (1/1000 > 1/10000), even though the raw count is the same.
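The arithmetic in the list above can be sketched as follows. This is a plain-Python illustration, not Seurat code. It assumes the default LogNormalize behaviour of Seurat::NormalizeData (divide the count by the cell's total, multiply by a scale factor of 10000, then take log1p). The helper name `log_normalize` is hypothetical.

```python
import math


def log_normalize(count, total_umis, scale_factor=10000):
    """Log-normalize one gene count relative to the cell's total UMIs.

    Assumed formula: log1p(count / total_umis * scale_factor).
    """
    return math.log1p(count / total_umis * scale_factor)


# Gene A has a raw count of 1 in both cells.
low_depth_cell = log_normalize(1, 1000)    # log1p(10)
high_depth_cell = log_normalize(1, 10000)  # log1p(1)

print(low_depth_cell, high_depth_cell)
print(low_depth_cell > high_depth_cell)
```

The same raw count gives a larger normalized value in the cell with fewer UMIs, which is why cells with very different sequencing depths are hard to compare directly.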