Why is the mitochondrial percentage greater than 100? #8786

Closed: zhang-HZAU closed this issue 5 months ago

zhang-HZAU commented 5 months ago

I merged three Seurat objects, ran SCTransform on the merged object, and then ran PercentageFeatureSet(s_merged, pattern = "^mt-") on the "SCT" assay. The resulting percentage was greater than 100 for some cells.


The code that produces these values is as follows:

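library(Seurat)
library(magrittr)  # provides %>%
merge_list <- list(con, case2wk, case4wk)  # three existing Seurat objects
s_merged <- merge(x = merge_list[[1]], y = merge_list[2:3]) %>%
  SCTransform(assay = "RNA", variable.features.n = 3000)
# SCTransform sets "SCT" as the default assay, so this uses the SCT counts
s_merged[[""]] <- PercentageFeatureSet(s_merged, pattern = "^mt-")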

merge_list is as follows:



library(Seurat)
library(Matrix)  # colSums() for sparse count matrices
library(glue)

s_merged <- readRDS(glue("{output_dir}/s_merged.rds"))

s_merged[[""]] <- PercentageFeatureSet(s_merged, pattern = "^mt-")
# Cells whose percentage exceeds 100
debug_meta <- s_merged[[]][s_merged[[]][, ""] > 100, ]

# SCT counts layer and its per-cell totals
test_sct_count <- LayerData(object = s_merged, assay = "SCT", 
                            layer = "counts")

test_count_colsum <- colSums(test_sct_count)

# Re-trace the PercentageFeatureSet calculation on the SCT counts
features.layer <- grep(pattern = "^mt-", 
                       x = rownames(x = test_sct_count), value = TRUE)

layer.sums <- colSums(x = test_sct_count[features.layer, , drop = FALSE])

# Denominator taken from the nCount_SCT metadata column
layer.perc <- layer.sums/s_merged[[]][colnames(test_sct_count), 
                                    paste0("nCount_", "SCT")] * 100

# Same ratio for three affected cells, with colSums of the counts as denominator
cells <- c("GGTGTCGAGCTCTATG-5_1", "GTCTACCAGTGCTCAT-5_1", "TTCAATCAGTCACGAG-5_1")
feature_count <- layer.sums[cells]
all_count <- test_count_colsum[cells]
feature_count/all_count * 100

While debugging, I found that the nCount_SCT values stored in the metadata were inconsistent with the column sums of the SCT counts (Fig 1). Re-tracing the PercentageFeatureSet calculation in the debug code reproduces the same values above 100 as in Fig 1. My understanding is that the percentage should be computed as feature_count / all_count * 100, and computing it that way gives the expected result. The problem therefore lies in the inconsistency of nCount_SCT.
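The reasoning above can be checked outside Seurat. The sketch below uses base R only; the count matrix, gene and cell names, and the helper `mt_percent` are all made up for illustration. It computes the mitochondrial share the same way as the debug code (sum of the `^mt-` rows divided by a per-cell total, times 100), once with the column sums of the matrix as the denominator and once with a stored total that no longer matches the matrix.

```r
# Toy counts matrix (made-up values): rows are genes, columns are cells.
counts <- matrix(
  c(10, 5, 30, 55,   # cell1: total 100, of which 15 from mt- genes
     2, 1, 40, 57),  # cell2: total 100, of which 3 from mt- genes
  nrow = 4,
  dimnames = list(c("mt-Co1", "mt-Nd1", "Actb", "Gapdh"), c("cell1", "cell2"))
)

# Hypothetical helper: share (in %) of each cell's total that comes from
# genes matching `pattern`, using whatever per-cell totals are supplied.
mt_percent <- function(counts, totals, pattern = "^mt-") {
  features <- grep(pattern, rownames(counts), value = TRUE)
  feature_sums <- colSums(counts[features, , drop = FALSE])
  feature_sums / totals[colnames(counts)] * 100
}

# Denominator = column sums of the same matrix (like all_count):
# for non-negative counts the result cannot exceed 100.
pct_ok <- mt_percent(counts, colSums(counts))    # cell1 15, cell2 3

# Denominator = a stored total that no longer matches the matrix
# (like a stale nCount_SCT): the result can exceed 100.
stale_totals <- c(cell1 = 12, cell2 = 100)
pct_stale <- mt_percent(counts, stale_totals)    # cell1 125, cell2 3
```

The only difference between the two calls is where the denominator comes from, which is why a mismatch between nCount_SCT and the actual SCT counts is enough to push the value past 100.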




My question is:

mhkowalski commented 5 months ago


You should likely use the RNA assay rather than the SCT assay to calculate the mitochondrial percentage.

Regarding the inconsistency, could you please check whether you get the same problem (>100) when you run the same code on one of your objects separately? My hunch is that this has something to do with running SCT on multi-layered objects.

mhkowalski commented 5 months ago

Also, it would be great if you could provide a reproducible example so we could look into this further. I wasn't able to reproduce this using SeuratData objects.

library(Seurat)
library(SeuratData)  # datasets must first be installed with InstallData()
data("pbmc3k")
data("pbmcsca")

pbmcsca <- UpdateSeuratObject(pbmcsca)
pbmc3k <- UpdateSeuratObject(pbmc3k)
pbmcsca[["RNA"]] <- as(object = pbmcsca[["RNA"]], Class = "Assay5")
pbmc3k[["RNA"]] <- as(object = pbmc3k[["RNA"]], Class = "Assay5")
obj <- merge(pbmc3k, pbmcsca)
obj <- SCTransform(obj, variable.features.n = 3000)
# Number of cells where nCount_SCT differs from the column sums of the SCT counts
sum(obj$nCount_SCT != colSums(obj[['SCT']]$counts))
#> [1] 0
obj[[""]] <- PercentageFeatureSet(obj, pattern = "^MT-")
sum(obj[[""]] > 100)
#> [1] 0

zhang-HZAU commented 5 months ago


Thank you very much for your reply.

I'm really confused now; it feels like I've run into the sophons from the Three-Body novel, haha.

I re-ran the analysis with the same input data, code, and environment, and this time the percentage calculated with the "SCT" assay is normal (below 100).

The data used this time are GSE140812 and GSE193265. The percentage calculated with the "RNA" assay is also normal.

zhang-HZAU commented 5 months ago

I reviewed my entire workflow and now I know what caused the percentage to exceed 100.

After SCTransform, my workflow runs an intermediate step, s_merged <- PrepSCTFindMarkers(object = s_merged), so that FindAllMarkers can be executed successfully. PrepSCTFindMarkers updates the counts in the SCT assay but does not update nCount_SCT in the cell metadata. As a result, nCount_SCT no longer matches the SCT counts, and the percentage can exceed 100.
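The failure mode described above can be imitated in base R. In this sketch a plain data frame plays the role of the cell metadata, and the "re-correction" is a crude rescaling of every cell to a depth of 100. That rescaling is a hypothetical stand-in chosen only for illustration; it is not the algorithm PrepSCTFindMarkers uses. The point is that once the counts change and the stored total does not, the ratio is no longer bounded by 100.

```r
# Toy counts (made-up values): rows are genes, columns are cells.
counts <- matrix(
  c(20, 10, 10,    # cell_a: total 40, of which 30 from mt- genes
     4,  6, 190),  # cell_b: total 200, of which 10 from mt- genes
  nrow = 3,
  dimnames = list(c("mt-Co1", "mt-Nd1", "Actb"), c("cell_a", "cell_b"))
)

# Metadata recorded right after normalization, playing the role of nCount_SCT.
meta <- data.frame(nCount = colSums(counts), row.names = colnames(counts))

# Hypothetical re-correction step: rescale every cell to a depth of 100.
# Illustrative stand-in only, NOT what PrepSCTFindMarkers computes.
recorrected <- round(sweep(counts, 2, colSums(counts), "/") * 100)

# The counts changed but meta$nCount did not, so the percentage is
# computed against a stale denominator.
mt <- grep("^mt-", rownames(recorrected), value = TRUE)
pct_stale <- colSums(recorrected[mt, , drop = FALSE]) / meta$nCount * 100
# cell_a: 75 / 40 * 100 = 187.5; cell_b: 5 / 200 * 100 = 2.5
```

Cells whose counts went up during re-correction are the ones that can cross 100; cells whose counts went down just get an underestimated percentage.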

My incorrect order of execution:

s_merged <- merge(x = merge_list[[1]], y = merge_list[2:3]) %>%
  SCTransform(assay = "RNA", variable.features.n = 3000)

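# Re-corrects the SCT counts but leaves nCount_SCT unchanged
s_merged <- PrepSCTFindMarkers(object = s_merged)

# The percentage is now computed against a stale nCount_SCT
s_merged[[""]] <- PercentageFeatureSet(s_merged, pattern = "^mt-", assay = "SCT")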

Correct order:

s_merged <- merge(x = merge_list[[1]], y = merge_list[2:3]) %>%
  SCTransform(assay = "RNA", variable.features.n = 3000)

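# Compute the percentage while nCount_SCT still matches the SCT counts
s_merged[[""]] <- PercentageFeatureSet(s_merged, pattern = "^mt-", assay = "SCT")

# Only then re-correct the counts for FindAllMarkers
s_merged <- PrepSCTFindMarkers(object = s_merged)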
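Reordering the calls is one fix. If the percentage ever has to be computed after the counts have been re-corrected, another option would be to recompute the stored totals from the current counts first. Below is a base R sketch of that idea; the data frame stands in for the cell metadata, and `sync_totals` and the toy values are hypothetical. In Seurat the analogous step would presumably be something like `s_merged$nCount_SCT <- colSums(LayerData(s_merged, assay = "SCT", layer = "counts"))` before calling PercentageFeatureSet, but that is an untested assumption.

```r
# Hypothetical helper: overwrite a per-cell total column in a metadata
# data frame with the column sums of the current counts matrix.
sync_totals <- function(meta, counts, col = "nCount") {
  # Metadata rows and matrix columns must describe the same cells, in order.
  stopifnot(identical(rownames(meta), colnames(counts)))
  meta[[col]] <- unname(colSums(counts))
  meta
}

# Toy data (made-up values): the counts changed after `meta` was recorded.
counts <- matrix(
  c(50, 25, 25,
     2,  3, 95),
  nrow = 3,
  dimnames = list(c("mt-Co1", "mt-Nd1", "Actb"), c("cell_a", "cell_b"))
)
meta <- data.frame(nCount = c(40, 200), row.names = c("cell_a", "cell_b"))

# Refresh the stored totals, then compute the percentage against them.
meta <- sync_totals(meta, counts)
mt <- grep("^mt-", rownames(counts), value = TRUE)
pct <- colSums(counts[mt, , drop = FALSE]) / meta$nCount * 100
# cell_a: 75, cell_b: 5
```

With the totals refreshed, the denominator is again the column sum of the same counts, so the percentage stays within 0 to 100.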

I'm very sorry; this error was my own mistake. Thank you for your replies.