sunduanchen / Scissor

Scissor package
GNU General Public License v3.0
168 stars 29 forks source link

0 selected cells #33

Open gmcregis opened 2 years ago

gmcregis commented 2 years ago

Hello,

I've been trying to run scissor with a TCGA dataset and while the survival analyses work fine, when I try to use the binomial family to assess a phenotype it selects 0 cells. I tried it with a couple different phenotypes of interest and still kept getting 0%.

I'm wondering if there's another way to analyze the data (the tutorial didn't really go over using Gaussian analyses so I'm unsure how to proceed) or if the issue is the phenotype data itself that is just not statistically powerful enough (maybe I need more samples).

my phenotype of interest has 126 + and 27 - (153 TCGA samples total) for a sc data set of 7500 cells. scissor analysis with survival using family = cox gets me about 21% cells selected (alpha = 0.05). But when I input the phenotype data using family = binomial I always get 0% no matter the alpha.

Let me know if you need any more info/code from me (I'm fairly new to bioinformatics)

Thanks!

sunduanchen commented 2 years ago

Hi,

If possible, can you share your code/data with me?

Best, Duanchen

On Tue, 17 May 2022 at 04:29, gmcregis @.***> wrote:

Hello,

I've been trying to run scissor with a TCGA dataset and while the survival analyses work fine, when I try to use the binomial family to assess a phenotype it selects 0 cells. I tried it with a couple different phenotypes of interest and still kept getting 0%.

I'm wondering if there's another way to analyze the data (the tutorial didn't really go over using Gaussian analyses so I'm unsure how to proceed) or if the issue is the phenotype data itself that is just not statistically powerful enough (maybe I need more samples).

my phenotype of interest has 126 + and 27 - (153 TCGA samples total) for a sc data set of 7500 cells. scissor analysis with survival using family = cox gets me about 21% cells selected (alpha = 0.05). But when I input the phenotype data using family = binomial I always get 0% no matter the alpha.

Let me know if you need any more info/code from me (I'm fairly new to bioinformatics)

Thanks!

— Reply to this email directly, view it on GitHub https://github.com/sunduanchen/Scissor/issues/33, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEYQWDCDIQ6BGVAPRZDD62TVKKV2RANCNFSM5WCVNNXA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

gmcregis commented 2 years ago

Hi,

Thanks so much! I've attached all the info I could here - let me know if you need anything else!

phenotype_unprep_scissor.csv

I wasn't able to upload all the files but here's the code:

library(tidyverse) library(janitor) library(dplyr) library(knitr) library(Scissor) library(Seurat) library(Matrix) library(preprocessCore) wd <- "~/Library/CloudStorage/Box-Box/"

sc dataset loading

sc_dataset <- readRDS(paste0(wd,"Scissor/NatureCommsSeuratObjHarmCorrSubSample7500.rds")) DimPlot(sc_dataset, reduction = 'umap', label = T, label.size = 10) sc_dataset <- NormalizeData(object = sc_dataset, normalization.method ="LogNormalize" , scale.factor = 10000, verbose = TRUE) sc_dataset <- FindVariableFeatures(object = data, selection.method = "vst", verbose = TRUE) sc_dataset <- ScaleData(object = data, verbose = TRUE) sc_dataset <- RunPCA(object = data, features = VariableFeatures(data), verbose = TRUE) sc_dataset <- FindNeighbors(object = data, dims = 1:10, verbose = TRUE) sc_dataset <- FindClusters( object = data, resolution = 0.6, verbose = TRUE) sc_dataset <- RunTSNE(object = data, dims = 1:10) sc_dataset <- RunUMAP(object = data, dims = 1:10, verbose = TRUE)

bulk dataset loading

bulk_dataset <- readRDS(paste0(wd, "Scissor/my_bulk_dataset_clean.rds")) bulk_survival <- read.csv(paste0(wd, "Scissor/phenotype_unprep_scissor.csv"))

bulk_survival$vital_status[bulk_survival$vital_status == "Dead"] <- 1 bulk_survival$vital_status[bulk_survival$vital_status == "Alive"] <- 0 bulk_survival$days_to_death <- as.numeric(bulk_survival$days_to_death) bulk_survival$vital_status <- as.numeric(bulk_survival$vital_status) bulk_survival$age_at_diagnosis <- as.numeric(bulk_survival$age_at_diagnosis) rownames(bulk_survival) <- make.names(bulk_survival$case_submitter_id) bulk_survival <- bulk_survival[,-c(1,2)] all(colnames(bulk_dataset) %in% rownames(bulk_survival)) include <- rownames(bulk_survival[rownames(bulk_survival)%in%colnames(bulk_dataset),]) bulk_dataset <- bulk_dataset[,include] bulk_survival <- bulk_survival[include,] all(rownames(bulk_survival) == colnames(bulk_dataset)) all(colnames(bulk_dataset) == bulk_survival$case_submitter_id) bulk_dataset <- as.matrix(bulk_dataset) bulk_survival[2,1] <-460 bulk_survival[3,1] <-464 bulk_survival[36,1] <-215 bulk_survival[44,1] <-256 bulk_survival[45,1] <-179 bulk_survival[48,1] <-13 bulk_survival[49,1] <-281 bulk_survival[54,1] <-265 bulk_survival[55,1] <-270 bulk_survival[56,1] <-202 bulk_survival[57,1] <-153 bulk_survival[60,1] <-114 bulk_survival[61,1] <-138 bulk_survival[82,1] <-215 bulk_survival[91,1] <-460 bulk_survival[92,1] <-290 bulk_survival[97,1] <-211 bulk_survival[99,1] <-940 bulk_survival[100,1] <-282 bulk_survival[101,1] <-446 bulk_survival[102,1] <-165 bulk_survival[105,1] <-48 bulk_survival[119,1] <-37 bulk_survival[120,1] <-434 bulk_survival[121,1] <-143 bulk_survival[123,1] <-219 bulk_survival[124,1] <-158 bulk_survival[128,1] <-76 bulk_survival[129,1] <-294 bulk_survival[139,1] <-267 bulk_survival[141,1] <-352

bulk_survival[120,3] <- 0 bulk_survival[129,3] <- 1

exclude row/column 122 (tcga.28.2510 - no survival data)

bulk_survival <- bulk_survival[-122,] bulk_dataset <- bulk_dataset[,-122]

phenotype <- bulk_survival[, - c(2,4,5,6,7,8)]

colnames(phenotype) <- c("time", "status") rownames(phenotype) <- c(1:153)

infos1 <- Scissor(bulk_dataset, data, phenotype, alpha = 0.05, family = "cox", Save_file=paste0(wd,"Scissor_new_scGBM_APOE.RData"))

[1] "Performing quality-check for the correlations"

[1] "The five-number summary of correlations:"

0% 25% 50% 75% 100%

0.07894661 0.35465082 0.42092890 0.48272104 0.73445765

[1] "Perform cox regression on the given clinical outcomes:"

[1] "alpha = 0.05"

[1] "Scissor identified 524 Scissor+ cells and 1187 Scissor- cells."

[1] "The percentage of selected cell is: 21.757%"

bulk_survival$APOE_group_updated[bulk_survival$APOE_group_updated == "e3"] <- 0 bulk_survival$APOE_group_updated[bulk_survival$APOE_group_updated == "e2"] <- 0 bulk_survival$APOE_group_updated[bulk_survival$APOE_group_updated == "e2/e3"] <- 0 bulk_survival$APOE_group_updated[bulk_survival$APOE_group_updated == "e4_hz"] <- 1 bulk_survival$APOE_group_updated[bulk_survival$APOE_group_updated == "e4_ho"] <- 1 bulk_survival$APOE_group_updated <- as.numeric(bulk_survival$APOE_group_updated) apoe <- bulk_survival[,-c(1,2,3,4,6,7,8)] table(apoe) names(apoe) <- make.names(rownames(bulk_survival)) tag <- c("non-e4", "apoe4") infos4 <- Scissor(bulk_dataset, data, apoe, tag = tag, alpha = 0.001,family = "binomial", Save_file = paste0(wd, "Scissor_new_GBM_APOE_specific.RData"))

0% selected cells

infos4_g <- Scissor(bulk_dataset, data, apoe, tag = tag, alpha = seq(10,50,5)/1000,family = "gaussian", Save_file = paste0(wd, "Scissor_new_GBM_APOE_specific.RData"))

0.013% of selected cells (only scissor -)

objects are: sc_dataset = large Seurat (2.3 GB) bulk_dataset = large matrix (7337421 elements, 62 MB) bulk_survival = 153 obs. of 9 variables