Error with summaryExtractTest() when using paired samples #68

Open greyajosh opened 2 years ago

greyajosh commented 2 years ago

Hi Wanding,

Thank you for the extensive and useful package! I have been trying to extract a group of probes specific to a contrast variable using the DML() function followed by summaryExtractTest(). I am using paired patient samples, with blocking in my experimental design to account for patient-to-patient variation, and race as the contrast variable for which I want to extract probes. I have 41 samples from 32 patients of 2 races. The same workflow runs on the example data and on another dataset of mine that does not require blocking. This is the error I encounter:

`data <- SummarizedExperiment(assays = as.matrix(na.omit(betasFM[grep('cg', rownames(betasFM))[1:1000], ])), metadata = fcc)

data@colData@listData <- as.list(fcc)

data@colData@listData[["Race"]] <- as.factor(data@colData@listData[["Race"]])

data@colData@listData[["patient"]] <- as.factor(data@colData@listData[["patient"]])

colData(data)$Race <- relevel(factor(colData(data)$Race), "White")

colData(data)$patient <- relevel(factor(colData(data)$patient), "B.MP321")

smryF = DML(data, ~Race + patient)

test_result = summaryExtractTest(smryF)

Error in [.data.frame(est, , paste0("Est_", cont, lvs), drop = FALSE) : undefined columns selected`

I also tried stepping through the code I found in sesame/dm.R, which works up to this point, where I get the following error:

`est <-, lapply(smry, function(x) {
    x$coefficients[,'Estimate']; }))))
rownames(est) <- names(smry)
colnames(est) <- paste0("Est_", colnames(est))
est$Probe_ID <- names(smry)
pvals <-, lapply(smry, function(x) {
    x$coefficients[,"Pr(>|t|)"] }))))
rownames(pvals) <- names(smry)
colnames(pvals) <- paste0("Pval_", colnames(pvals))
f_pvals <-, lapply(smry, function(x) {
    x$Ftest["pval",,drop=FALSE] }))
rownames(f_pvals) <- names(smry)
colnames(f_pvals) <- paste0("FPval_", colnames(f_pvals))
contr2lvs <- attr(smry, "contr2lvs")
effsize <-, lapply(names(contr2lvs), function(cont) {
    lvs <- contr2lvs[[cont]]
    lvs <- lvs[2:length(lvs)]
    apply(est[, paste0("Est_", cont, lvs),drop=FALSE], 1,
        function(x) { max(x,0) - min(x,0) }) }))

Error in h(simpleError(msg, call)) : error in evaluating the argument 'args' in selecting a method for function '': undefined columns selected`

Is this something to do with the patient blocking or am I naively missing something? It may also be worth noting that I am able to use that same DMLSummary object for obtaining DMRs through DMR() successfully.

Thanks again, Josh

Below is my session info.

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

zwdzwd commented 2 years ago

My guess is that our code is not handling the missing levels for some CGs here. Would you mind sharing a copy of your smry object with me? You can send to if possible. That will help me debug. Thanks!
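To illustrate the kind of failure suspected here, below is a minimal base-R sketch with toy data (not the reporter's data and not sesame's internals). It assumes the estimate table has columns named after the `Est_<variable><level>` pattern seen in the error message; the variable name `Group` and the levels `B` and `C` are made up for illustration. When one expected level has no fitted coefficient, selecting its column fails with the same message.

```r
# Toy estimate table: only level "B" of the hypothetical variable "Group"
# received a coefficient; level "C" is missing.
est <- data.frame(Est_GroupB = c(0.10, -0.20))

cont <- "Group"          # hypothetical contrast variable
lvs  <- c("B", "C")      # non-reference levels the code expects to find

wanted       <- paste0("Est_", cont, lvs)
missing_cols <- setdiff(wanted, colnames(est))
print(missing_cols)      # columns that were expected but are absent

# Selecting a column that does not exist raises the error from the report.
res <- tryCatch(
    est[, wanted, drop = FALSE],
    error = function(e) conditionMessage(e)
)
print(res)
```

Comparing `wanted` against `colnames(est)` in this way is one quick check for which level is causing the problem.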

zwdzwd commented 2 years ago

I took a look. I think this is due to the reference level in "patient" being nonexistent.

colData(data)$patient <- relevel(factor(colData(data)$patient), "B.MP321")

I included a line in the updated code to keep this error from popping up, but I think setting the right reference level should solve the problem.
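One way to guard against this before calling DML() is to confirm that the chosen reference level actually has samples. The sketch below uses only base R on a toy factor; the patient labels, the requested reference `P4`, and the fallback of picking the most frequent level are illustrative assumptions, not part of sesame.

```r
# Toy blocking factor; "P4" is a hypothetical reference with no samples.
patient <- factor(c("P1", "P1", "P2", "P3", NA))
ref     <- "P4"

# The reference must be a known level AND be observed at least once.
ref_ok <- ref %in% levels(patient) && sum(patient == ref, na.rm = TRUE) > 0

if (!ref_ok) {
    # Fallback (an assumption for this sketch): use the most frequent level.
    ref <- names(which.max(table(patient)))
}

# Drop unused levels first, then put the reference level first.
patient <- relevel(droplevels(patient), ref = ref)
print(levels(patient))
```

The same check can be run on each factor in the design formula, since an empty level in any of them may leave an expected coefficient column missing.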

gbc529 commented 2 years ago

Hello, Thank you both for sharing the original error and solution! Unfortunately, I am running into the same error, but when I look at the smry object generated from my data, the reference level appears to be set for the factors I am considering, specifically "Sample.Region" and "CUB.ID". I am running 224 samples and want to use Sample.Region as the contrast variable. The error I am getting, my input, and my sessionInfo are below:

#Run preprocessing using opensesame. Output of Beta-values
taka_betas <- openSesame(".")

#Convert metadata file with desired columns + beta-values to sigDF
pdata <- read.csv("/path/to/project/meta/data/meta.csv")
pdata <- column_to_rownames(pdata, "IDAT")
pdata_noBMI<- pdata[,1:5]
betas_t <- t(taka_betas)
se <- SummarizedExperiment(t(betas_t), colData = pdata_noBMI)

## Modeling Differential Methylation____:

meta=dplyr::select(as_tibble(colData(se)), CUB.ID, Age, Sample.Region, Case.Control, Race)


#Set all variables to factors for analysis
meta$CUB.ID= relevel(factor(meta$CUB.ID), ref='CUB-006')
meta$Age= relevel(factor(meta$Age), ref='53')
meta$Sample.Region= relevel(factor(meta$Sample.Region), ref='T') 
meta$Race= relevel(factor(meta$Race), ref='Black or African American') 
meta$Case.Control = relevel(factor(meta$Case.Control), ref = 'case')
betas <- assay(se)

#Selected CUB.ID in order to compare based on where the sample was taken and if it comes from the same patient
ok1 = checkLevels(betas,meta$CUB.ID)
ok2 = checkLevels(betas,meta$Sample.Region)
ok4 = checkLevels(betas,meta$Age)
ok5 = checkLevels(betas,meta$Race)
ok6 = checkLevels(betas,meta$Case.Control)

#Check reads w/t NA for CUB.ID

#Filter out for only viable probes (i.e one's with no NA present)
betas_filtered <- betas[ok1&ok2&ok4&ok5&ok6, ]

#Count number of remaining probes

#Summarize data (START HERE)
smry = DML(betas_filtered, ~Sample.Region+CUB.ID, meta=meta) #Time consuming
smry[[1]] #inspects summary results
res <- summaryExtractTest(smry) #creates df of DML data above

Error in `[.data.frame`(est, , paste0("Est_", cont, lvs), drop = FALSE) : 
  undefined columns selected

Below is my sessionInfo:

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)

Matrix products: default
BLAS:   /hpc/software/R/4.1.1/lib64/R/lib/
LAPACK: /hpc/software/R/4.1.1/lib64/R/lib/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              

Thank you for your help!

zwdzwd commented 2 years ago

Can you try the latest sesame, v1.15.4, and see if this problem still exists? We have made some updates lately. @gbc529 thanks.
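A quick way to see whether the installed copy is older than the version mentioned above is sketched below. The helper name `is_outdated` is made up for this example; `requireNamespace()`, `packageVersion()`, and `as.package_version()` are standard R functions.

```r
# Version named in this thread as containing the fix.
required <- as.package_version("1.15.4")

# Hypothetical helper: TRUE when the given version is older than `required`.
is_outdated <- function(installed) {
    as.package_version(installed) < required
}

if (requireNamespace("sesame", quietly = TRUE)) {
    installed <- utils::packageVersion("sesame")
    if (is_outdated(installed)) {
        message("sesame ", as.character(installed), " is older than ",
                as.character(required))
    } else {
        message("sesame ", as.character(installed), " is recent enough")
    }
} else {
    message("sesame is not installed")
}
```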

gbc529 commented 2 years ago

Hello Dr. Zhou,

Updating to the latest version solved my issue. However, now that I have moved on to the modeling, I am not sure how to run the comparisons, and I cannot find an answer in the documentation. I will put my question in the Discussions section above. Thank you @zwdzwd