nhejazi / biotmle

:package: :microscope: R/biotmle: Targeted Learning with Moderated Statistics for Biomarker Discovery
https://code.nimahejazi.org/biotmle/

biomarkertmle fails on data transposition #42

Closed BiaoLiu2017 closed 6 years ago

BiaoLiu2017 commented 6 years ago

Hi, I tried running the script from the page https://code.nimahejazi.org/biotmle/articles/rnaseqProcessing.html. The code is as follows:

se <- SummarizedExperiment(assays = list(counts = DataFrame(ngs_data)),
                           colData = DataFrame(design))
se
rnaseqTMLEout <- biomarkertmle(se = se,
                               varInt = 1,
                               type = "exposure",
                               ngscounts = TRUE,
                               parallel = TRUE,
                               family = "gaussian",
                               g_lib = c("SL.mean", "SL.glm", "SL.randomForest"),
                               Q_lib = c("SL.mean", "SL.glm", "SL.randomForest",
                                         "SL.nnet")
                              )

But it failed with the following error:

Error in t.default(assay(se)) : argument is not a matrix
Calls: biomarkertmle -> as.data.frame -> t -> t.default
Execution halted

I also found that, in the R script on Bioconductor, the same call has been commented out:

se <- SummarizedExperiment(assays = list(counts = DataFrame(ngs_data)),
                           colData = DataFrame(design))
se
#  rnaseqTMLEout <- biomarkertmle(se = se,
#                                 varInt = 1,
#                                 type = "exposure",
#                                 ngscounts = TRUE,
#                                 parallel = TRUE,
#                                 family = "gaussian",
#                                 g_lib = c("SL.mean", "SL.glm", "SL.randomForest"),
#                                 Q_lib = c("SL.mean", "SL.glm", "SL.randomForest",
#                                           "SL.nnet")
#                                )
#  head(rnaseqTMLEout@tmleOut$E[, seq_len(6)])
data(rnaseqtmleOut)
head(rnaseqTMLEout@tmleOut$E[, seq_len(6)])

So, how can I fix the bug?

nhejazi commented 6 years ago

Thanks for your bug report. It will likely take me a bit of time to investigate this as I have not had the opportunity to maintain the biotmle R package well.

The error appears to stem from the code that cleans and reshapes the input data in the biomarkertmle wrapper function: it uses t() to transpose the data, which apparently fails when the data is not of class matrix. In this case, the data appears to be of class data.frame (hence the mention of as.data.frame in the traceback shown).

With regard to the vignette source code (on Bioconductor and elsewhere): The code chunk you've excerpted is set to eval = FALSE in order to avoid actually running the biomarkertmle function, which will cause the build time of the vignette to exceed the time allotted by the Bioconductor (and, I believe, CRAN as well) build/check systems. This should not be a concern, as the relevant code is still subject to unit tests that (seem to) verify its integrity.

nhejazi commented 6 years ago

It appears that the bug reported here originates from a change in the expected behavior of SummarizedExperiment::assay() (likely at some point during the recent transition to Bioconductor 3.7), where the class of the object returned by assay() is DataFrame rather than the expected matrix. This causes a failure in both t() and other downstream function calls meant to prepare the data for invoking the TML estimation procedure. This has been fixed in 2e78b7f in #43, which will likely be merged very soon.
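
For anyone hitting this on a version without the fix, one possible workaround is to store a base matrix in the assay instead of a DataFrame, so that assay() returns something t() can transpose. The sketch below is only an illustration of that idea, not the change made in #43; the toy ngs_data and design objects are hypothetical stand-ins for the vignette's data.

```r
library(SummarizedExperiment)

# Hypothetical toy inputs: 3 genes (rows) by 2 samples (columns).
ngs_data <- data.frame(s1 = c(10, 20, 30), s2 = c(5, 7, 9))
design <- data.frame(exposure = c(0, 1), row.names = c("s1", "s2"))

# Coerce the counts to a base matrix before building the container,
# rather than wrapping them in DataFrame().
se <- SummarizedExperiment(
  assays = list(counts = as.matrix(ngs_data)),
  colData = DataFrame(design)
)

# assay() now returns a base matrix, so transposition is well defined.
counts_t <- t(assay(se))
class(assay(se))
dim(counts_t)
```

The se object built this way can then be passed to biomarkertmle() exactly as in the original call; whether the remaining downstream steps succeed still depends on the installed biotmle version.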

nhejazi commented 6 years ago

Resolved by #43.