Closed kokyriakidis closed 3 years ago
Yes, there is a way to do this that works with DESeq2. Start your bcbio runs in batches, and I'll work on adding an example to the bcbioRNASeq vignette. In the meantime, here's a rough draft of how you can do it:
library(bcbioRNASeq)
library(DESeq2)
## This will return a RangedSummarizedExperiment.
## Repeat stepwise in a loop for each of your bcbioRNASeq runs.
rse_c <- combine(bcb_1, bcb_2)
## You may need to round your counts to integer values before this step works (see the minimal example below).
dds <- DESeqDataSet(rse_c)
The code that handles this is defined in my basejump package, (see combine()
for details):
selectMethod(
f = "combine",
signature = signature(
x = "SummarizedExperiment",
y = "SummarizedExperiment"
)
)
I'll work on seeing if I can improve the combine()
method to support an unlimited number of objects. Currently it only works for merging 2 datasets at a time.
Thank you very much for the info!
I face another issue pointed out here: https://github.com/bcbio/bcbio-nextgen/issues/3033
Maybe do you have any clue?
Here's a minimal example that should work that you can test out:
library(bcbioRNASeq)
library(DESeq2)
data(bcb, package = "bcbioRNASeq")
bcb_x <- bcb
colnames(bcb_x) <- paste0(colnames(bcb_x), "_x")
bcb_y <- bcb
colnames(bcb_y) <- paste0(colnames(bcb_y), "_y")
rse_c <- combine(x = bcb_x, y = bcb_y)
class(rse_c)
# [1] "RangedSummarizedExperiment"
# attr(,"package")
# [1] "SummarizedExperiment"
dim(rse_c)
# [1] 100 12
# DESeq2 currently only supports integer counts.
counts <- counts(rse_c)
counts <- round(counts, digits = 0L)
mode(counts) <- "integer"
counts(rse_c) <- counts
dds <- DESeqDataSet(rse_c, design = ~ 1L)
class(dds)
# [1] "DESeqDataSet"
# attr(,"package")
# [1] "DESeq2"
# Next set your design formula with `design()` and eventually run `DESeq()`.
Do I lose something if I add the option tools_off: [upload_alignment]? Does bcbioRNASeq uses the bam files somehow?
@kokyriakidis No the bcbioRNASeq R package doesn't parse the BAM files currently. I tend to only use those for creating an IGV session.
@mjsteinbaugh I see that bcbioRNAseq has 3 templates QC, DE. Functional. WIll running fastrna-seq mode makes me be able to run only DE and Functional templates? Running in RNA-seq mode will just give info for the QC template, or it helps in the DE and Functional templates also?
Correct, the QC template only works for bcbio runs with alignment via STAR or HISAT2, since this outputs additional QC metrics. You can see these metrics in the colData()
of the object, when applicable. For differential expression, you're good to go. Just run this step:
# Assuming your dataset is named "bcbioRNASeq" here.
dds <- as(bcbioRNASeq, "DESeqDataSet")
@kokyriakidis I'm going to take a stab at improved support for combining bcbio-nextgen runs this month in bcbioRNASeq. I'll report back with an update.
Closing this issue for now, as this feature request has been added to the TODO list. Will try to add this in a future update.
@mjsteinbaugh
Having 199 samples to process and limited amount of storage 6TB, I would like to know if I can run bcbio several times and load and merge all bcbio runs in a single rda file?