hbc / bcbioRNASeq

R package for bcbio RNA-seq analysis.
https://bioinformatics.sph.harvard.edu/bcbioRNASeq
GNU Affero General Public License v3.0

Error creating DESeqAnalysis object #167

Closed by kokyriakidis 3 years ago

kokyriakidis commented 3 years ago

Hello @mjsteinbaugh

I get the following error when creating a DESeqAnalysis object. How can I fix it?

Error in validObject(.Object) : invalid class "DESeqAnalysis" object:
isSubset(x = c("geneID", "geneName"), y = names(mcols(rowRanges(data)))) is not TRUE.
Cause: 'c("geneID", "geneName")' has elements not in 'names(mcols(rowRanges(data)))': geneID, geneName
isSubset(x = c("geneID", "geneName"), y = names(mcols(rowRanges(transform)))) is not TRUE.
Cause: 'c("geneID", "geneName")' has elements not in 'names(mcols(rowRanges(transform)))': geneID, geneName

I use the following command:

object_da <- DESeqAnalysis(object_dds, object_dt, object_res_list_unshrunken, lfcShrink = object_res_list_shrunken)

EDIT:

I had to run the following lines to get it to work:

rowRanges <- emptyRanges(names = rownames(data))
mcols(rowRanges)[["geneID"]] <- paste0("id", seq_len(length(rowRanges)))
mcols(rowRanges)[["geneName"]] <- paste0("name", seq_len(length(rowRanges)))
rowRanges(data) <- rowRanges

This was not necessary for the other bcbio runs I tried.

mjsteinbaugh commented 3 years ago

Hi @kokyriakidis, I'm working on relaxing the requirement for gene identifiers (e.g. ENSG00000000003) and gene symbols (e.g. TSPAN6) to be defined in the rowRanges of the DESeqDataSet. Typically I define the genome in the bcbioRNASeq() call, which then fetches the genome annotations from AnnotationHub internally using the makeGRangesFromEnsembl() function.

For example:

library(bcbioRNASeq)
bcb <- bcbioRNASeq(
    uploadDir = "bcbio_final",  # hypothetical path to the bcbio upload directory
    organism = "Homo sapiens",
    genomeBuild = "GRCh38",
    ensemblRelease = 100L
)

Internally, this function hands off to makeGRangesFromEnsembl():

library(basejump)
gr <- makeGRangesFromEnsembl(
    organism = "Homo sapiens",
    genomeBuild = "GRCh38",
    release = 100L
)
class(gr)
## [1] "GRanges"
## attr(,"package")
## [1] "GenomicRanges"

The identifiers and names are defined in the mcols of the GRanges object:

> mcols(gr)[["geneID"]]
character-Rle of length 68008 with 68008 runs
  Lengths:                 1                 1 ...                 1
  Values : "ENSG00000000003" "ENSG00000000005" ...         "LRG_999"
> mcols(gr)[["geneName"]]
character-Rle of length 68008 with 68001 runs
  Lengths:          1          1          1 ...          1          1
  Values :   "TSPAN6"     "TNMD"     "DPM1" ...    "CCND3"      "CIC"

mjsteinbaugh commented 3 years ago

Resolved in the DESeqAnalysis 0.3.12 update. Thanks for posting this!