Closed guidohooiveld closed 1 year ago
Aha, I think I got it.
Apparently since v109 of Ensembl the genomes of 15 different mouse strains, in addition to the Genome Reference Consortium builds, are being annotated. https://www.ensembl.org/Mus_musculus/Info/Strains?db=core
Since I mapped my RNA-seq data using the latest GENCODE annotation and reference files (Release M32, GRCm39), I have to use the EnsDb of the reference strain, that is, the one with $genome: GRCm39. This turns out to be the 16th EnsDb (i.e. AH109655). In contrast, the first EnsDb (AH109640) is for mouse strain 129S1/SvImJ (because $genome: 129S1_SvImJ_v1), and so on.
Therefore, to refine my question: is it also possible to include the genome assembly (i.e. $genome) in the query?
> query(ah, c("EnsDb", "v109", "Mus musculus"))[16]
AnnotationHub with 1 record
# snapshotDate(): 2023-04-06
# names(): AH109655
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: EnsDb
# $rdatadateadded: 2022-10-30
# $title: Ensembl 109 EnsDb for Mus musculus
# $description: Gene and protein annotations for Mus musculus based on Ensem...
# $taxonomyid: 10090
# $genome: GRCm39
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("109", "Annotation", "AnnotationHubSoftware", "Coverage",
# "DataImport", "EnsDb", "Ensembl", "Gene", "Protein", "Sequencing",
# "Transcript")
# retrieve record with 'object[["AH109655"]]'
>
> query(ah, c("EnsDb", "v109", "Mus musculus"))[1]
AnnotationHub with 1 record
# snapshotDate(): 2023-04-06
# names(): AH109640
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: EnsDb
# $rdatadateadded: 2022-10-30
# $title: Ensembl 109 EnsDb for Mus musculus
# $description: Gene and protein annotations for Mus musculus based on Ensem...
# $taxonomyid: 10090
# $genome: 129S1_SvImJ_v1
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("109", "Annotation", "AnnotationHubSoftware", "Coverage",
# "DataImport", "EnsDb", "Ensembl", "Gene", "Protein", "Sequencing",
# "Transcript")
# retrieve record with 'object[["AH109640"]]'
>
Hi Guido!
Yes, sorry, I now also create EnsDbs for all strains (there have been requests for that); up to now I simply dropped them. Regarding query: this is actually a function from AnnotationHub, not ensembldb, so I don't have any control over it. But maybe that would be a nice issue/feature request for AnnotationHub itself?
Note also that adding the genome to the query call should give you what you want (I suppose?):
> query(ah, c("EnsDb", "v109", "Mus musculus", "GRCm39"))
AnnotationHub with 1 record
# snapshotDate(): 2023-05-15
# names(): AH109655
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: EnsDb
# $rdatadateadded: 2022-10-30
# $title: Ensembl 109 EnsDb for Mus musculus
# $description: Gene and protein annotations for Mus musculus based on Ensem...
# $taxonomyid: 10090
# $genome: GRCm39
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("109", "Annotation", "AnnotationHubSoftware", "Coverage",
# "DataImport", "EnsDb", "Ensembl", "Gene", "Protein", "Sequencing",
# "Transcript")
# retrieve record with 'object[["AH109655"]]'
I don't know exactly how query works, but I assume it combines the search terms with a logical AND (&) and searches across all metadata fields.
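To illustrate the assumed behaviour (every search term must match somewhere in a record's metadata, and the per-term results are combined with &), here is a minimal base R sketch. It does not call AnnotationHub at all: match_all_terms() and the meta table are hypothetical names made up for this example, the first two rows reuse values from the outputs in this thread, and the third row is invented purely as a non-matching record.

```r
# Mock metadata table standing in for hub records (NOT real AnnotationHub data
# access). Rows 1-2 reuse values shown in this thread; row 3 is hypothetical.
meta <- data.frame(
  id         = c("AH109640", "AH109655", "hypothetical_1"),
  species    = c("Mus musculus", "Mus musculus", "Homo sapiens"),
  rdataclass = c("EnsDb", "EnsDb", "EnsDb"),
  genome     = c("129S1_SvImJ_v1", "GRCm39", "GRCh38")
)

# Hypothetical helper: TRUE for rows in which ALL terms occur in ANY column.
match_all_terms <- function(meta, terms, ignore.case = TRUE) {
  # Collapse each row into one searchable string (columns separated by tabs).
  haystack <- do.call(paste, c(unname(as.list(meta)), sep = "\t"))
  # One logical vector per term: does the term occur anywhere in the row?
  hits <- lapply(terms, grepl, x = haystack, ignore.case = ignore.case)
  # Combine the per-term results with a logical AND.
  Reduce(`&`, hits)
}

keep <- match_all_terms(meta, c("EnsDb", "Mus musculus", "GRCm39"))
meta$id[keep]
# "AH109655"
```

Under this assumption, dropping "GRCm39" from the terms keeps both mouse rows, which mirrors why the shorter query returned one record per strain. Because the terms are matched as patterns against any field, a term such as "GRCm39" could in principle also hit a title or description; if an exact match on one column is needed on a real hub, filtering on the metadata column itself, for example subset(ah, genome == "GRCm39"), may be the safer route.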
Yep, including the search term "GRCm39" indeed made it easy to find that specific EnsDb. Why did I not think of that myself...
Thanks Jo!
Hi Johannes, I would like to make use of version 109 of the EnsDb for mouse. However, in contrast to v108, I now notice that multiple EnsDbs (records) are available, which confuses me. So which one should I use, that is, which one is analogous to the single EnsDb (record) present for v108? Thanks, Guido