Open mschubert opened 3 months ago
Thanks for reporting. Is there no way to create or load a `BSgenome` or other supported genome sequence object from `AnnotationHub`? There should also be the twobit files from Ensembl accessible through `AnnotationHub`; maybe these can be used? The advantage would be that you could make sure to use a matching `EnsDb` and genome sequence from the same Ensembl release. I'm just a bit worried that hg38 is not exactly identical to the GRCh38 version used by Ensembl...
I'm trying to use the VariantAnnotation package to annotate a VCF file from nf-core/sarek (UCSC style) using `AnnotationHub` `EnsDb` objects (Ensembl style), where I encountered the following issues:

1. [...] `EnsDb` objects. I raised this (https://github.com/Bioconductor/VariantAnnotation/pull/74), but it will probably not get fixed; I worked around this by providing my own S4 method in my code
2. The `seqlevelsStyle()` values do not match

For point (2), I can change the `EnsDb` style to `UCSC` and then either load a UCSC-style genome or change the Ensembl-style genome to UCSC.
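The first of these two routes might look roughly like the sketch below. It is not the code from the original post: it assumes the `BSgenome.Hsapiens.UCSC.hg38` package is installed, and the `EnsDb` record picked from AnnotationHub (the last human one listed) is only a placeholder for whichever release is in use.

```r
library(AnnotationHub)
library(ensembldb)
library(BSgenome.Hsapiens.UCSC.hg38)  # assumption: installed locally

ah <- AnnotationHub()
hits <- query(ah, c("EnsDb", "Homo sapiens"))
edb <- hits[[tail(names(hits), 1)]]   # placeholder choice of release

# Report sequence names in UCSC style ("chr1" instead of "1").
seqlevelsStyle(edb) <- "UCSC"

# Load a genome that is already UCSC style.
genome_seq <- BSgenome.Hsapiens.UCSC.hg38

# The styles now agree, but the genome labels still differ.
edb_genome <- unique(genome(edb))
seq_genome <- unique(genome(genome_seq))
edb_genome
seq_genome
```

Changing the style only renames the sequences; it leaves the genome label of the `EnsDb` untouched.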
However, this does not work, because the `genome()` of the `EnsDb` object is `GRCh38`, while the one of the assembly is `hg38`, raising an assertion error in VariantAnnotation. So this needs an additional line changing the internal state of the S4 genome object (we can't change the `EnsDb` object).

I'm not sure what a good solution is here. It seems to me that a check whether the genomes are identical (as performed by VariantAnnotation) is reasonable. I'm raising this issue more to document it than to suggest a change in ensembldb.
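To illustrate the genome-label mismatch in isolation, here is a small sketch on plain `Seqinfo` objects from GenomeInfoDb. It is neither the actual VariantAnnotation check nor the workaround from the post: `same_genome()` is a hypothetical helper, and the two-chromosome `Seqinfo` objects are toy stand-ins for the real annotation and sequence objects.

```r
library(GenomeInfoDb)

# Toy stand-ins: same sequence names, different genome labels.
ens_info  <- Seqinfo(seqnames = c("chr1", "chr2"), genome = "GRCh38")
ucsc_info <- Seqinfo(seqnames = c("chr1", "chr2"), genome = "hg38")

# Hypothetical helper: compare genome labels as plain strings.
same_genome <- function(a, b) {
  identical(unname(genome(a)), unname(genome(b)))
}

before <- same_genome(ens_info, ucsc_info)  # FALSE: "GRCh38" vs "hg38"

# Relabel the sequence side through the public genome<- setter.
genome(ucsc_info) <- "GRCh38"

after <- same_genome(ens_info, ucsc_info)   # TRUE after relabeling
```

Relabeling like this only changes the string; it asserts that the two assemblies are interchangeable for the regions of interest, which remains the caller's responsibility.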