Open danielcgingerich opened 3 years ago
Hi, sorry for the late reply!
According to the error message, it seems that the exon identifiers in the GTF file are not unique, and there is not much we can do about that. Generally, creating EnsDb objects/databases from GTF files is tricky because the GTF file format is not well standardized. Creating databases from Ensembl GTF files should work; for the ones from Gencode I don't know.
Note that there are pre-built annotation resources for all Ensembl releases:
> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2020-11-02
> query(ah, "EnsDb.Hsapiens.v98")
AnnotationHub with 1 record
# snapshotDate(): 2020-11-02
# names(): AH75011
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# $rdatadateadded: 2019-05-02
# $title: Ensembl 98 EnsDb for Homo sapiens
# $description: Gene and protein annotations for Homo sapiens based on Ensem...
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("98", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
# "Protein", "Transcript")
# retrieve record with 'object[["AH75011"]]'
Since Gencode 32 is based on Ensembl 98, would this work for you?
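For reference, here is a minimal sketch of loading the record shown above and pulling the annotations out as GRanges objects. It assumes the AnnotationHub and ensembldb packages are installed, that there is network access (or a cached copy), and that record AH75011 is still available in your AnnotationHub snapshot. The variable names are only illustrative.

```r
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()

# Download (or load from the local cache) the Ensembl 98 EnsDb record.
# Assumption: AH75011 is still present in the snapshot you are using.
edb <- ah[["AH75011"]]

# genes(), transcripts() and exons() each return a GRanges object.
gns <- genes(edb)
txs <- transcripts(edb)
exs <- exons(edb)

# Inspect the gene-level ranges and their metadata columns.
gns
```

Note that Ensembl uses chromosome names such as "1" rather than "chr1", so the seqlevels may need adjusting before comparing against Gencode-style coordinates.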
Could someone please explain how to get the annotation from GRCh38 2020A and convert it to a GRanges object?
Why?