AR-Shicheng closed this issue 2 weeks ago.
Interestingly, Ensembl 88, the release that matters most to us, is missing:
> library(biomaRt)
Warning: program compiled against libxml 210 using older 209
>
> listEnsembl()
        biomart                version
1         genes      Ensembl Genes 112
2 mouse_strains      Mouse strains 112
3          snps  Ensembl Variation 112
4    regulation Ensembl Regulation 112
>
> listEnsemblArchives()
             name     date                                 url version current_release
 1 Ensembl GRCh37 Feb 2014          https://grch37.ensembl.org  GRCh37
 2    Ensembl 112 May 2024 https://may2024.archive.ensembl.org     112               *
 3    Ensembl 111 Jan 2024 https://jan2024.archive.ensembl.org     111
 4    Ensembl 110 Jul 2023 https://jul2023.archive.ensembl.org     110
 5    Ensembl 109 Feb 2023 https://feb2023.archive.ensembl.org     109
 6    Ensembl 108 Oct 2022 https://oct2022.archive.ensembl.org     108
 7    Ensembl 107 Jul 2022 https://jul2022.archive.ensembl.org     107
 8    Ensembl 106 Apr 2022 https://apr2022.archive.ensembl.org     106
 9    Ensembl 105 Dec 2021 https://dec2021.archive.ensembl.org     105
10    Ensembl 104 May 2021 https://may2021.archive.ensembl.org     104
11    Ensembl 103 Feb 2021 https://feb2021.archive.ensembl.org     103
12    Ensembl 102 Nov 2020 https://nov2020.archive.ensembl.org     102
13    Ensembl 101 Aug 2020 https://aug2020.archive.ensembl.org     101
14    Ensembl 100 Apr 2020 https://apr2020.archive.ensembl.org     100
15     Ensembl 99 Jan 2020 https://jan2020.archive.ensembl.org      99
16     Ensembl 98 Sep 2019 https://sep2019.archive.ensembl.org      98
17     Ensembl 97 Jul 2019 https://jul2019.archive.ensembl.org      97
18     Ensembl 80 May 2015 https://may2015.archive.ensembl.org      80
19     Ensembl 77 Oct 2014 https://oct2014.archive.ensembl.org      77
20     Ensembl 75 Feb 2014 https://feb2014.archive.ensembl.org      75
21     Ensembl 54 May 2009 https://may2009.archive.ensembl.org      54
Is there a way for us to build Ensembl 88 ourselves? If so, how can we do it?
Ensembl keeps each release available for 5 years. A few selected releases are retained for longer, but in most cases, once 5 years have passed, a release is deemed out of date and removed. Ensembl 88 is from May 2017 and was removed about 2 years ago. There are more details on the archive policies at https://www.ensembl.org/info/website/archives/index.html
biomaRt is only an interface for querying the databases that Ensembl makes available, so you can't access release 88 through it.
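To check ahead of time whether a release can still be reached through biomaRt, one option is to look it up in the table returned by `listEnsemblArchives()` before calling `useEnsembl()`. This is a minimal sketch, assuming the table has the `version` and `url` columns printed above; the helper name `archive_url_for()` is hypothetical.

```r
# Hypothetical helper: look up the archive host for an Ensembl release in a
# table shaped like the output of biomaRt::listEnsemblArchives().
# Assumes columns `version` and `url`, as printed above.
archive_url_for <- function(release, archives) {
  hit <- archives$url[archives$version == as.character(release)]
  if (length(hit) == 0) {
    # Not listed: the release has been retired (or was never archived)
    return(NA_character_)
  }
  as.character(hit[[1]])
}
```

For example, `archive_url_for(88, listEnsemblArchives())` would return `NA` given the list above, whereas the host returned for release 110 could be passed on as `useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl", host = ...)`. `useEnsembl()` also accepts `version = 110` directly.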
In theory, you could build your own version from the original source data, available from https://ftp.ensembl.org/pub/release-88/. However, I don't think Ensembl provides any instructions on how to do this, and it would be a very difficult task.
I would ask why using such an old version is important. If there's a really good reason, maybe you can get the information you need from those files on the FTP site, rather than using BioMart. If not, then perhaps using a more recent version of the annotation data would be fine.
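If the files on the FTP site are enough, the gene and transcript models for release 88 can be read straight from its GTF file. A minimal sketch, assuming Ensembl's usual FTP layout `pub/release-<N>/gtf/<species>/<Species>.<assembly>.<N>.gtf.gz`; the helper names are hypothetical, and reading the file relies on `utils::download.file()` and `rtracklayer::import()`.

```r
# Hypothetical helpers (names are illustrative, not part of any package).
# Assumes Ensembl's usual FTP layout:
#   pub/release-<N>/gtf/<species>/<Species>.<assembly>.<N>.gtf.gz
ensembl_gtf_url <- function(release, species = "homo_sapiens",
                            assembly = "GRCh38") {
  # "homo_sapiens" -> "Homo_sapiens" for the file name
  file_species <- paste0(toupper(substring(species, 1, 1)),
                         substring(species, 2))
  sprintf("https://ftp.ensembl.org/pub/release-%s/gtf/%s/%s.%s.%s.gtf.gz",
          release, species, file_species, assembly, release)
}

# Download the GTF once (it is large) and import it as a GRanges object.
# Requires the Bioconductor package rtracklayer.
fetch_ensembl_gtf <- function(release, species = "homo_sapiens",
                              assembly = "GRCh38", destdir = tempdir()) {
  url <- ensembl_gtf_url(release, species, assembly)
  dest <- file.path(destdir, basename(url))
  if (!file.exists(dest)) {
    # Raise the download timeout for the duration of this call
    old <- options(timeout = max(600, getOption("timeout")))
    on.exit(options(old), add = TRUE)
    utils::download.file(url, dest, mode = "wb")
  }
  rtracklayer::import(dest)
}
```

For example, `gtf88 <- fetch_ensembl_gtf(88)` followed by `gtf88[gtf88$type == "transcript"]` would keep only the transcript records. Whether the GTF alone is enough depends on which fields your analysis needs.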
Dear Mike,
If you could prepare an Ensembl 88 dataset, it would be a tremendous help to the community. As you may know, GTEx data is crucial to our research, and the GTEx results, particularly at the transcript level, are based on Ensembl 88 and have not been updated to later Ensembl releases.
Without Ensembl 88, many analyses may run into significant problems, such as conflicting or misleading results, which raises serious reproducibility concerns.
Best regards,
Shicheng
From: Mike Smith @.> Sent: Tuesday, July 23, 2024 2:00 AM To: grimbough/biomaRt @.> Cc: Shicheng Guo @.>; Author @.> Subject: Re: [grimbough/biomaRt] ENSEMBLE 88 (Issue #107)
I'm not going to create my own instance of BioMart. I don't work for Ensembl, nor do I have the time or resources to maintain my own BioMart server.
However, you could use another source of annotation in Bioconductor. The ensembldb package (https://bioconductor.org/packages/release/bioc/html/ensembldb.html), together with AnnotationHub, lets you download snapshots of each Ensembl release to work with locally.
BiocManager::install(c('AnnotationHub', 'ensembldb'))
library(AnnotationHub)
ah <- AnnotationHub()
## search for the human Ensembl 88 database
query(ah, pattern = c("Ensembl 88", "Sapiens"))
AnnotationHub with 1 record
# snapshotDate(): 2024-04-30
# names(): AH53715
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# $rdatadateadded: 2017-04-05
# $title: Ensembl 88 EnsDb for Homo Sapiens
# $description: Gene and protein annotations for Homo Sapiens based on Ensembl version 88.
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("EnsDb", "Ensembl", "Gene", "Transcript", "Protein", "Annotation", "88", "AHEnsDbs")
# retrieve record with 'object[["AH53715"]]'
## This finds only one record and shows how to retrieve it
## Downloading might take quite a while
ens_88 <- ah[["AH53715"]]
ens_88
# EnsDb for Ensembl:
# |Backend: SQLite
# |Db type: EnsDb
# |Type of Gene ID: Ensembl Gene ID
# |Supporting package: ensembldb
# |Db created by: ensembldb package from Bioconductor
# |script_version: 0.3.1
# |Creation time: Thu Jun 15 08:50:24 2017
# |ensembl_version: 88
# |ensembl_host: localhost
# |Organism: homo_sapiens
# |taxonomy_id: 9606
# |genome_build: GRCh38
# |DBSCHEMAVERSION: 2.1
# | No. of genes: 64592.
# | No. of transcripts: 219063.
# |Protein data available.
You'll need to look at the manual for ensembldb to figure out how to work with that object and extract the data you want, but it should match the Ensembl release you want to work with.
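One likely next step is matching your own transcript IDs against this release. A minimal sketch, assuming your IDs may carry version suffixes such as `.2` while the EnsDb transcript table may not, so versions are stripped on both sides before matching; the helpers `strip_version()` and `annotate_tx()` are hypothetical.

```r
# Hypothetical helper: drop a trailing version suffix, e.g. "ENST...1.2" -> "ENST...1"
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Hypothetical helper: look up each ID in a transcript table (a data.frame
# with a `tx_id` column), keeping the input order. IDs with no match get a
# row of NAs; the original (possibly versioned) ID is kept in `query_id`.
annotate_tx <- function(tx_ids, tx_table) {
  idx <- match(strip_version(tx_ids), strip_version(tx_table$tx_id))
  out <- tx_table[idx, , drop = FALSE]
  out$query_id <- tx_ids
  rownames(out) <- NULL
  out
}
```

The table itself could come from `tx88 <- ensembldb::transcripts(ens_88, columns = c("tx_id", "gene_id", "gene_name", "tx_biotype"), return.type = "data.frame")`, after which `annotate_tx(my_ids, tx88)` returns one row per ID in `my_ids` (a placeholder for your own vector), with `NA` where release 88 has no matching transcript.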
Hi Mike,
I am wondering why only Ensembl 88 is missing from the biomaRt package?
Thanks,
Shicheng