grimbough / biomaRt

R package providing query functionality to BioMart instances like Ensembl
https://bioconductor.org/packages/biomaRt/
`pberghei_eg_gene` gene set has all genes missing compared to PlasmoDB #110

Open Rohit-Satyam opened 3 weeks ago

Rohit-Satyam commented 3 weeks ago

I made a strange observation. The PlasmoDB gene database lists 5254 genes, but BioMart returns only 4903. Several characterized genes, such as "PBANKA_0100600" and "PBANKA_0102900", are missing from it, while genes such as PBANKA_000970 and PBANKA_000980, which are absent from PlasmoDB and UniProt, are present. I am using biomaRt 2.60.1. Can this be fixed?

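mart <- "protists_mart"
gset <- "pberghei_eg_gene"
# Connect to the Ensembl Genomes protists mart and select the P. berghei dataset
ensembl_mart <- biomaRt::useEnsemblGenomes(biomart = mart, dataset = gset)
# Fetch the gene ID of every gene in the dataset
gene_names <- biomaRt::getBM(attributes = "ensembl_gene_id", mart = ensembl_mart)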

Odd-looking gene IDs also appear in place of Ensembl gene IDs in the first 65 rows.

[image]

The version is also missing; only PBANKA01 is shown.

> ensembl_mart@version
[1] ""

Edit 1: I just realised that none of the gene IDs map.

[image: jVenn_chart (3)]
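The overlap check behind the Venn chart can also be done directly in R. This is a minimal sketch that uses only the four IDs quoted above as toy input; in practice the vectors would hold the full ID list exported from PlasmoDB and the `ensembl_gene_id` column returned by `getBM()`. The variable names and the `digit_count` helper are illustrative assumptions, not part of biomaRt.

```r
# Toy vectors built from the IDs quoted in this issue (not the full gene lists)
plasmodb_ids <- c("PBANKA_0100600", "PBANKA_0102900")
biomart_ids  <- c("PBANKA_000970", "PBANKA_000980")

# Set comparison: IDs shared by both sources and IDs unique to each
shared       <- intersect(plasmodb_ids, biomart_ids)
only_plasmo  <- setdiff(plasmodb_ids, biomart_ids)
only_biomart <- setdiff(biomart_ids, plasmodb_ids)

# Hypothetical helper: number of characters after the "PBANKA_" prefix
digit_count <- function(ids) nchar(sub("^PBANKA_", "", ids))

# Tabulate ID lengths per source to see whether the two use the same ID format
table(digit_count(plasmodb_ids))
table(digit_count(biomart_ids))
```

For these four IDs the two sets share nothing, and the PlasmoDB IDs carry seven digits after the prefix while the BioMart IDs carry six. That would be consistent with the two sources using different ID schemes or annotation releases, although the sample is too small to confirm it.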

Rohit-Satyam commented 3 weeks ago

@grimbough Could you please treat this as a priority? I have heard in parasite meetings, and read in the disclaimer on the website, that the database might cease to exist after 14th September!

Rohit-Satyam commented 2 weeks ago

@jwokaty @dtenenba @vobencha anyone?

grimbough commented 2 weeks ago

The biomaRt package is just an interface to the data hosted by the Ensembl BioMart service. I have no control over the content of that service. There's some information on the assembly and genome build that Ensembl uses for this organism at https://protists.ensembl.org/Plasmodium_berghei/Info/Annotation/. My guess would be that this is outdated compared to the version provided by PlasmoDB.
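The build that the mart itself reports for a dataset can be checked from R. The sketch below assumes an installed biomaRt and network access. `dataset_row` is a hypothetical helper, not part of biomaRt, and the live lookup is guarded so that it only runs in an interactive session.

```r
# Hypothetical helper: pick one dataset's row out of a listDatasets()-style
# data frame, which has the columns dataset, description and version.
dataset_row <- function(datasets, name) {
  datasets[datasets$dataset == name, , drop = FALSE]
}

# Live lookup against the Ensembl Genomes protists mart.
# Skipped in non-interactive runs or when biomaRt is not installed.
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  mart <- biomaRt::useEnsemblGenomes(biomart = "protists_mart")
  print(dataset_row(biomaRt::listDatasets(mart), "pberghei_eg_gene"))
}
```

The `version` column of the returned row is the assembly label supplied by the service, so it reflects what Ensembl hosts rather than anything set by the package.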

If you think this is an issue, the best place to contact is the Ensembl Helpdesk at https://protists.ensembl.org/Help/Contact. They should be able to provide more information on how genome builds are chosen and whether there is an update path for this specific organism.

Rohit-Satyam commented 6 days ago

Thanks, @grimbough. I was under the impression that you were in close contact with the Ensembl team. I will write to them now.

grimbough commented 6 days ago

I work for the same organisation as the Ensembl team, but we're in different departments and different countries, and mostly interact with the folks who maintain the Ensembl BioMart instance. I don't hold any influence over the choice of data or genomes that get included in Ensembl.