Closed lukeholman closed 5 years ago
A few different points to address:
Thanks for pointing out the erroneous example. biomaRt is an now an old package and the underlying resources it accesses have changed a bit over the years. I expect these examples worked and made sense at some point, but then started failing and a previous package maintainer commented them out.
I have updated listDatasets
, listFilters
and listAttributes
with examples that now work, and I'll keep this issue open to remind me to work through the other help pages too.
The biomartr package is completely unaffiliated from biomaRt. Despite the name, I believe biomartr was developed to provide a unified interface to a number of different online data stores, not just those running BioMart. If you have questions regarding that package you're better off contacting the package maintainer directly at https://github.com/ropensci/biomartr
If I was trying to list the GO IDs associated with a set of Entrez IDs in biomaRt I would do something like:
bee_mart = useMart("metazoa_mart",
host="metazoa.ensembl.org",
dataset="amellifera_eg_gene")
## substitute for the vector of 13,000 ids
entrez_gene_ids <- c("100576508", "107965534", "726750", "725548", "100576425")
getBM(mart = bee_mart,
attributes = c("entrezgene", "go_id"),
filters = "entrezgene",
values = entrez_gene_ids)
You can omit the the filters
and values
arguments to get everything available in the Ensembl mart.
However, looking at the results, I wonder if part of the reason behind seeing only 300 GO terms is that there is not a very comprehensive mapping between Entrez Gene IDs and Ensembl Gene IDs. The following queries will return the total number of records in Ensembl with an Entrez ID, and then the number of records that don't have an Entrez ID.
> nrow(getBM(mart = mart, attributes = "ensembl_gene_id",
filters = "with_entrezgene", values = TRUE))
[1] 6028
> nrow(getBM(mart = mart, attributes = "ensembl_gene_id",
filters = "with_entrezgene", values = FALSE))
[1] 9825
Clearly the total number of records is in a similar ball park to your 13,000 Entrez IDs, but only ~40% have a mapping between the two ID sets. As for why this is, you'd need to ask Ensembl, but you might have more luck either getting GO terms from a source that used Entrez naively, or annotating your list of genes with Ensembl IDs directly.
In the above example, if I didn't know the specific names of the filters I needed, I could use either listFilters()
or searchFilters()
to try and find them e.g.
> searchFilters(mart = bee_mart, pattern = "entrez")
name description
14 with_entrezgene With NCBI gene ID(s)
41 entrezgene NCBI gene ID(s) [e.g. 100049551]
Analogous functions are also present for datasets and attributes.
Hey there, The example for biomaRt:: listDatasets does not work. The example reads:
Here's what I get:
I'm not sure why you suggest calling grep on a dataframe, and then indexing a dataframe with a single integer (not e.g.
marts[,index]
) - perhapslistMarts()
used to return a character vector, and now it has changed?Also, maybe don't use commenting in the examples - it makes it harder to copy-paste them. I guess you wanted to pass the CRAN checks, but you don't need the #s for this.
I'd also really love better worked example. Basically, I would like to get a list of all the gene-GO associations for the species Apis mellifera, and there seems to be no instructions on how to do this. I'm not sure if the pacakges
biomaRt
andbiomartr
are related, but the following, wherex
is a list of >13,000 Entrez gene IDs for this species, yields only ~300 GO terms, which is mysteriously wrong:Maybe I am using the wrong filter, but it doesn't seem possible to get a list of filters (and then, why does it work a little bit?). Perhaps listing the possible filters in the help file would be a good idea?