Closed rbutleriii closed 2 months ago
I'm not sure why this is happening, but it isn't due to the mix of numeric and character names for the chromosomes. If you combine just a subset of possible options it seems to work and returns numbers consistent with what you've shown:
library(biomaRt)

mouse <- useEnsembl(biomart = "genes", dataset = "mmusculus_gene_ensembl")

# Filter on a small subset of chromosomes: one numeric name, one character name
b <- getBM(
  mart = mouse,
  filters = "chromosome_name",
  values = c("19", "MT"),
  attributes = c(
    "ensembl_gene_id",
    "external_gene_name",
    "chromosome_name",
    "hsapiens_homolog_ensembl_gene",
    "hsapiens_homolog_associated_gene_name"
  ),
  useCache = FALSE
)

table(b$chromosome_name)
#>
#>   19   MT
#> 1486   37
It seems like it may be related to the number of values you're using to filter on, although I'll agree it's suspicious that it breaks right at the divide between numbers and letters.
If I run the same query with all chromosome names in the Ensembl web interface I also find the "X", "Y", "MT" results missing. This suggests it's an issue with the Ensembl BioMart itself, rather than the biomaRt package. There's very little I can do if the server doesn't send a complete set of results back.
As a workaround you can always try running this as two separate queries and combining the results. It's unsatisfactory, but it looks like it works. I'd also suggest contacting the Ensembl helpdesk (https://www.ensembl.org/Help/Contact) and reporting the problem. Feel free to link to this GitHub issue to demonstrate the problem.
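For what it's worth, here is a rough sketch of what the two-query workaround could look like. The helper names (`query_in_groups`, `get_homologs`, `run_example`) are made up for illustration and are not part of biomaRt, and splitting the chromosomes into "numbers" and "letters" is just an assumption based on the report that each of those works on its own. I haven't confirmed this is the best way to split things.

```r
# Hypothetical helper (not part of biomaRt): run query_fun once per group of
# filter values and stack the resulting data frames into one.
# query_fun must return a data frame with the same columns for every group.
query_in_groups <- function(groups, query_fun) {
  results <- lapply(groups, query_fun)
  combined <- do.call(rbind, results)
  rownames(combined) <- NULL
  combined
}

# Hypothetical wrapper around the getBM() call used earlier in this thread.
get_homologs <- function(chroms, mart) {
  biomaRt::getBM(
    mart = mart,
    filters = "chromosome_name",
    values = chroms,
    attributes = c(
      "ensembl_gene_id",
      "external_gene_name",
      "chromosome_name",
      "hsapiens_homolog_ensembl_gene",
      "hsapiens_homolog_associated_gene_name"
    ),
    useCache = FALSE
  )
}

# Example usage. It needs network access to Ensembl, so it is wrapped in a
# function and only runs when called explicitly.
run_example <- function() {
  mouse <- biomaRt::useEnsembl(biomart = "genes",
                               dataset = "mmusculus_gene_ensembl")
  # Assumed split: numeric chromosome names in one query, the rest in another
  groups <- list(as.character(1:19), c("X", "Y", "MT"))
  all_chr <- query_in_groups(groups, function(chroms) get_homologs(chroms, mouse))
  table(all_chr$chromosome_name)
}
```

Calling `run_example()` should then give a count for every chromosome in both groups, assuming each individual query comes back complete from the server.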
If I try to filter my results in `getBM` by `chromosome_name`, it works for numbers only, or letters only, but not both (i.e. 1-19 with X and Y):