grimbough / biomaRt

R package providing query functionality to BioMart instances like Ensembl
https://bioconductor.org/packages/biomaRt/

uniqueRows argument in getBM() not working #11

Closed rikrdo89 closed 5 years ago

rikrdo89 commented 5 years ago

Hi, I passed a list of Ensembl IDs containing multiple duplicates to getBM(), and even though I set the argument uniqueRows = FALSE, it returns the mapping for the unique Ensembl IDs only and removes the duplicates. Can someone please check whether this is a bug in the current package? Thanks. I am using biomaRt version 2.38.

grimbough commented 5 years ago

Unfortunately, the collapsing of duplicate input values into a single value happens server side, so it's not possible to force biomaRt to return a match for every entry in your set of values.

The uniqueRows argument applies instead to situations where the returned data.frame would contain duplicated rows. This doesn't happen often, but because Ensembl BioMart is a transcript-centric database, you can sometimes construct queries where a value is returned per transcript, even if that isn't really what you want. For example, consider the following two queries.

library(biomaRt)
mart <- useEnsembl('ensembl', dataset = 'hsapiens_gene_ensembl')

getBM(attributes = c('ensembl_gene_id', 'transcript_source'),
      filters = "ensembl_gene_id",
      values = "ENSG00000094804",
      mart = mart,
      uniqueRows = FALSE)
  ensembl_gene_id transcript_source
1 ENSG00000094804    ensembl_havana
2 ENSG00000094804    ensembl_havana
3 ENSG00000094804            havana
4 ENSG00000094804            havana
5 ENSG00000094804            havana
6 ENSG00000094804            havana
7 ENSG00000094804            havana
8 ENSG00000094804            havana

getBM(attributes = c('ensembl_gene_id', 'transcript_source'),
      filters = "ensembl_gene_id",
      values = "ENSG00000094804",
      mart = mart,
      uniqueRows = TRUE)
  ensembl_gene_id transcript_source
1 ENSG00000094804    ensembl_havana
2 ENSG00000094804            havana
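
To make the difference between the two outputs concrete, here is a small offline sketch. It rebuilds the eight-row result above by hand (no BioMart query is made) and shows that dropping duplicated rows leaves the same two rows as the uniqueRows = TRUE call. This only illustrates the effect; it is not a description of how biomaRt or the server implements uniqueRows.

```r
# Hand-built copy of the uniqueRows = FALSE output shown above
# (two 'ensembl_havana' rows followed by six 'havana' rows).
res_all <- data.frame(
  ensembl_gene_id   = rep("ENSG00000094804", 8),
  transcript_source = c(rep("ensembl_havana", 2), rep("havana", 6)),
  stringsAsFactors  = FALSE
)

# Base R unique() drops rows that are identical across all columns.
res_unique <- unique(res_all)
rownames(res_unique) <- NULL

print(res_unique)
```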

If you have a specific use case where you want to retain the duplicated IDs, feel free to ask a question with some more details at https://support.bioconductor.org and I'm sure someone will suggest a solution using either biomaRt or one of the other annotation packages.
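
One possible way to get the duplicates back is to expand the result on the client side after the query returns. The following is a minimal sketch, not an official biomaRt recipe: the data.frame annot is a hand-built stand-in for a getBM() result, the second ID is a made-up placeholder, and the names ids, annot and expanded are illustrative only.

```r
# Input IDs with deliberate duplicates. The first ID is the one used in the
# queries above; "ENSG00000000001" is a made-up placeholder with no match.
ids <- c("ENSG00000094804", "ENSG00000094804", "ENSG00000000001")

# Stand-in for a getBM() result (assumption: one row per matched unique ID).
# In real use this would come from a query on unique(ids).
annot <- data.frame(
  ensembl_gene_id   = "ENSG00000094804",
  transcript_source = "ensembl_havana",
  stringsAsFactors  = FALSE
)

# match() gives, for each input ID, the row of its first hit in annot
# (NA when there is none), so indexing by it restores both the original
# order and the duplicates.
expanded <- annot[match(ids, annot$ensembl_gene_id), , drop = FALSE]
expanded$ensembl_gene_id <- ids   # keep the ID even for unmatched inputs
rownames(expanded) <- NULL

print(expanded)
```

Because match() only picks the first hit, this fits results with one row per ID. If a query returns several rows per ID, as in the transcript_source example, something like merge(data.frame(ensembl_gene_id = ids), annot, all.x = TRUE) keeps every combination, although merge() does not preserve the input order.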