grimbough / biomaRt

R package providing query functionality to BioMart instances like Ensembl
https://bioconductor.org/packages/biomaRt/

getBM returns logical minor_allele column when minor_allele = T #12

Open sinarueeger opened 5 years ago

sinarueeger commented 5 years ago

Hi, I ran into a case where the minor_allele of a SNP is returned as TRUE. The example below queries only one SNP, rs1528723, whose minor allele is T according to dbSNP. My guess: the minor_allele column is never forced to be a character, so R interprets the single value T as the logical TRUE.


```r
library(biomaRt)
snp_ensembl <- useEnsembl(biomart = "snp", dataset = "hsapiens_snp", GRCh = 37)
chr <- 8
pos <- 35127386
out <- getBM(
  attributes = c("refsnp_id", 
                 "minor_allele"),
  filters = c("chr_name", "start", "end"), 
  values = list(chr, pos, pos), 
  mart = snp_ensembl)
out
#>   refsnp_id minor_allele
#> 1 rs1528723         TRUE
```

Created on 2019-05-07 by the reprex package (v0.2.1)
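The guess above matches how base R's type guessing behaves. The sketch below assumes, without confirmation from this thread, that the query result is parsed with something like `read.table()` or `type.convert()`. Either function picks the first type that can hold every non-missing value in a column, and `"T"` is a valid spelling of logical `TRUE`. A column containing only `"T"` therefore becomes logical, while a column mixing `"T"` with any other allele stays character:

```r
# A lone "T" is accepted as logical, so the column is converted to logical.
single <- type.convert("T", as.is = TRUE)
# "A" is not a logical value, so the whole column stays character.
mixed <- type.convert(c("T", "A"), as.is = TRUE)

class(single)  # "logical"
class(mixed)   # "character"

# The same happens when a one-row tab-separated result is parsed by
# read.table() with its default column-type guessing.
txt <- "rs1528723\tT"
guessed <- read.table(text = txt, sep = "\t", header = FALSE,
                      col.names = c("refsnp_id", "minor_allele"))
str(guessed$minor_allele)  # logi TRUE
```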

grimbough commented 5 years ago

Thanks for the report. I remember running into this once before, and it's a super annoying edge case. If your query returns any alleles that aren't T, it works fine. But because a query can return any combination of attributes in any order, it's really hard to know the class of each column beforehand.

I didn't manage to come up with a solution that wasn't really heavy-handed, i.e. making every column a character even if it holds a coordinate, but I'll have another think.
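One possible middle ground, sketched here only as an illustration and not as anything biomaRt implements, is to read every column as character and then let `type.convert()` guess types for all columns except those known to hold alleles. Coordinates are still converted, and allele columns never turn into logicals. The helper `parse_result()`, its `keep_char` argument and the tab-separated input are hypothetical, and the column names are only illustrative:

```r
# Hypothetical helper: parse every column as character, then guess types
# only for the columns not listed in `keep_char`.
parse_result <- function(txt, col_names, keep_char) {
  df <- read.table(text = txt, sep = "\t", header = FALSE,
                   colClasses = "character", col.names = col_names,
                   quote = "", comment.char = "")
  for (col in setdiff(col_names, keep_char)) {
    df[[col]] <- type.convert(df[[col]], as.is = TRUE)
  }
  df
}

txt <- "rs1528723\t35127386\tT"
res <- parse_result(txt,
                    col_names = c("refsnp_id", "chrom_start", "minor_allele"),
                    keep_char = "minor_allele")
str(res)
```

The weakness is that someone has to maintain the list of allele-like attributes, which is the kind of per-attribute bookkeeping the comment above describes as hard.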

sinarueeger commented 5 years ago

I understand that this is an edge case since the queries are usually for more than 1 SNP. Thanks for looking into this! For now I can just force it to a character myself.
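When forcing the column back to character, note that `as.character(TRUE)` returns `"TRUE"`, not `"T"`, so a plain coercion does not restore the allele. A minimal sketch of a user-side fix, assuming a logical column can only have come from `T`/`F` values (the helper name `fix_allele_column()` is hypothetical):

```r
# Hypothetical helper: if the column was guessed as logical, map TRUE back
# to "T" and FALSE to "F"; NA stays NA. Character columns pass through.
fix_allele_column <- function(x) {
  if (is.logical(x)) {
    x <- ifelse(x, "T", "F")
  }
  as.character(x)
}

out <- data.frame(refsnp_id = "rs1528723", minor_allele = TRUE)
out$minor_allele <- fix_allele_column(out$minor_allele)
out$minor_allele  # "T"
```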