zhanxw / seqminer

Query sequence data (VCF/BCF1/BCF2, Tabix, BGEN, PLINK) in R
http://zhanxw.github.io/seqminer/

Allele T gets converted to TRUE #13

Open zx8754 opened 4 years ago

zx8754 commented 4 years ago

Using the command line:

# tabix (htslib) 1.9
$ tabix  myfile.gz 10:85547273-85547273 | cut -f1-5
# 10:85547273:C:T 10      85547273        C       T

vs using seqminer:

# seqminer_8.0
# R version 3.6.0 (2019-04-26)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: CentOS Linux 7 (Core)

tabixSubset <- tabix.read.table("myfile.gz", "10:85547273-85547273")
tabixSubset[, 1:5]
#               V1 V2       V3 V4   V5
#1 10:85547273:C:T 10 85547273  C TRUE
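
A likely explanation, offered as an assumption rather than something confirmed from the seqminer source: the behaviour matches R's standard column type guessing, where a column whose values are all `T`/`F`/`TRUE`/`FALSE` is treated as logical. The sketch below reproduces the symptom with base R only (`read.table()` and `utils::type.convert()`); it does not call seqminer, and the single input line is copied from the tabix output above.

```r
# One tab-separated record, as printed by tabix above.
line <- "10:85547273:C:T\t10\t85547273\tC\tT"

# Default type guessing: a column containing only "T" is read as logical.
guessed <- read.table(text = line, sep = "\t", stringsAsFactors = FALSE)
str(guessed$V5)

# Forcing every column to character keeps the allele as the letter "T".
as_char <- read.table(text = line, sep = "\t", colClasses = "character")
str(as_char$V5)

# The same rule applied to a bare vector: a mix of "C" and "T" is left alone,
# but a vector made only of "T" is converted to logical.
mixed  <- utils::type.convert(c("C", "T"), as.is = TRUE)
only_t <- utils::type.convert(c("T", "T"), as.is = TRUE)
```

If this is indeed the cause, the conversion only triggers when every ALT allele in the queried region is `T` (or `F`), which would explain why single-position queries like this one are affected.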

asaksager commented 2 years ago

This is still an issue.

My solution was to change it after extraction:

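library(seqminer); library(dplyr); library(stringr)
# Convert logical columns to character, then turn "TRUE" back into "T".
# Note: the second step turns every column into character.
mysubset <- as_tibble(tabix.read.table("myfile.gz", "10:85547273-85547273")) %>%
  mutate(across(where(is.logical), as.character)) %>%
  mutate(across(.cols = everything(),
                .fns = ~ str_replace_all(string = ., pattern = "TRUE", replacement = "T")))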
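
A narrower variant of the same idea, as a base R sketch: only logical columns are touched, so numeric columns such as the position keep their type. `restore_alleles` is a hypothetical helper name, not part of seqminer, and the data frame is a hand-built stand-in for the `tabix.read.table()` result shown in the issue. It assumes any logical column in the result is a mis-parsed allele column.

```r
# Hypothetical helper: map logical columns back to the letters they came from.
restore_alleles <- function(df) {
  is_lgl <- vapply(df, is.logical, logical(1))
  for (nm in names(df)[is_lgl]) {
    # TRUE came from "T", FALSE from "F"; NA stays NA.
    df[[nm]] <- ifelse(df[[nm]], "T", "F")
  }
  df
}

# Stand-in for the seqminer output in the issue (V5 already converted to TRUE).
tabixSubset <- data.frame(
  V1 = "10:85547273:C:T",
  V2 = 10L,
  V3 = 85547273L,
  V4 = "C",
  V5 = TRUE,
  stringsAsFactors = FALSE
)

fixed <- restore_alleles(tabixSubset)
str(fixed)
```

With real data this would be called as `restore_alleles(tabix.read.table(...))`. A column that is entirely `NA` is also logical in R, so it would pass through this helper and remain `NA`.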