knausb / vcfR

Tools to work with variant call format files
248 stars 54 forks source link

numeric sample names in read.vcfR #129

Closed jessepoland closed 5 years ago

jessepoland commented 5 years ago

Numeric sample names are appended with 'X'.

Example 001ET_D becomes "X001ET_D"

This issue is related to #2 for read.vcf https://github.com/knausb/vcfR/issues/2

knausb commented 5 years ago

Hi @jessepoland, that sounds like an R thing. But I'm not reproducing it.


library(vcfR)
#> 
#>    *****       ***   vcfR   ***       *****
#>    This is vcfR 1.8.0 
#>      browseVignettes('vcfR') # Documentation
#>      citation('vcfR') # Citation
#>    *****       *****      *****       *****
data("vcfR_test")
colnames(vcfR_test@gt)[-1] <- 1:3
colnames(vcfR_test@gt)[2] <- "001ET_D"
vcfR_test@gt
#>      FORMAT        001ET_D          2                3             
#> [1,] "GT:GQ:DP:HQ" "0|0:48:1:51,51" "1|0:48:8:51,51" "1/1:43:5:.,."
#> [2,] "GT:GQ:DP:HQ" "0|0:49:3:58,50" "0|1:3:5:65,3"   "0/0:41:3"    
#> [3,] "GT:GQ:DP:HQ" "1|2:21:6:23,27" "2|1:2:0:18,2"   "2/2:35:4"    
#> [4,] "GT:GQ:DP:HQ" "0|0:54:7:56,60" "0|0:48:4:51,51" "0/0:61:2"    
#> [5,] "GT:GQ:DP"    "0/1:35:4"       "0/2:17:2"       "1/1:40:3"

orig_dir <- getwd()
setwd(tempdir())

write.vcf(vcfR_test, file = "vcfR_test.vcf.gz")

my_vcf <- read.vcfR("vcfR_test.vcf.gz", verbose = FALSE)
my_vcf@gt
#>      FORMAT        001ET_D          2                3             
#> [1,] "GT:GQ:DP:HQ" "0|0:48:1:51,51" "1|0:48:8:51,51" "1/1:43:5:.,."
#> [2,] "GT:GQ:DP:HQ" "0|0:49:3:58,50" "0|1:3:5:65,3"   "0/0:41:3"    
#> [3,] "GT:GQ:DP:HQ" "1|2:21:6:23,27" "2|1:2:0:18,2"   "2/2:35:4"    
#> [4,] "GT:GQ:DP:HQ" "0|0:54:7:56,60" "0|0:48:4:51,51" "0/0:61:2"    
#> [5,] "GT:GQ:DP"    "0/1:35:4"       "0/2:17:2"       "1/1:40:3"

setwd(orig_dir)

Created on 2019-03-19 by the reprex package (v0.2.1)

Is it possible you're using an old version of vcfR? Can you modify the example so I can reproduce the behavior you're experiencing? Thanks!

jessepoland commented 5 years ago

@knausb , thanks. You are right. I looked at it a little closer and the import and creation of the vcfR object is working just fine. The problem was coming with the extraction of the genotypes.

geno = data.frame(extract.gt(vcf))

For some reason I was wrapping the extract.gt function into a data.frame. Duh. Sorry for the trouble, but thank you for looking into it!

geno = extract.gt(vcf)

works just fine!

knausb commented 5 years ago

No worries! Thanks for the feedback.