zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray

duplicated "variant.id" when multiple VCF files were imported by seqVCF2GDS #7

Closed: zhengxwen closed this issue 9 years ago

zhengxwen commented 9 years ago
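library(SeqArray)

# path to the example VCF file shipped with SeqArray
vcf.fn <- seqExampleFileName("vcf")

# convert: import the same VCF file twice into a single GDS file
seqVCF2GDS(c(vcf.fn, vcf.fn), "tmp.gds")

f <- seqOpen("tmp.gds")

# anyDuplicated() returns the index of the first duplicated element, or 0 if
# there is none; variant.id is expected to be unique, so this should be 0
anyDuplicated(seqGetData(f, "variant.id"))
#> [1] 1349

seqClose(f)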
sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SeqArray_1.8.0 gdsfmt_1.5.8  

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.30.1     XVector_0.8.0            GenomicRanges_1.20.5    
 [4] BiocGenerics_0.14.0      zlibbioc_1.14.0          GenomicAlignments_1.4.1 
 [7] IRanges_2.2.5            BiocParallel_1.2.14      BSgenome_1.36.2         
[10] GenomeInfoDb_1.4.1       tools_3.2.1              parallel_3.2.1          
[13] Biobase_2.28.0           DBI_0.3.1                lambda.r_1.1.7          
[16] futile.logger_1.4.1      digest_0.6.8             crayon_1.3.1            
[19] rtracklayer_1.28.6       S4Vectors_0.6.2          futile.options_1.0.0    
[22] bitops_1.0-6             RCurl_1.95-4.7           biomaRt_2.24.0          
[25] memoise_0.2.1            RSQLite_1.0.0            GenomicFeatures_1.20.1  
[28] Biostrings_2.36.1        Rsamtools_1.20.4         stats4_3.2.1            
[31] XML_3.98-1.3             VariantAnnotation_1.14.6
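The symptom can be illustrated without SeqArray. The base-R sketch below assumes, hypothetically, that each input file's variants are numbered 1..n on their own and the per-file IDs are then simply concatenated. This is an illustration of how such duplicates arise, not SeqArray's actual import code. It also shows one possible workaround (offsetting the second file's IDs), which is not necessarily how the package resolved this issue.

```r
# Hypothetical per-file variant IDs: each file numbered 1..n independently
# (an assumption used to illustrate the symptom, not SeqArray internals).
ids.file1 <- seq_len(5L)
ids.file2 <- seq_len(5L)

# Naive concatenation, as if two files were merged without renumbering.
merged <- c(ids.file1, ids.file2)

# anyDuplicated() returns the index of the first duplicated element (0 if none).
first.dup <- anyDuplicated(merged)
print(first.dup)   # 6: the first ID of the second file repeats ID 1

# One possible fix: shift the second file's IDs past the first file's range.
renumbered <- c(ids.file1, ids.file2 + length(ids.file1))
print(anyDuplicated(renumbered))   # 0: all IDs are now unique
```

Under this assumption the first duplicate always sits at position n + 1, where n is the number of variants in the first file.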