zhengxwen / SNPRelate

R package: parallel computing toolset for relatedness and principal component analysis of SNP data (Development version only)
http://www.bioconductor.org/packages/SNPRelate
101 stars 25 forks source link

snpgdsGDS2BED fails to read GDS file made by seqVCF2GDS #23

Closed timflutre closed 7 years ago

timflutre commented 7 years ago

Here is a minimal reproducible example:

con <- url("https://raw.githubusercontent.com/timflutre/rutilstimflutre/master/inst/extdata/example.vcf")
vcf.txt <- readLines(con)
close(con)
vcf.file <- "example.vcf.gz"
writeLines(vcf.txt, gzfile(vcf.file))
close(gzfile(vcf.file))

library(SeqArray)
library(SNPRelate)
gds.file <- "example.gds"
SeqArray::seqVCF2GDS(vcf.fn=vcf.file,
                     out.fn=gds.file)

bed.file <- "example.bed"
SNPRelate::snpgdsGDS2BED(gdsobj=gds.file,
                         bed.fn=bed.file)

The last command returns:

Error in snpgdsOpen(gdsobj) : 
  Invalid SNP GDS file: 'FileFormat' should be 'SNP_ARRAY'.

Here is my sessionInfo():

R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SNPRelate_1.8.0 SeqArray_1.14.1 gdsfmt_1.10.1  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8                AnnotationDbi_1.36.0      
 [3] XVector_0.14.0             GenomicAlignments_1.10.0  
 [5] GenomicRanges_1.26.1       BiocGenerics_0.20.0       
 [7] zlibbioc_1.20.0            IRanges_2.8.1             
 [9] BiocParallel_1.8.1         BSgenome_1.42.0           
[11] lattice_0.20-34            GenomeInfoDb_1.10.1       
[13] tools_3.3.3                SummarizedExperiment_1.4.0
[15] parallel_3.3.3             grid_3.3.3                
[17] Biobase_2.34.0             DBI_0.5-1                 
[19] digest_0.6.10              crayon_1.3.2              
[21] Matrix_1.2-8               rtracklayer_1.34.1        
[23] S4Vectors_0.12.0           bitops_1.0-6              
[25] biomaRt_2.30.0             RCurl_1.95-4.8            
[27] memoise_1.0.0              RSQLite_1.1-2             
[29] compiler_3.3.3             GenomicFeatures_1.26.0    
[31] Biostrings_2.42.0          Rsamtools_1.26.1          
[33] XML_3.98-1.5               stats4_3.3.3              
[35] VariantAnnotation_1.20.2  
zhengxwen commented 7 years ago

Please use snpgdsVCF2GDS() to convert the VCF file to a SNP GDS file (SNP GDS file is different from SeqArray GDS file).

library(SNPRelate)
gds.file <- "example.gds"
vcf.file <- "example.vcf.gz"

SNPRelate::snpgdsVCF2GDS(vcf.file, gds.file)

bed.file <- "example"
SNPRelate::snpgdsGDS2BED(gdsobj=gds.file, bed.fn=bed.file)
timflutre commented 7 years ago

Great, thanks!

ps: it would be great if snpgdsVCF2GDS could have an argument specifying a subset of the chromosomes to convert