zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray

Odd behavior of parallel seqVCF2GDS() #32

Closed AAvalos82 closed 6 years ago

AAvalos82 commented 6 years ago

I am noticing an issue when I run seqVCF2GDS(). If I run it in parallel, I lose the trailing 7 SNPs in my data set. However, when I run it serially, all SNPs are accounted for.
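A minimal sketch of how the serial and parallel results could be compared. It uses the example VCF shipped with SeqArray rather than the reporter's data, and `count_variants()` is a hypothetical helper written for illustration, not part of the package; the choice of two cores is also an assumption.

```r
library(SeqArray)

# Example VCF bundled with SeqArray; substitute your own file path.
vcf.fn <- seqExampleFileName("vcf")

# Hypothetical helper: convert the VCF to a temporary GDS file and
# return the number of variants that were written.
count_variants <- function(vcf.fn, parallel) {
    out.fn <- tempfile(fileext = ".gds")
    seqVCF2GDS(vcf.fn, out.fn, parallel = parallel, verbose = FALSE)
    f <- seqOpen(out.fn)
    n <- length(seqGetData(f, "variant.id"))
    seqClose(f)
    unlink(out.fn)
    n
}

n_serial   <- count_variants(vcf.fn, parallel = FALSE)
n_parallel <- count_variants(vcf.fn, parallel = 2L)  # assumed: 2 cores

# The two counts should match; a mismatch reproduces the problem.
cat("serial:", n_serial, " parallel:", n_parallel, "\n")
```

If the two numbers differ, the parallel conversion dropped (or duplicated) variants.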

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] magrittr_1.5         RColorBrewer_1.1-2   adegenet_2.1.1       ade4_1.7-11          GenomicRanges_1.32.6 GenomeInfoDb_1.16.0 
 [7] IRanges_2.14.10      S4Vectors_0.18.3     BiocGenerics_0.26.0  SNPRelate_1.14.0     SeqArray_1.20.1      gdsfmt_1.16.0       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18           ape_5.1                lattice_0.20-35        deldir_0.1-15          Biostrings_2.48.0      gtools_3.8.1          
 [7] assertthat_0.2.0       digest_0.6.15          mime_0.5               R6_2.2.2               plyr_1.8.4             coda_0.19-1           
[13] ggplot2_3.0.0          pillar_1.3.0           zlibbioc_1.26.0        rlang_0.2.2            lazyeval_0.2.1         spdep_0.7-8           
[19] rstudioapi_0.7         gdata_2.18.0           vegan_2.5-2            gmodels_2.18.1         Matrix_1.2-14          splines_3.5.1         
[25] stringr_1.3.1          igraph_1.2.2           RCurl_1.95-4.11        munsell_0.5.0          shiny_1.1.0            compiler_3.5.1        
[31] httpuv_1.4.5           pkgconfig_2.0.2        mgcv_1.8-24            htmltools_0.3.6        tidyselect_0.2.4       tibble_1.4.2          
[37] GenomeInfoDbData_1.1.0 expm_0.999-2           permute_0.9-4          crayon_1.3.4           dplyr_0.7.6            later_0.7.3           
[43] MASS_7.3-50            bitops_1.0-6           grid_3.5.1             nlme_3.1-137           spData_0.2.9.3         xtable_1.8-2          
[49] gtable_0.2.0           scales_1.0.0           stringi_1.1.7          XVector_0.20.0         reshape2_1.4.3         LearnBayes_2.15.1     
[55] promises_1.0.1         bindrcpp_0.2.2         sp_1.3-1               seqinr_3.4-5           boot_1.3-20            tools_3.5.1           
[61] glue_1.3.0             purrr_0.2.5            yaml_2.2.0             colorspace_1.3-2       cluster_2.0.7-1        bindr_0.1.1

zhengxwen commented 6 years ago

We also identified an issue with seqBCF2GDS() under R version 3.5.1.

Do you see the same problem when you use R version 3.4.3 or any version lower than 3.5.0?

AAvalos82 commented 6 years ago

It was definitely not present in R 3.4.3; I did not test R 3.5.0.

zhengxwen commented 6 years ago

Could you please test this commit https://github.com/zhengxwen/SeqArray/commit/67b4c314deb05b6888416b9a7ae718f870df5cba to see whether it is fixed or not?

library("devtools")
install_github("zhengxwen/SeqArray")
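
The call above installs the current head of the default branch. To test exactly the commit linked in this comment, the `ref` argument of `install_github()` can be used to pin it; this is a sketch, and it assumes the commit is still reachable in the repository.

```r
library("devtools")

# Pin the installation to the commit referenced in this thread.
install_github("zhengxwen/SeqArray",
               ref = "67b4c314deb05b6888416b9a7ae718f870df5cba")

# Confirm which version is now installed.
packageVersion("SeqArray")
```

Restart R after installing so the new build is the one that gets loaded.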

AAvalos82 commented 6 years ago

I just checked it with the recommended commit. That seems to have fixed the issue, and the correct number of variants is now being written to the database.