Closed SMartinOSU closed 5 years ago
Hi @SMartinOSU,
Thank you for the detailed report! I'm interpreting this as a "rare" issue as opposed to a "random" issue. If it were random you might error on a specific file but when you go back to that file you actually can open it. With a rare issue you might error while opening one of your 50,000 files but when you trace down that file you consistently and reproducibly get your error.
Here's what I see on my machine.
library(vcfR)
#>
#> ***** *** vcfR *** *****
#> This is vcfR 1.8.0
#> browseVignettes('vcfR') # Documentation
#> citation('vcfR') # Citation
#> ***** ***** ***** *****
# I'm using reprex::reprex() here.
# It works from a tmp directory so I have to tell it where the file is.
test <- read.vcfR("~/Desktop/sandbox/SMartinOSU/slim_full_30054.vcf")
#> Scanning file to determine attributes.
#> Error in read.vcfR("~/Desktop/sandbox/SMartinOSU/slim_full_30054.vcf"): Your file appears to have 89 header elements and 9724 columns in the body.
#> This should never happen!
# test
sessionInfo()
#> R version 3.5.2 (2018-12-20)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 16.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/libblas/libblas.so.3.6.0
#> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] vcfR_1.8.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.1 knitr_1.21 cluster_2.0.7-1
#> [4] magrittr_1.5 MASS_7.3-51.1 viridisLite_0.3.0
#> [7] ape_5.2 lattice_0.20-38 stringr_1.4.0
#> [10] highr_0.7 tools_3.5.2 parallel_3.5.2
#> [13] grid_3.5.2 nlme_3.1-137 mgcv_1.8-26
#> [16] xfun_0.4 vegan_2.5-3 htmltools_0.3.6
#> [19] yaml_2.2.0 digest_0.6.18 permute_0.9-4
#> [22] Matrix_1.2-15 evaluate_0.12 rmarkdown_1.11
#> [25] pinfsc50_1.1.0 stringi_1.4.3 compiler_3.5.2
Created on 2019-05-31 by the reprex package (v0.2.1)
This is not a segfault as reported. But it is an error. Is this different from what @SMartinOSU sees on their machine?
I've rerun the example with a different version of vcfR and it appears the problem has been solved.
library(vcfR)
#>
#> ***** *** vcfR *** *****
#> This is vcfR 1.8.0.9000
#> browseVignettes('vcfR') # Documentation
#> citation('vcfR') # Citation
#> ***** ***** ***** *****
# I'm using reprex::reprex() here.
# It works from a tmp directory so I have to tell it where the file is.
test <- read.vcfR("~/Desktop/sandbox/SMartinOSU/slim_full_30054.vcf")
#> Scanning file to determine attributes.
#> File attributes:
#> meta lines: 12
#> header_line: 13
#> variant count: 618
#> column count: 89
#>
Meta line 12 read in.
#> All meta lines processed.
#> gt matrix initialized.
#> Character matrix gt created.
#> Character matrix gt rows: 618
#> Character matrix gt cols: 89
#> skip: 0
#> nrows: 618
#> row_num: 0
#>
Processed variant: 618
#> All variants processed
test
#> ***** Object of Class vcfR *****
#> 80 samples
#> 1 CHROMs
#> 618 variants
#> Object size: 0.5 Mb
#> 0 percent missing data
#> ***** ***** *****
sessionInfo()
#> R version 3.5.2 (2018-12-20)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 16.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/libblas/libblas.so.3.6.0
#> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] vcfR_1.8.0.9000
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.1 knitr_1.21 cluster_2.0.7-1
#> [4] magrittr_1.5 MASS_7.3-51.1 viridisLite_0.3.0
#> [7] ape_5.2 lattice_0.20-38 stringr_1.4.0
#> [10] highr_0.7 tools_3.5.2 parallel_3.5.2
#> [13] grid_3.5.2 nlme_3.1-137 mgcv_1.8-26
#> [16] xfun_0.4 vegan_2.5-3 htmltools_0.3.6
#> [19] yaml_2.2.0 digest_0.6.18 permute_0.9-4
#> [22] Matrix_1.2-15 evaluate_0.12 rmarkdown_1.11
#> [25] pinfsc50_1.1.0 stringi_1.4.3 compiler_3.5.2
#> [28] memuse_4.0-0
Created on 2019-05-31 by the reprex package (v0.2.1)
vcfR_1.8.0 is the current version on CRAN and what is reported as used by the post. vcfR_1.8.0.9000 is the version on GitHub and is more current. Could you try to install this version and see if it resolves your issue? You can install the GitHub version as follows (your system will need a compiler).
devtools::install_github(repo="knausb/vcfR")
Thanks! Brian
I installed version 1.8.0.9 from github, and that seems to have solved the problem. I queued up my bash scripts to run through all the other vcf files, and should know by morning if they all were read successfully. Once I confirm those files go through, I'll plan on closing this issue. Thanks for the quick reply and help,
-Scott
Edit: Looks like everything is working great now, thanks for all the help!
Hello, I'm trying to use vcfR to read in a large number of vcf files generated by SLiM 3 as part of an ABC model. However, I'm randomly getting some .vcf files that crash vcfR, and when I look at them I don't see any reason for these files to be causing a problem. It's also a pretty rare occurrence (~4-5 times in 50,000 files). I tracked down one of the problematic vcf files, and I was able to reproduce it crashing in RStudio with: test<-vcfR:read.vcfR("./slim_full_30054.vcf")
This is the start of the problem vcf, sessionInfo for the windows crash, and I attached the whole vcf file as a text file:
slim_full_30054.txt I'm not sure if this is an issue with vcfR or with how SLiM writes out its files, and would greatly appreciate any insight. Thank you for your time, and vcfR has been a really useful package for a lot of my work! -Scott