Note that in your file:
file format: unknown
It seems that the header of your VCF file does not contain annotation information. SeqArray works with VCFv4.0 or later.
A possible solution is to edit the VCF file and add a VCF header in the standard format defined in VCFv4.0.
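Before editing anything, it may help to look at what the header actually declares. The snippet below is only a sketch: vcf_fileformat is a hypothetical helper (it is not part of SeqArray), and it assumes that a well-formed file starts with the ##fileformat line that the VCF specification requires as the first header line. The temporary file is there only to make the example self-contained.

```r
# Hypothetical helper, not a SeqArray function: return the version string
# declared on the first line of a VCF file, or NA if that line is not a
# ##fileformat line. gzfile() reads both gzip-compressed and plain files.
vcf_fileformat <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  first <- readLines(con, n = 1L)
  if (length(first) == 1L && grepl("^##fileformat=", first)) {
    sub("^##fileformat=", "", first)
  } else {
    NA_character_
  }
}

# Self-contained demonstration on a tiny temporary file.
tmp <- tempfile(fileext = ".vcf")
writeLines(c("##fileformat=VCFv4.2",
             "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"), tmp)
print(vcf_fileformat(tmp))
```

If the helper returns NA for your own file, the first line is not a ##fileformat line, which would be consistent with the "file format: unknown" message.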
I tried a new file (v4.2 vcf) and have the same problem.
library(SeqArray)
Loading required package: gdsfmt
seqVCF2GDS("Snps.hapcall_recal_SNPs.vcf.gz","out.gds")
Thu Oct 27 09:21:06 2016
Variant Call Format (VCF) Import:
    file(s):
        Snps.hapcall_recal_SNPs.vcf.gz (3.5M)
    file format: VCFv4.2
    the number of sets of chromosomes (ploidy): 2
    the number of samples: 222
    genotype storage: bit2
    compression method: ZIP_RA
Output:
    out.gds
Error in (function (node, name, val = NULL, storage = storage.mode(val), : Stream read error
*** caught segfault ***
address 0x80, cause 'memory not mapped'
Traceback:
 1: closefn.gds(gfile)
 2: seqVCF2GDS("Snps.hapcall_recal_SNPs.vcf.gz", "out.gds")
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Please show me the output of sessionInfo().
library(SeqArray)
Loading required package: gdsfmt
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SeqArray_1.11.18 gdsfmt_1.7.17
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.36.0 XVector_0.14.0
[3] GenomicAlignments_1.10.0 GenomicRanges_1.26.1
[5] BiocGenerics_0.20.0 zlibbioc_1.20.0
[7] IRanges_2.8.0 BiocParallel_1.8.0
[9] BSgenome_1.42.0 lattice_0.20-33
[11] GenomeInfoDb_1.10.0 tools_3.3.0
[13] SummarizedExperiment_1.4.0 parallel_3.3.0
[15] grid_3.3.0 Biobase_2.34.0
[17] DBI_0.5-1 Matrix_1.2-6
[19] rtracklayer_1.34.0 S4Vectors_0.12.0
[21] bitops_1.0-6 RCurl_1.95-4.8
[23] biomaRt_2.30.0 RSQLite_1.0.0
[25] GenomicFeatures_1.26.0 Biostrings_2.42.0
[27] Rsamtools_1.26.1 stats4_3.3.0
[29] XML_3.98-1.4 VariantAnnotation_1.20.0
vcf.fn <- seqExampleFileName("vcf")
# conversion
seqVCF2GDS(vcf.fn, "tmp.gds")
Fri Oct 28 09:23:38 2016
The Variant Call Format (VCF) header:
    file format: VCFv4.0
    the number of sets of chromosomes (ploidy): 2
    the number of samples: 90
    GDS genotype storage: bit2
Error in (function (node, name, val = NULL, storage = storage.mode(val), : Stream read error
*** caught segfault ***
address 0x80, cause 'memory not mapped'
Traceback:
 1: closefn.gds(gfile)
 2: seqVCF2GDS(vcf.fn, "tmp.gds")
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
See the session info:
R version 3.3.0 (2016-05-03)
other attached packages:
[1] SeqArray_1.11.18 gdsfmt_1.7.17
If you have difficulty installing the latest versions of gdsfmt and SeqArray in R 3.3.0 via biocLite, please install the packages from GitHub:
library("devtools")
install_github("zhengxwen/gdsfmt")
install_github("zhengxwen/SeqArray")
Or you could send your VCF file to me at zhengxwen@gmail.com
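After reinstalling, it is worth confirming that R actually picks up the newer builds rather than the old ones. A minimal sketch, assuming the versions to beat are the ones from the session info above (SeqArray_1.11.18 and gdsfmt_1.7.17); is_newer_than is a hypothetical helper written for this post, not a function from either package.

```r
# Hypothetical helper: TRUE if the installed version of `pkg` is newer than
# `old_version`, FALSE if it is not, NA if the package is not installed.
is_newer_than <- function(pkg, old_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
  packageVersion(pkg) > old_version
}

# Versions reported in the session info earlier in this thread.
old <- c(gdsfmt = "1.7.17", SeqArray = "1.11.18")
status <- mapply(is_newer_than, names(old), old)
print(status)
```

A FALSE or NA here would suggest that the update did not take effect in the library R is loading from.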
I'm happy to send a vcf if needed. However, I updated the packages and ran the test scripts with the same issues arising. Any ideas what may be up here?
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SeqArray_1.13.6 gdsfmt_1.8.3 devtools_1.12.0
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.36.0 XVector_0.14.0
[3] GenomicAlignments_1.10.0 GenomicRanges_1.26.1
[5] BiocGenerics_0.20.0 zlibbioc_1.20.0
[7] IRanges_2.8.0 BiocParallel_1.8.0
[9] BSgenome_1.42.0 lattice_0.20-33
[11] R6_2.2.0 httr_1.2.1
[13] GenomeInfoDb_1.10.0 tools_3.3.0
[15] SummarizedExperiment_1.4.0 parallel_3.3.0
[17] grid_3.3.0 Biobase_2.34.0
[19] DBI_0.5-1 git2r_0.15.0
[21] withr_1.0.2 digest_0.6.10
[23] Matrix_1.2-6 rtracklayer_1.34.0
[25] S4Vectors_0.12.0 bitops_1.0-6
[27] biomaRt_2.30.0 RCurl_1.95-4.8
[29] curl_2.2 RSQLite_1.0.0
[31] memoise_1.0.0 BiocInstaller_1.24.0
[33] GenomicFeatures_1.26.0 Biostrings_2.42.0
[35] Rsamtools_1.26.1 XML_3.98-1.4
[37] stats4_3.3.0 VariantAnnotation_1.20.0
vcf.fn <- seqExampleFileName("vcf")
seqVCF2GDS(vcf.fn, "tmp.gds")
Wed Nov 2 09:34:05 2016
Variant Call Format (VCF) Import:
    file(s):
        CEU_Exon.vcf.gz (226.0K)
    file format: VCFv4.0
    the number of sets of chromosomes (ploidy): 2
    the number of samples: 90
    genotype storage: bit2
    compression method: ZIP_RA
Output:
    tmp.gds
Error in (function (node, name, val = NULL, storage = storage.mode(val), : Stream read error
*** caught segfault ***
address 0x80, cause 'memory not mapped'
Traceback:
 1: closefn.gds(gfile)
 2: seqVCF2GDS(vcf.fn, "tmp.gds")
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 3
I cannot reproduce the error in a virtual machine running Ubuntu 16.04.1 LTS and R 3.3.0.
Which C/C++ compiler are you using, gcc/g++? Please show me the output of:
g++ -v
Are you also able to run the following?
R CMD check gdsfmt_1.10.0.tar.gz
gdsfmt_1.10.0.tar.gz can be downloaded from:
http://www.bioconductor.org/packages/release/bioc/src/contrib/gdsfmt_1.10.0.tar.gz
Success!!!!!
I ran the check command, installed RUnit and knitr and we are up and running!
Thanks so much for persevering with this problem; it's very much appreciated.
R CMD check gdsfmt_1.11.0.tar.gz
VignetteBuilder package required for checking but not installed: ‘knitr’
The suggested packages are required for a complete check. Checking can be attempted without them by setting the environment variable _R_CHECK_FORCESUGGESTS_ to a false value.
See section ‘The DESCRIPTION file’ in the ‘Writing R Extensions’ manual.
Status: 1 ERROR
See ‘/home/ian/Downloads/gdsfmt.Rcheck/00check.log’ for details.
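For anyone hitting the same check error, one way to find out which of the packages needed for the check are absent is sketched below. It assumes the relevant packages are knitr and RUnit, the two mentioned in this thread; missing_packages is a hypothetical helper, and the install step is guarded so that it only runs in an interactive session.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages named in this thread as needed for R CMD check on gdsfmt.
to_install <- missing_packages(c("knitr", "RUnit"))
print(to_install)

# Install only when run interactively, so a script never installs silently.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Once nothing is reported as missing, rerunning R CMD check on the tarball should get past the suggested-packages error.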
I'm running R 3.3.1 on Ubuntu, and previously installed SeqArray on a Mac (R 3.3.1 again) with no issues.
I'm getting a segfault with the seqVCF2GDS function. I've reinstalled R plus all dependencies with no change in the outcome (and used both versions 1.14 and 1.15 of SeqArray with 1.10 and 1.11 of gdsfmt). Output pasted below; any ideas what may be going wrong here?
seqVCF2GDS("Snps.hapcall_recal_SNPs_core_sel.vcf.gz","out.gds")
Wed Oct 26 17:08:36 2016
Variant Call Format (VCF) Import:
    file(s):
        Snps.hapcall_recal_SNPs_core_sel.vcf.gz (2.1M)
    file format: unknown
    the number of sets of chromosomes (ploidy): 2
    the number of samples: 138
    genotype storage: bit2
    compression method: ZIP_RA
variable id in the FORMAT field should be defined ahead, and the undefined id is/are ignored during the conversion.
Output:
    out.gds
Error in (function (node, name, val = NULL, storage = storage.mode(val), : Stream read error
*** caught segfault ***
address 0x80, cause 'memory not mapped'