Open alexisregelson opened 1 year ago
See:
    the total number of variants for import: 3,632
This number is too small; parallel=6L does not help at all. I guess parallel=6L might trigger a bug when merging the data files when the number of variants is too small. Try:
seqVCF2GDS(high_mod_vcf, "r4_chr1_high_mod.gds", parallel=1)
It might solve your problem.
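The advice above (drop to a single worker when there are few variants) could be automated roughly as follows. This is only a sketch: `count_vcf_variants`, `choose_workers` and the 50,000-variants-per-worker threshold are hypothetical and not part of SeqArray; only `seqVCF2GDS` and its `parallel` argument come from the thread.

```r
# Hypothetical helper: count data (non-header) lines in a VCF file.
# Header lines start with "#"; every other non-empty line is one variant.
count_vcf_variants <- function(vcf_path, chunk_size = 10000L) {
  con <- file(vcf_path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    n <- n + sum(nzchar(lines) & !startsWith(lines, "#"))
  }
  n
}

# Hypothetical rule of thumb: use one worker unless each worker would get
# at least `min_per_worker` variants. The threshold is an assumption, not a
# value documented by SeqArray.
choose_workers <- function(n_variants, requested = 6L, min_per_worker = 50000L) {
  if (n_variants < min_per_worker) return(1L)
  as.integer(min(requested, n_variants %/% min_per_worker))
}

# With the 3,632 variants reported above this falls back to one worker.
choose_workers(3632L, requested = 6L)

# Possible usage (requires SeqArray and the VCF from the original post):
# n <- count_vcf_variants(high_mod_vcf)
# seqVCF2GDS(high_mod_vcf, "r4_chr1_high_mod.gds", parallel = choose_workers(n))
```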
Hello,
I've now tried this with a VCF with 200k+ variants. I have successfully converted this VCF to a GDS using SNPRelate. However, I am using other software that specifically needs the GDS file in SeqArray format, not SNPRelate. But I am still getting the same error from seqVCF2GDS:
    sample.id
Error: segfault from C stack overflow
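If the SNPRelate GDS file is already in hand, one possible route to a SeqArray-format file is SeqArray's `seqSNP2GDS`. The wrapper below is a minimal sketch; `convert_snp_gds` and the file names in the usage line are hypothetical, and it is not confirmed in this thread that the conversion avoids the segfault on the old R and gdsfmt versions.

```r
# Hypothetical wrapper around SeqArray::seqSNP2GDS, which converts a
# SNPRelate-format GDS file into a SeqArray-format GDS file.
convert_snp_gds <- function(snp_gds, seq_gds) {
  if (!file.exists(snp_gds)) {
    stop("input GDS file not found: ", snp_gds)
  }
  if (!requireNamespace("SeqArray", quietly = TRUE)) {
    stop("the SeqArray package is not installed")
  }
  SeqArray::seqSNP2GDS(snp_gds, seq_gds)
  invisible(seq_gds)
}

# Possible usage (file names are placeholders):
# convert_snp_gds("chr1_snprelate.gds", "chr1_seqarray.gds")
```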
Alexis
Your R version and gdsfmt versions are old. The recent update was made with a focus on R (>= v4.0). I suggest using SeqArray GDS format instead of SNPRelate GDS.
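A quick way to check the installed versions against the R (>= v4.0) target mentioned above is sketched here. `pkg_version_or_na` is a hypothetical helper; the update command in the final comment assumes the packages are installed through Bioconductor's BiocManager.

```r
# Is the running R at least 4.0.0?
r_ok <- getRversion() >= "4.0.0"

# Hypothetical helper: installed version of a package, or NA if it is missing.
pkg_version_or_na <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions <- vapply(c("gdsfmt", "SeqArray"), pkg_version_or_na, character(1))

print(r_ok)
print(versions)

# To update (assumes BiocManager is available):
# BiocManager::install(c("gdsfmt", "SeqArray"))
```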
Hello, I am trying to use seqVCF2GDS and am getting the following error:
library(SeqArray)
library(data.table)
seqVCF2GDS(high_mod_vcf, "r4_chr1_high_mod.gds", parallel=6L)
Mon Nov 6 16:09:06 2023
Variant Call Format (VCF) Import:
    file(s):
        r4_PASS_chr1_updated_varID_dups_drop_updated_IDs_nhw_hwe6_noNHWrelateds_high_mod_impact.vcf (198.8M)
    file format: VCFv4.2
    the number of sets of chromosomes (ploidy): 2
    the number of samples: 14,306
    genotype storage: bit2
    compression method: LZMA_RA
    # of samples: 14306
Output:
    r4_chr1_high_mod.gds
Merging:
    opening 'r4_chr1_high_mod_tmp01_ad336f56fc72' ... [done]
    opening 'r4_chr1_high_mod_tmp02_ad3315e862b7' ... [done]
    opening 'r4_chr1_high_mod_tmp03_ad33613818b1' ... [done]
    opening 'r4_chr1_high_mod_tmp04_ad33473817c6' ... [done]
    opening 'r4_chr1_high_mod_tmp05_ad334e0fea8c' ... [done]
    opening 'r4_chr1_high_mod_tmp06_ad33607634f8' ... [done]
Digests:
    sample.id
Error: segfault from C stack overflow
Do the sample IDs need to be in a particular format? I created my VCF with plink using the double-id option. IDs are in the format A-[Cohort]-[A#####]. A .gds file is output, but I don't know whether it is incorrect because of the segfault.
gds <- seqOpen("r4_chr1_high_mod.gds")
gds
Object of class "SeqVarGDSClass"
File: r4_chr1_high_mod.gds (294.4K)
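One way to judge whether the file written before the segfault is usable is to reopen it and compare the sample and variant counts with the numbers in the import log. The function below is a sketch under that assumption; `check_seq_gds` is a hypothetical helper, while `seqOpen`, `seqGetData` and `seqClose` are SeqArray functions. Matching counts do not prove the file is complete, since the crash happened before the digest step finished.

```r
# Hypothetical check: open a SeqArray GDS file and compare its dimensions
# with the expected number of samples and variants.
check_seq_gds <- function(gds_path, expected_samples = NULL,
                          expected_variants = NULL) {
  if (!requireNamespace("SeqArray", quietly = TRUE)) {
    stop("the SeqArray package is not installed")
  }
  gds <- SeqArray::seqOpen(gds_path)
  on.exit(SeqArray::seqClose(gds))
  n_samples  <- length(SeqArray::seqGetData(gds, "sample.id"))
  n_variants <- length(SeqArray::seqGetData(gds, "variant.id"))
  list(
    n_samples   = n_samples,
    n_variants  = n_variants,
    samples_ok  = is.null(expected_samples)  || n_samples  == expected_samples,
    variants_ok = is.null(expected_variants) || n_variants == expected_variants
  )
}

# Possible usage with the counts reported in this thread:
# check_seq_gds("r4_chr1_high_mod.gds",
#               expected_samples = 14306, expected_variants = 3632)
```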
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /cvmfs/priv.accre.vanderbilt.edu/mirror/optimized/sandy_bridge/easybuild/software/MPI/intel/2019.1.144/impi/2018.4.274/R/3.6.0/lib64/R/lib/libR.so
LAPACK: /cvmfs/priv.accre.vanderbilt.edu/mirror/optimized/sandy_bridge/easybuild/software/MPI/intel/2019.1.144/impi/2018.4.274/R/3.6.0/lib64/R/modules/lapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.14.8 SeqArray_1.26.2 gdsfmt_1.22.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.32.0 compiler_3.6.0 IRanges_2.20.2
[4] XVector_0.26.0 parallel_3.6.0 GenomicRanges_1.38.0
[7] GenomeInfoDbData_1.2.2 RCurl_1.95-4.12 Biostrings_2.54.0
[10] S4Vectors_0.24.4 BiocGenerics_0.32.0 GenomeInfoDb_1.22.1
[13] bitops_1.0-6 stats4_3.6.0
Thank you, Alexis