Closed jjfarrell closed 1 year ago
I tried a different algorithm (--nelder-mead) and got a segmentation fault instead of a bus error. The variant triggering the error is a rare DUP with mostly het calls but a couple of homozygous samples.
var/spool/sge/scc-ym2/job_scripts/945469: line 11: 279846 Segmentation fault ruth --nelder-mead --vcf $VCF --evec adsp5k.evec --field $FIELD --out $VCF_RUTH
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
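To see exactly what the failing record looks like (a rare DUP, mostly het with a couple of hom calls), one option is to tally the GT values across all samples on that line. Below is a minimal Python sketch, assuming one uncompressed, tab-delimited VCF data line (for example from bcftools view -H); tally_genotypes is a hypothetical helper, not part of RUTH or htslib.

```python
from collections import Counter


def tally_genotypes(vcf_line: str, field: str = "GT") -> Counter:
    """Count the values of one FORMAT field (GT by default) across all samples.

    Hypothetical helper for inspecting a single VCF data line; not part of RUTH.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    fmt_keys = cols[8].split(":")  # column 9 (index 8) is FORMAT
    idx = fmt_keys.index(field)  # raises ValueError if the field is absent
    counts = Counter()
    for sample in cols[9:]:  # sample columns follow FORMAT
        values = sample.split(":")
        # The VCF spec allows trailing FORMAT fields to be dropped; count those as missing.
        counts[values[idx] if idx < len(values) else "."] += 1
    return counts
```

Unusual values in the result (missing calls, haploid or multi-allelic GTs) may be worth checking first when a single record crashes a GT-based code path.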
It looks like your input VCF file is truncated.
Hyun Min Kang, Ph.D. Associate Professor of Biostatistics University of Michigan, Ann Arbor Email : hmkang@umich.edu
The error suggests that, but the file is not truncated. It is indexed with tabix without errors, and zcat vcf.gz|wc runs without error. If I extract that region into a vcf.gz with tabix (also without error), the ruth error still occurs on the subset.
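The warning in the log comes from htslib's BGZF reader. BGZF files are expected to end with a fixed 28-byte empty block (the EOF marker defined in the SAM/BAM format specification), and zcat does not check for it, so a quick independent check is to compare the file's last 28 bytes with that marker. A minimal Python sketch; has_bgzf_eof is a hypothetical helper name.

```python
import os

# The 28-byte empty BGZF block that marks end-of-file (SAM/BAM format specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def has_bgzf_eof(path: str) -> bool:
    """Hypothetical helper: True if the file ends with the standard BGZF EOF block."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(size - len(BGZF_EOF))  # jump to the last 28 bytes
        return fh.read() == BGZF_EOF
```

A True result only rules out truncation at the end of the file; it says nothing about corruption earlier in the file or about the crash itself.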
Yes, with no truncation error...
bcftools view chr2_test.vcf.gz|wc
3536 521960 34900080
bcftools view adsp5k.lumpy.duphold.chr2.vcf.gz|wc
340923 1619304786 105957815767
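A stdlib-only cross-check on the same counts: BGZF is a series of concatenated gzip members, which Python's gzip module reads transparently, so fully decompressing the file and counting lines reproduces the wc line count (header plus records) while exercising every compressed block. A minimal sketch; count_vcf_lines is a hypothetical helper.

```python
from __future__ import annotations

import gzip


def count_vcf_lines(path: str) -> tuple[int, int]:
    """Hypothetical helper: fully decompress a (b)gzipped VCF, return (header, records).

    A gzip member cut off mid-stream makes the gzip module raise EOFError,
    so a clean run also shows that every block decompresses completely.
    """
    header = records = 0
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                header += 1
            else:
                records += 1
    return header, records
```

bcftools view adds its own ##bcftools_viewVersion and ##bcftools_viewCommand header lines, so its wc line count can be a couple of lines higher than header plus records here. Truncation that lands exactly on a block boundary still decompresses cleanly; that case is what the BGZF EOF marker exists to catch.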
Hmm... then something strange may be happening, because the error occurs in htslib (in bgzf), not in cramore. Are you using the latest version of htslib?
Either htslib 1.8 or 1.9. On the test VCF, it runs if the field specified is GL instead of GT:
uth.sh chr2_test.vcf.gz GL
Run Ruth on passed variants
Available Options
The following parameters are available. Ones with "[]" are in effect:
Input Options : --evec [adsp5k.evec],
--vcf [chr2_test.vcf.gz],
--thin [1.00], --seed,
--num-pc [4], --field [GL],
--gt-error [5.0e-03],
--lambda [1.00]
Output Options : --out [chr2_test.ruth.vcf.gz],
--skip-if, --skip-info,
--site-only, --nelder-mead,
--lrt-test, --lrt-em
Samples to focus on : --sm-list
Parameters for sex chromosomes : --sex-map, --x-label [X],
--y-label [Y], --mt-label [MT],
--x-start [2699520],
--x-stop [154931044]
Options to specify when chunking is used : --ref, --unit [2147483647],
--interval, --region
Run with --help for more detailed help messages of each argument.
NOTICE [2019/11/20 15:08:32] - Analysis Started
NOTICE [2019/11/20 15:08:32] - Reading sample eigenvectors
NOTICE [2019/11/20 15:08:32] - Identifying sample columns to extract..
NOTICE [2019/11/20 15:08:32] - Reading in BCFs...
NOTICE [2019/11/20 15:08:32] - Finished identifying 4789 samples to load from VCF/BCF
NOTICE [2019/11/20 15:08:37] - Reading 100 variants at chr2:1527586, Skipping 0, Missing 0.
NOTICE [2019/11/20 15:08:38] - Analysis Finished
[farrell@scc-hadoop duphold]$
[farrell@scc-hadoop duphold]$ ./ruth.sh chr2_test.vcf.gz GT
Run Ruth on passed variants
Available Options
The following parameters are available. Ones with "[]" are in effect:
Input Options : --evec [adsp5k.evec],
--vcf [chr2_test.vcf.gz],
--thin [1.00], --seed,
--num-pc [4], --field [GT],
--gt-error [5.0e-03],
--lambda [1.00]
Output Options : --out [chr2_test.ruth.vcf.gz],
--skip-if, --skip-info,
--site-only, --nelder-mead,
--lrt-test, --lrt-em
Samples to focus on : --sm-list
Parameters for sex chromosomes : --sex-map, --x-label [X],
--y-label [Y], --mt-label [MT],
--x-start [2699520],
--x-stop [154931044]
Options to specify when chunking is used : --ref, --unit [2147483647],
--interval, --region
Run with --help for more detailed help messages of each argument.
NOTICE [2019/11/20 15:08:49] - Analysis Started
NOTICE [2019/11/20 15:08:49] - Reading sample eigenvectors
NOTICE [2019/11/20 15:08:49] - Identifying sample columns to extract..
NOTICE [2019/11/20 15:08:49] - Reading in BCFs...
NOTICE [2019/11/20 15:08:49] - Finished identifying 4789 samples to load from VCF/BCF
NOTICE [2019/11/20 15:08:54] - Reading 100 variants at chr2:1527586, Skipping 0, Missing 0.
./ruth.sh: line 11: 34803 Segmentation fault ruth --vcf $VCF --evec adsp5k.evec --field $FIELD --out $VCF_RUTH
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
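The original report describes a bus error, while this run ends in a segmentation fault. When scripting repeated runs, recording which signal killed ruth can help tell the two apart. On POSIX systems, Python's subprocess reports a child killed by signal N as a returncode of -N. A minimal sketch; describe_exit is a hypothetical helper.

```python
from __future__ import annotations

import signal
import subprocess


def describe_exit(cmd: list[str]) -> str:
    """Hypothetical helper: run cmd and report its exit status or terminating signal (POSIX)."""
    rc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL).returncode
    if rc < 0:
        # A negative returncode means the child was killed by signal -rc.
        return signal.Signals(-rc).name  # e.g. "SIGSEGV" or "SIGBUS"
    return f"exit {rc}"
```

For example, describe_exit(["ruth", "--vcf", "chr2_test.vcf.gz", "--evec", "adsp5k.evec", "--field", "GT", "--out", "chr2_test.ruth.vcf.gz"]) runs the failing command from the log above. Run through a shell script that ends with the ruth call, the same crash surfaces as "exit 139" (128 + 11) instead, because the shell converts the signal into an exit status.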
I'm not seeing this error with the latest version on recently created VCF files.
The following error occurs when running RUTH on a couple of chromosomes from a Lumpy/SVTyper VCF of 4789 samples. A bus error occurs, followed by an error about a truncated file. The file is not truncated. I extracted that region from the VCF and the error still occurs. RUTH has run fine on 4 other sets of SV calls and on 20 other chromosomes from LUMPY. There is also a similar error on chr18. It seems to be hitting some edge case. Any suggestions?
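Since the error reproduces on an extracted region, it can be narrowed further by bisecting the records in that region. A minimal Python sketch, assuming a caller-supplied predicate fails(records) that writes the VCF header plus the given records to a file, runs ruth on it, and returns True if it crashes; find_culprit and fails are hypothetical names.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_culprit(records: Sequence[T], fails: Callable[[Sequence[T]], bool]) -> Sequence[T]:
    """Hypothetical helper: shrink records to the smallest contiguous slice that still fails.

    If a single record triggers the failure, the result is a one-element slice.
    """
    if not fails(records):
        raise ValueError("the full input does not reproduce the failure")
    lo, hi = 0, len(records)  # invariant: fails(records[lo:hi]) is True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(records[lo:mid]):
            hi = mid
        elif fails(records[mid:hi]):
            lo = mid
        else:
            break  # failure needs records from both halves; stop narrowing
    return records[lo:hi]
```

Each step runs ruth at most twice, so n records need roughly 2 × log2(n) runs. If the crash only happens when records from both halves are present, the function stops early and returns the smallest failing slice it found.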