vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Signal 11 crash - "No such file or directory" #3730

Open asherrar opened 1 year ago

asherrar commented 1 year ago

1. What were you trying to do?

I was attempting to use vg autoindex to create a graph using T2T-CHM13 as a reference and a VCF file built from 1KGP variants mapped to said reference.

2. What did you want to happen?

To have a valid .vg file and associated indices for further processing.

3. What actually happened?

[vg autoindex] Executing command: /vg/bin/vg autoindex -w map -r /home/asherrar/t2t_sequence/v2.0/chm13v2.0.fa -v /scratch/asherrar/1kgp_vcf/1kgp.chrX.recalibrated.snp_indel.pass.chm13.v1.1.liftover.vcf -p t2t_1kgp_chrX
[IndexRegistry]: Checking for phasing in VCF(s).
[E::hts_open_format] Failed to open file "/scratch/asherrar/1kgp_vcf/1kgp.chrX.recalibrated.snp_indel.pass.chm13.v1.1.liftover.vcf" : No such file or directory
ERROR: Signal 11 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_7WfqGN/stacktrace.txt
Please include the stack trace file in your bug report!

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Crash report for vg v1.42.0 "Obolo"
Stack trace (most recent call last):
#6    Object "/vg/bin/vg", at 0x5e1fad, in _start
#5    Object "/vg/bin/vg", at 0x1e68f0f, in __libc_start_main
#4    Object "/vg/bin/vg", at 0x5b2fee, in main
#3    Object "/vg/bin/vg", at 0xcfb5bb, in vg::subcommand::Subcommand::operator()(int, char**) const
#2    Object "/vg/bin/vg", at 0xd24279, in main_autoindex(int, char**)
#1    Object "/vg/bin/vg", at 0x1206f71, in vg::IndexRegistry::vcf_is_phased(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#0    Object "/vg/bin/vg", at 0x1bac235, in bcf_hdr_read

5. What data and command can the vg dev team use to make the problem happen?

Reference: https://www.ncbi.nlm.nih.gov/assembly/GCA_009914755.4
VCF: https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/variants/1000_Genomes_Project/chm13v1.1/1kgp.chrX.recalibrated.snp_indel.pass.chm13.v1.1.liftover.vcf.gz (unzipped via gunzip)

vg autoindex -w map -r /home/asherrar/t2t_sequence/v2.0/chm13v2.0.fa -v /scratch/asherrar/1kgp_vcf/1kgp.chrX.recalibrated.snp_indel.pass.chm13.v1.1.liftover.vcf -p t2t_1kgp_chrX

6. What does running vg version say?

vg version v1.42.0 "Obolo"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by root@buildkitsandbox

(installed via singularity-hpc and run as an Lmod module, if it helps at all - strangely I can't seem to grab v1.43.0 this way right now, may try removing completely and attempting to reinstall)

jeizenga commented 1 year ago

This is probably the result of a typo in the file name. Can you try the same command, except copy-and-pasting the file name from ls?

asherrar commented 1 year ago

Just re-ran it as you suggested - exact same error. Also confirmed the file name is correct, copied it from the error message and ran head on it, got part of the VCF header as expected. Is it possible that vg autoindex doesn't like VCFs with multiple alternate sequences per site? I swore I saw mentions of people using the 1KGP dataset with vg before, which is what I'm attempting to tinker with here.

jeizenga commented 1 year ago

No, I'm pretty sure that multiple alts shouldn't be an issue. This error is originating from htslib, which makes me strongly suspect an issue with file validity or file I/O. I'll give it a try on my end to see if I can replicate it.
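For anyone who wants to rule out the file validity and file I/O side before handing a VCF to vg, here is a minimal pre-flight sketch. It is not part of vg or htslib; the function name check_vcf_path is made up for illustration, and it assumes an uncompressed, plain-text VCF like the one in this report. It only checks that the path exists, is a regular file, is readable by the current user, and starts with a VCF header line.

```python
import os


def check_vcf_path(path):
    """Return a list of problems found with a plain-text VCF path (empty if none).

    Hypothetical helper for illustration only; not part of vg or htslib.
    """
    problems = []
    if not os.path.exists(path):
        problems.append("path does not exist: " + path)
        return problems
    if not os.path.isfile(path):
        problems.append("not a regular file: " + path)
        return problems
    if not os.access(path, os.R_OK):
        problems.append("file is not readable by this user: " + path)
        return problems
    # A VCF is expected to begin with a "##fileformat=VCF..." header line.
    with open(path, "r", errors="replace") as handle:
        first_line = handle.readline()
    if not first_line.startswith("##fileformat=VCF"):
        problems.append("first line is not a VCF header: " + repr(first_line[:40]))
    return problems
```

An empty list means none of these basic checks failed. Note that this has to be run in the same environment as vg itself to be meaningful.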

asherrar commented 1 year ago

Alright, much appreciated! If you can think of anything else to test from my end, let me know. I'm currently splitting the multiple alts with bcftools anyway for the sake of filtering, so as soon as that's done I'm going to test that in the same command to rule that out as the cause - will post an update when it's done.

jeizenga commented 1 year ago

In my hands, the command runs without any problems. I was able to reproduce the error output by adding a typo into the file name, but other than that I don't have any particular insight on this issue. Are all of the absolute file paths correct?

asherrar commented 1 year ago

Yup, just double-checked both of 'em, copying directly from the command and sending 'em through head:

$ head /home/asherrar/t2t_sequence/v2.0/chm13v2.0.fa
>chr1 CP068277.2 Homo sapiens isolate CHM13 chromosome 1
Caccctaaaccctaacccctaaccctaaccctaaccctaaccctaaccctaacccctaaaccctaaccctaaccctaacc
ctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccct
aaccctaaccctaacccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacccta
accctaaccctaaccctaaccctaaccctaaccctaaccctaacccaaccctaaccctaaccctaaccctaaccctaacc
ctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccct
aaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaa
ccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacc
ctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccct
aaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaa

$ head /scratch/asherrar/1kgp_vcf/1kgp.chrX.recalibrated.snp_indel.pass.chm13.v1.1.liftover.vcf
##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele not already represented at this location by REF and ALT">
##FILTER=<ID=LowQual,Description="Low quality">
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=VQSRTrancheINDEL99.00to100.00+,Description="Truth sensitivity tranche level for INDEL model at VQS Lod < -22120.414">
##FILTER=<ID=VQSRTrancheINDEL99.00to100.00,Description="Truth sensitivity tranche level for INDEL model at VQS Lod: -22120.414 <= x < -0.4496">
##FILTER=<ID=VQSRTrancheSNP99.80to100.00+,Description="Truth sensitivity tranche level for SNP model at VQS Lod < -39762.1377">
##FILTER=<ID=VQSRTrancheSNP99.80to100.00,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -39762.1377 <= x < -65.1324">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">

The files exist and are where they're expected to be, as far as I can tell. The only thing I can think of is something about the environment itself is messing with things, or it being something that was fixed in v1.43.0 - is that what you used?

jeizenga commented 1 year ago

I tried again with 1.42.0 and had the same result. Could it be a file permissions issue?

asherrar commented 1 year ago

I don't think so, gave the files full permissions (chmod 777) to test and I get the exact same error. I'm chalking it up to an environment quirk at this point... will ask the folks who run the cluster to take a peek.
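One way to narrow down an environment quirk like this, sketched under assumptions: since vg here was installed via singularity-hpc, the shell that runs head and the process that runs vg may not see the same filesystem. The helper below (readable_via is a hypothetical name) runs the standard test -r check behind an arbitrary command prefix, so the same path can be checked both directly and through whatever wrapper launches vg. It assumes a test executable is available on the PATH in both places.

```python
import subprocess


def readable_via(prefix, path):
    """Run `test -r path` behind a command prefix; True if the file is readable there.

    Hypothetical helper for illustration; `prefix` is a list such as [] for the
    current shell environment, or the wrapper command used to launch vg.
    """
    result = subprocess.run(list(prefix) + ["test", "-r", path])
    return result.returncode == 0
```

For example, readable_via([], path) checks the host view, while readable_via(["singularity", "exec", "vg.sif"], path) would check the view from inside a container image (vg.sif is a placeholder; the actual image path depends on the installation). If the first returns True and the second False, the file is not visible from inside the container.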