vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

what(): obsolete, invalid, or corrupt protobuf input #4251

RohitKapila closed this issue 3 months ago

RohitKapila commented 3 months ago

1. What were you trying to do?

Trying to call structural variants (SVs).

2. What did you want to happen?

Get a .vcf file containing the SVs.

3. What actually happened?

Failed to make a .pack file:

```
  what():  obsolete, invalid, or corrupt protobuf input
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_RUhGJb/stacktrace.txt
```

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

 /tmp/vg_crash_RUhGJb/stacktrace.txt

5. What data and command can the vg dev team use to make the problem happen?

```
vg construct -f -S -I insertions.fa -a -r N2_refrence.fna -v N2_2Formal_Manta.vcf > 6graph.vg
vg convert 6graph.vg -a > 6graph.gbz
vg index 6graph.vg -L -x 6graph.xg
vg index -x 6graph.xg -g 6graph.gcsa -k 16 6graph.vg
vg mpmap -x 6graph.xg -g 6graph.gcsa -f N2_2Formal_Combined.fq > 6graph.gam
vg snarls 6graph.gbz > 6graph.snarls
vg pack -x 6graph.gbz -g 6graph.gam -o 6graph.pack -Q 5
```


It terminated with the following error:

```
terminate called after throwing an instance of 'std::runtime_error'
  what():  obsolete, invalid, or corrupt protobuf input
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_4OpdMA/stacktrace.txt
```

The script used to create the `insertions.fa` file passed to the `vg construct` command is the following:
```python
def create_insertions_fa(vcf_file, output_file):
    """Write each VCF record whose ALT is longer than its REF to a FASTA file."""
    with open(vcf_file, 'r') as vcf, open(output_file, 'w') as fa:
        for line in vcf:
            if line.startswith('#') or not line.strip():
                continue  # Skip header and blank lines
            parts = line.strip().split('\t')
            #ralt = parts[4].rstrip('[')
            #lalt = parts[4].lstrip(']')
            #if 'NC' in ralt
            ref, alt = parts[3], parts[4]
            # Check if it's an insertion. This length test also matches symbolic
            # (<INS>) and breakend ALT strings, which are longer than a 1-base REF.
            if len(alt) > len(ref):
                insertion_id = f"{parts[0]}_{parts[1]}"  # Using chromosome_position as ID
                sequence = alt
                fa.write(f">{insertion_id}\n{sequence}\n")

# Define the path for the input VCF file
vcf_file_path = '/home/rkapila/Short_read/Vgtoolkit/N2/Formal/N2_2Formal/Giraffee/N2_2Formal_Manta.vcf'

# Define the path for the output FASTA file in the same directory
output_file_path = '/home/rkapila/Short_read/Vgtoolkit/N2/Formal/N2_2Formal/Giraffee/insertions.fa'

# Call the function with the specified file paths
create_insertions_fa(vcf_file_path, output_file_path)
```
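The `len(alt) > len(ref)` test in the script above also accepts symbolic ALT alleles such as `<INS>` and breakend (BND) notation like `G[chrII:500[`, and it compares a multi-allelic ALT column as one comma-joined string. A minimal sketch that keeps only explicit-sequence insertions, assuming that is what the FASTA should hold; the helper name `explicit_insertions` and the `_0`/`_1` ID suffix for multi-allelic records are hypothetical choices, the full ALT allele (anchor base included) is written just as the original script does, and nothing here establishes what `vg construct -I` itself expects.

```python
import re

# An ALT allele made only of nucleotide letters: excludes <SYMBOLIC>,
# breakend brackets and the '*' spanning-deletion allele.
SEQUENCE_ALLELE = re.compile(r"[ACGTNacgtn]+")


def explicit_insertions(vcf_lines):
    """Yield (record_id, alt_sequence) for plain-sequence ALTs longer than REF."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # Skip header and blank lines
        parts = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = parts[0], parts[1], parts[3], parts[4]
        # A multi-allelic ALT column is comma-separated; test each allele alone.
        for i, alt in enumerate(alts.split(",")):
            if SEQUENCE_ALLELE.fullmatch(alt) and len(alt) > len(ref):
                # Hypothetical ID scheme: chromosome_position, plus the allele
                # index when the record has several ALTs, so IDs stay distinct.
                suffix = f"_{i}" if "," in alts else ""
                yield f"{chrom}_{pos}{suffix}", alt


def create_insertions_fa(vcf_file, output_file):
    """Write the explicit-sequence insertions of a VCF file as FASTA records."""
    with open(vcf_file) as vcf, open(output_file, "w") as fa:
        for record_id, seq in explicit_insertions(vcf):
            fa.write(f">{record_id}\n{seq}\n")
```

With this filter, a record such as `chrI 200 A <INS>` is skipped instead of producing a FASTA entry whose "sequence" is the literal text `<INS>`.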
**6. What does running `vg version` say?**

```
vg version v1.44.0 "Solara"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by anovak@octagon
```

glennhickey commented 3 months ago

vg can be quite complicated to use, but there is a fair amount of documentation and examples on the wiki, along with some basic information in the README. Please take a look there: you'll see that you need to index before mapping (your previous issue) and that vg mpmap does not output GAM by default (this issue). It also wouldn't hurt to update to a current vg release.
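A minimal sketch of the reported pipeline with the mapping step changed to write GAM, so that `vg pack -g` receives single-path alignments rather than mpmap's default GAMP output. It assumes `vg` is on `PATH` and that the installed vg accepts `vg mpmap -F GAM` (the `-F`/`--output-fmt` option); `PIPELINE` and `run_pipeline` are hypothetical helper names, and the other commands are copied unchanged from the report.

```python
import shlex
import subprocess

# Commands from the report, in order, paired with the file each one's stdout
# is redirected to (None = no redirect). The only change is "-F GAM" on
# vg mpmap (assumed option; mpmap's default output format is GAMP, which
# vg pack -g does not read as GAM).
PIPELINE = [
    ("vg construct -f -S -I insertions.fa -a -r N2_refrence.fna -v N2_2Formal_Manta.vcf", "6graph.vg"),
    ("vg convert 6graph.vg -a", "6graph.gbz"),
    ("vg index 6graph.vg -L -x 6graph.xg", None),
    ("vg index -x 6graph.xg -g 6graph.gcsa -k 16 6graph.vg", None),
    ("vg mpmap -x 6graph.xg -g 6graph.gcsa -F GAM -f N2_2Formal_Combined.fq", "6graph.gam"),
    ("vg snarls 6graph.gbz", "6graph.snarls"),
    ("vg pack -x 6graph.gbz -g 6graph.gam -o 6graph.pack -Q 5", None),
]


def run_pipeline(steps, dry_run=False):
    """Run each (command, stdout_path) step in order.

    With check=True, the first step that exits non-zero raises
    subprocess.CalledProcessError, so the loop stops at the failing step.
    """
    for cmd, out_path in steps:
        print(cmd + (f" > {out_path}" if out_path else ""))
        if dry_run:
            continue  # only show what would run
        argv = shlex.split(cmd)  # no shell involved; redirection handled below
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(out_path, "wb") as out:
                subprocess.run(argv, stdout=out, check=True)
```

`run_pipeline(PIPELINE, dry_run=True)` only prints the commands; `run_pipeline(PIPELINE)` runs them and raises `subprocess.CalledProcessError`, which names the failing command, at the first step that exits non-zero.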