hall-lab / svtools

Tools for processing and analyzing structural variants.
MIT License

bedpetovcf does not work with vcftobedpe output as input #324

Open anoronh4 opened 2 years ago

anoronh4 commented 2 years ago

I'm trying the Docker Hub image of svtools and running it as follows:

singularity exec -e --no-home docker://halllab/svtools:v0.5.1 svtools vcftobedpe -i $inputvcf -o vcftobedpe.bedpe -t tmp
singularity exec -e --no-home docker://halllab/svtools:v0.5.1 svtools bedpetovcf -i vcftobedpe.bedpe -o bedpetovcf.vcf -t tmp
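One quick way to see what the first step did to the header is to compare the `##` meta-information lines of the input VCF with those carried into `vcftobedpe.bedpe` (the BEDPE keeps them, as the grep later in this thread shows). A minimal sketch follows. `meta_lines` and `changed_meta_lines` are hypothetical helper names, not part of svtools. The sketch only reports lines that did not survive verbatim, so a reported line is a place to look, not necessarily a bug.

```python
def meta_lines(path):
    """Return every '##' meta-information line of a VCF or BEDPE file, in file order."""
    lines = []
    with open(path) as handle:
        for line in handle:
            # Scan the whole file rather than stopping early, so no assumption
            # is made about where the '##' lines sit in the BEDPE output.
            if line.startswith('##'):
                lines.append(line.rstrip('\r\n'))
    return lines


def changed_meta_lines(original_path, converted_path):
    """Return the original's '##' lines that do not appear verbatim in the converted file."""
    kept = set(meta_lines(converted_path))
    return [line for line in meta_lines(original_path) if line not in kept]
```

For example, `changed_meta_lines('input.vcf', 'vcftobedpe.bedpe')` lists the input header lines that were not copied through unchanged. The file names here are placeholders.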

The first step runs without error, and the BEDPE output looks as expected. The second step fails with the following error:

Traceback (most recent call last):
  File "/opt/hall-lab/python-2.7.15/bin/svtools", line 11, in <module>
    sys.exit(main())
  File "/opt/hall-lab/python-2.7.15/lib/python2.7/site-packages/svtools/cli.py", line 79, in main
    sys.exit(args.entry_point(args))
  File "/opt/hall-lab/python-2.7.15/lib/python2.7/site-packages/svtools/bedpetovcf.py", line 73, in run_from_args
    bedpeToVcf(stream, args.output)
  File "/opt/hall-lab/python-2.7.15/lib/python2.7/site-packages/svtools/bedpetovcf.py", line 36, in bedpeToVcf
    myvcf.add_header(header)
  File "/opt/hall-lab/python-2.7.15/lib/python2.7/site-packages/svtools/vcf/file.py", line 54, in add_header
    self.add_filter(*[b.split('=')[1] for b in self.parse_meta(line)])
IndexError: list index out of range

Just wondering what might be the issue here.
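The failing expression in the traceback, `b.split('=')[1]`, raises exactly this `IndexError` whenever one of the pieces returned by `parse_meta` contains no `=`. The traceback does not show how `parse_meta` works, so the sketch below uses a hypothetical `naive_parse_meta` that splits on every comma. It is only meant to show how a comma inside a Description can yield such a piece; it is not svtools' actual code. The input is the FILTER line as rewritten by vcftobedpe, quoted later in this thread.

```python
def naive_parse_meta(line):
    """HYPOTHETICAL stand-in for svtools' parse_meta (not its real code):
    take the text between '=<' and the final '>' and split it on every comma."""
    inner = line.split('=<', 1)[1]
    if inner.endswith('>'):
        inner = inner[:-1]
    return inner.split(',')


def extract_values(pieces):
    # The same expression that fails in the traceback (svtools/vcf/file.py, add_header).
    return [b.split('=')[1] for b in pieces]


# The FILTER line as rewritten by vcftobedpe.
mangled = ('##FILTER=<ID=svaba_LOWAS,Description=""Alignment score of one end is less than '
           '80% of contig length, or number of mismatch bases (NM) on one end is >">')

pieces = naive_parse_meta(mangled)
error = None
try:
    extract_values(pieces)
except IndexError as err:
    error = err  # the piece after the comma inside the Description has no '='

print(pieces[2])
print(repr(error))
```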

anoronh4 commented 2 years ago

I found at least one of the offending lines:

##FILTER=<ID=svaba_LOWAS,Description="Alignment score of one end is less than 80% of contig length, or number of mismatch bases (NM) on one end is >= 10">

vcftobedpe turns this line into:

##FILTER=<ID=svaba_LOWAS,Description=""Alignment score of one end is less than 80% of contig length, or number of mismatch bases (NM) on one end is >">

and then bedpetovcf can no longer parse it. I found a few similar-looking lines:

grep "\"\"" vcftobedpe.bedpe 
##FILTER=<ID=svaba_TOOSHORT,Description=""Contig alignment for part of this rearrangement has <">
##FILTER=<ID=svaba_LOWAS,Description=""Alignment score of one end is less than 80% of contig length, or number of mismatch bases (NM) on one end is >">
##FILTER=<ID=svaba_WEAKSUPPORTHIREP,Description=""Fewer then 7 split reads for variant with >">
##FORMAT=<ID=svaba_LR,Number=1,Type=Float,Description=""Log-odds that this variant is REF vs AF">
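The rewritten lines share a recognizable shape: an extra `"` at the start of the Description, and text cut off at an `=` sign (in the LOWAS line, at the `=` of `>= 10`). The sketch below shows one hypothetical mechanism that reproduces the LOWAS output exactly. It takes everything between the first and second `=` of `Description="..."`, which keeps the opening quote and stops at `>=`. It then wraps the result in quotes again when writing. `naive_value` and `requote` are made-up names for illustration. Whether svtools does precisely this is an assumption.

```python
def naive_value(field):
    """HYPOTHETICAL: take the text between the first and second '=' of 'KEY=VALUE'."""
    return field.split('=')[1]


def requote(key, value):
    """HYPOTHETICAL writer: wrap the stored value in double quotes again."""
    return '%s="%s"' % (key, value)


field = ('Description="Alignment score of one end is less than 80% of contig length, '
         'or number of mismatch bases (NM) on one end is >= 10"')

# Keeps the opening quote, and stops at the '=' in '>='.
value = naive_value(field)
print(requote('Description', value))
```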

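It seems that strings are not handled well when the Description contains special characters such as <, > or =. In the LOWAS line, for example, the text is cut off right at the = in >= 10, and an extra double quote is added at the start.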
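Because a structured meta-information line (`##FILTER=<...>`, `##FORMAT=<...>`) can carry a quoted Description containing commas, `=`, `<` and `>`, it has to be tokenized with quote awareness rather than by splitting on `,` and `=`. Below is a minimal sketch of such a parser and a matching writer. `parse_structured_meta` and `format_structured_meta` are hypothetical names, not part of svtools. Backslash-escaping of `"` and `\` inside quoted values is assumed to follow the VCF 4.3 convention. Only `Description` is re-quoted by default, which is also an assumption.

```python
def parse_structured_meta(line):
    """Parse '##KEY=<k1=v1,k2="quoted, text">' into (KEY, [(k1, v1), (k2, v2), ...]).

    Commas, '=', '<' and '>' inside double quotes stay part of the value;
    a backslash inside quotes escapes the next character.
    """
    line = line.rstrip('\r\n')
    if not line.startswith('##') or '=<' not in line or not line.endswith('>'):
        raise ValueError('not a structured meta-information line: %r' % line)
    key, inner = line[2:].split('=<', 1)
    inner = inner[:-1]  # drop the closing '>' of the header line

    fields = []
    name, value = [], []
    current = name  # buffer currently being filled (field name or value)
    in_quotes = False
    i = 0
    while i < len(inner):
        ch = inner[i]
        if in_quotes:
            if ch == '\\' and i + 1 < len(inner):
                current.append(inner[i + 1])  # keep the escaped character
                i += 1
            elif ch == '"':
                in_quotes = False
            else:
                current.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == '=' and current is name:
            current = value  # first unquoted '=' separates name from value
        elif ch == ',':
            fields.append((''.join(name), ''.join(value)))
            name, value = [], []
            current = name
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError('unterminated quoted value in: %r' % line)
    fields.append((''.join(name), ''.join(value)))
    return key, fields


def format_structured_meta(key, fields, quoted=('Description',)):
    """Write (KEY, fields) back out, quoting and escaping the named fields."""
    parts = []
    for name, value in fields:
        if name in quoted:
            value = '"%s"' % value.replace('\\', '\\\\').replace('"', '\\"')
        parts.append('%s=%s' % (name, value))
    return '##%s=<%s>' % (key, ','.join(parts))
```

With this approach the original LOWAS line round-trips unchanged, and the rewritten LOWAS line from vcftobedpe raises `ValueError` instead of silently producing a truncated Description, because its doubled opening quote leaves the final quote unterminated.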