vcflib / vcflib

C++ library and cmdline tools for parsing and manipulating VCF files with python and zig bindings
https://github.com/vcflib/vcflib#vcflib
MIT License
very large del,del hangs vcfallelicprimitives #94

Closed travc closed 2 years ago

travc commented 9 years ago

This is an odd use case, so not really a bug per se. However, it might crop up in other situations, given how freebayes really likes to make long complex calls...

I'm running RNAseq data (tophat hacked to change CIGAR N's to D's), which seems to be working mostly ok. But I hit one rare snag:

vcfallelicprimitives is hanging (or at least taking a ridiculously long time) on one particular variant line. I think the relevant part is TYPE=del,del and LEN=19323,19332. Yes, that is an intron that should be N, but I recoded it to D so I could call with freebayes.

A vcf containing just that line is: https://popi.ucdavis.edu/~travc/tmp/hangs_vcfallelicprimitives.vcf

Additionally: can vcffilter handle those multi-value fields properly? The docs for it should probably be updated to clarify.
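For readers unfamiliar with multi-value fields such as TYPE=del,del and LEN=19323,19332, here is a minimal sketch of how a comma-separated INFO value could be split per allele and tested against a threshold. The function names `parse_info` and `any_value_exceeds` are hypothetical, and the "any value matches" rule is only one possible semantics; this does not describe how vcffilter actually evaluates such fields.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into {key: [values]}; flags map to an empty list."""
    parsed = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        # Multi-value fields are comma-separated, one value per ALT allele.
        parsed[key] = value.split(",") if sep else []
    return parsed


def any_value_exceeds(info: str, key: str, threshold: int) -> bool:
    """Assumed semantics: true if ANY of the field's integer values exceeds threshold."""
    values = parse_info(info).get(key, [])
    return any(int(v) > threshold for v in values)
```

With the values from this report, `any_value_exceeds("TYPE=del,del;LEN=19323,19332", "LEN", 1000)` is true because both lengths are over the threshold.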

zeeev commented 8 years ago

@travc I'm starting to attack bugs. Can you provide the command line you used? I will see what I can do.

ekg commented 8 years ago

Any example with a long deletion should trigger the problem. It uses pairwise alignment to derive a reduced representation of the differences, and this is quadratic in space.
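To illustrate why a full pairwise alignment is quadratic in space, here is a minimal sketch of a unit-cost global alignment that keeps the whole score matrix in memory, as a traceback would require. It is a generic textbook dynamic program, not vcflib's actual aligner, and the names `dp_cells` and `edit_distance_full_matrix` are made up for this example.

```python
def dp_cells(ref_len: int, alt_len: int) -> int:
    """Number of cells in a full (ref_len + 1) x (alt_len + 1) alignment matrix."""
    return (ref_len + 1) * (alt_len + 1)


def edit_distance_full_matrix(a: str, b: str) -> int:
    """Unit-cost global alignment score, storing every row of the matrix."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per base.
    for i in range(rows):
        score[i][0] = i
    for j in range(cols):
        score[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = 0 if a[i - 1] == b[j - 1] else 1
            score[i][j] = min(
                score[i - 1][j] + 1,             # gap in b (deletion)
                score[i][j - 1] + 1,             # gap in a (insertion)
                score[i - 1][j - 1] + mismatch,  # match or substitution
            )
    return score[-1][-1]
```

Doubling both sequence lengths quadruples `dp_cells`, so two sequences of roughly 19,000 bases each would need hundreds of millions of cells.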

zeeev commented 8 years ago

@ekg I could skip variants if the INFO line has abs(POS - END) > X? Or would that be a lousy hack?

ekg commented 8 years ago

Maybe we just set a maximum. Beyond 1000 even, I don't know if we should attempt to decompose. Shorter even might make sense. Like, 100.
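The maximum-length idea could be sketched roughly as follows: check the longest REF or ALT allele on a record and skip decomposition when it exceeds a cutoff. This is only an illustration under stated assumptions. `MAX_LENGTH`, `longest_allele`, and `should_decompose` are hypothetical names, the cutoff of 100 is just the value floated above, and symbolic alleles such as `<DEL>` are not handled.

```python
MAX_LENGTH = 100  # hypothetical cutoff, taken from the suggestion in this thread


def longest_allele(ref: str, alt_field: str) -> int:
    """Length of the longest allele among REF and the comma-separated ALTs."""
    alleles = [ref] + alt_field.split(",")
    return max(len(allele) for allele in alleles)


def should_decompose(vcf_line: str, max_length: int = MAX_LENGTH) -> bool:
    """Return False for records whose REF or any ALT is longer than max_length."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]  # VCF columns 4 and 5
    return longest_allele(ref, alt) <= max_length
```

A record that fails the check would be passed through unchanged rather than aligned, which avoids building a very large alignment matrix for it.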

pjotrp commented 2 years ago

We introduced wavefront as an aligner and this problem should be solved. Also the -L switch behaves properly on master.