chapmanb closed this issue 7 years ago
You picked up on it exactly: it's a buffer overflow problem in the VCF parsing. I had a limit of 4096 characters per VCF line, which is too low. I increased that 16-fold, and that indeed solved the problem. The buffer only ever holds one line in memory, just to parse out the chromosome and position, so a larger buffer is reasonable.
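For anyone who wants to check whether their own VCF would have tripped the old 4096-character limit, here is a minimal Python sketch. It is not VariantBam's code (VariantBam is C++), and the function names `chrom_pos` and `lines_over_limit` are made up for illustration. It shows the same idea as above: only the first two tab-separated columns of a data line are needed, regardless of how long the INFO column gets.

```python
def chrom_pos(line):
    """Return (chromosome, position) from one VCF data line (not a '#' header)."""
    # Split off only the first two tab-separated columns; the remainder of
    # the line, including a very long INFO column, is never inspected.
    chrom, pos, _rest = line.split("\t", 2)
    return chrom, int(pos)


def lines_over_limit(lines, limit=4096):
    """Yield (line_number, length) for every line longer than `limit` characters."""
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\n"))  # do not count the trailing newline
        if length > limit:
            yield number, length
```

Passing an open file handle as `lines` (for example `lines_over_limit(open("calls.vcf"))`, where `calls.vcf` is a placeholder name) lists the records whose length exceeds the limit.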
Thanks for including the minimal working example; it's immensely helpful when debugging. Good to know, too, that the linked-reads feature is useful for you. We've used it for extracting read pairs around SV breakpoints as well.
Jeremiah; Perfect, thank you -- that fix works great and all is working now with our real datasets. Thanks again for the quick turnaround.
Jeremiah; We've been using the awesome linked-region functionality in VariantBam to extract regions supporting structural variant breakends. We ran into an issue using this on larger regions with big ANN fields (from snpEff).
Removing the ANN and SIMPLE_ANN fields from the VCF enables it to work cleanly. I put together a small test set that demonstrates the problem and workaround (a test BAM file is included, but it doesn't matter which you use): https://s3.amazonaws.com/chapmanb/testcases/variantbam_ann.tar.gz
Is this due to the ANN field size or to some element of the value itself? Happy to try to pre-process (in ways other than removing all annotations) if it would help. Thanks much.
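The workaround of dropping the ANN and SIMPLE_ANN fields could be scripted roughly as below. This is only a sketch under a few assumptions: the function name `strip_info_keys` is hypothetical, the input is an uncompressed, tab-separated VCF with INFO in the eighth column, and header lines (including the `##INFO` definitions for the removed keys) are passed through untouched.

```python
def strip_info_keys(line, keys=("ANN", "SIMPLE_ANN")):
    """Drop the named keys from the INFO column of one VCF data line."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return line  # no INFO column to edit; leave the line alone
    # Keep every INFO entry whose key (the text before '=') is not listed.
    kept = [entry for entry in fields[7].split(";")
            if entry.split("=", 1)[0] not in keys]
    # An empty INFO column is written as '.' in VCF.
    fields[7] = ";".join(kept) if kept else "."
    return "\t".join(fields) + "\n"
```

Applying it line by line to the input and writing the results to a new file gives a VCF with shorter records while leaving the other INFO entries in place.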