Closed: james-lawlor closed this issue 1 year ago.
Here are some example files and the corresponding SvAnna output: bcf_15.vcf.gz, bcf_16.vcf.gz, example_16.vcf.gz, example_17.vcf.gz, example_17.SVANNA.vcf.gz, example_16.SVANNA.vcf.gz, bcf_16.SVANNA.vcf.gz, bcf_15.SVANNA.vcf.gz
- bcf_15 and bcf_16: as described in the original report
- example_16: sorted and piped to bgzip v1.16; read correctly by SvAnna
- example_17: sorted and piped to bgzip v1.17; not read by SvAnna
Hi @james-lawlor, thanks for reporting the issue in such detail!
First of all, I replicated your issue with the files above. Thanks to your description, I checked whether anything was wrong with how SvAnna handles gzipped files, and there was indeed an issue.
SvAnna uses the Apache commons-compress library to read gzipped VCF files. This is due to some SV callers producing invalid VCF files (at least at the time of the main development). I do not fully understand the details of decompression, but it looks like the recent htslib change can be addressed by adjusting an option in commons-compress. After configuring `GzipCompressorInputStream` to decompress concatenated files, `example_17.vcf.gz` works OK. I will include the changes in the next release.
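For readers wondering what "decompress concatenated files" means: a bgzipped file is a series of gzip members (BGZF blocks) written back to back. In commons-compress, `GzipCompressorInputStream(InputStream, boolean decompressConcatenated)` stops after the first member unless `decompressConcatenated` is `true`. The minimal Python sketch below illustrates the difference; it is not SvAnna's code. The tiny VCF, the two-member layout (header in one member, records in another), and the helper names are hypothetical. Plain gzip members stand in for BGZF blocks, which additionally carry a "BC" extra field holding the block size.

```python
import gzip
import zlib

# Hypothetical stand-ins for two BGZF blocks: the VCF header in one gzip member,
# the records in a second member, concatenated back to back.
HEADER = b"##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
RECORDS = (
    b"chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=2000\n"
    b"chr2\t5000\tsv2\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=9000\n"
)
DATA = gzip.compress(HEADER) + gzip.compress(RECORDS)


def read_first_member(data: bytes) -> bytes:
    """Decode only the first gzip member; everything after it is ignored."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: expect gzip header/trailer
    return d.decompress(data)


def read_all_members(data: bytes) -> bytes:
    """Decode every concatenated gzip member, one after another."""
    out = []
    while data:
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        out.append(d.decompress(data))
        data = d.unused_data  # bytes after the end of this member, i.e. the next member
    return b"".join(out)


def count_records(vcf: bytes) -> int:
    """Count non-empty lines that are not header lines."""
    return sum(1 for line in vcf.splitlines() if line and not line.startswith(b"#"))


print(count_records(read_first_member(DATA)))  # 0
print(count_records(read_all_members(DATA)))   # 2
```

A first-member-only reader sees a valid header and no records, which matches the "0 variants" symptom. Python's own `gzip.decompress` already handles multi-member input; the explicit loop is there to show what a concatenation-aware reader does.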
If the issue is time-sensitive, you can either build the app from source on the `dependency-update` branch, or I can build a release ZIP for you and share it here.
Thanks again for reporting the bug, and please let me know if there are any other issues.
The issue should be fixed in the latest release, v1.0.4. Please reopen if the latest release does not resolve it on your end.
Thanks a lot and all the best! :)
Hi, I've noticed a problem where bgzip-compressed input files created with certain versions of htslib/bgzip or bcftools cause SvAnna to read 0 variants (and produce output files with 0 variants).
For example, with the `example.vcf` file included with SvAnna, sorting and compressing with bcftools v1.15 works as expected, but SvAnna reads 0 variants from the bcftools v1.16 output. I have also encountered this issue with bgzip v1.17 and v1.18 (bgzip v1.16 works as expected, but bcftools v1.16 does not).

Part of the cause seems to be the "text mode for bgzip" change introduced in htslib 1.17 (https://github.com/samtools/htslib/releases/tag/1.17): adding `--binary` to the bgzip command, which disables breaking bgzip blocks at line breaks, solves the issue. However, I'm not sure why this also happens with compressed bcftools output in v1.16. (I was able to replicate this both with different htslib/bcftools versions installed via conda and with versions I downloaded and compiled locally.)
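One way to compare outputs from different versions is to list what each gzip member (BGZF block) of a `.vcf.gz` decodes to. If the header and the records end up in different members, a reader that stops after the first member would report zero variants. The helper below, `describe_members`, is hypothetical and not part of htslib or SvAnna. It assumes every member ends on a line boundary; a line split across two members would be counted twice. The in-memory `demo` also mimics the empty end-of-file block that BGZF files end with.

```python
import gzip
import zlib


def describe_members(data: bytes) -> list[tuple[int, int]]:
    """For each gzip member, return (header lines, record lines) it decodes to.

    Assumes every member ends on a line boundary, so each member can be
    split into lines on its own.
    """
    summary = []
    while data:
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect a gzip-wrapped member
        chunk = d.decompress(data)
        if not d.eof:
            raise ValueError("truncated gzip member")
        data = d.unused_data  # remaining bytes start the next member
        lines = [ln for ln in chunk.splitlines() if ln]
        headers = sum(1 for ln in lines if ln.startswith(b"#"))
        summary.append((headers, len(lines) - headers))
    return summary


# In-memory stand-in: header member, record member, empty EOF-style member.
# For a real file, pass open("bcf_16.vcf.gz", "rb").read() instead.
demo = (
    gzip.compress(b"##fileformat=VCFv4.2\n#CHROM\tPOS\n")
    + gzip.compress(b"chr1\t1\n")
    + gzip.compress(b"")
)
print(describe_members(demo))  # [(2, 0), (0, 1), (0, 0)]
```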
Example:
bcftools v1.15.1 compressed output:
bcftools v1.16 compressed output: