A couple of issues:
Some of the INFO attributes are flags rather than KEY=VALUE pairs (e.g. BOTHSIDES_SUPPORT), so splitting them into a dictionary structure failed.
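For illustration only, here is a minimal sketch of flag-tolerant INFO parsing. It assumes the INFO column is available as a raw string, and `parse_info` is a hypothetical name, not the function in this PR. A naive `key, value = entry.split("=")` raises `ValueError` on a bare flag; `str.partition` avoids that, and flags are stored as `True`:

```python
from typing import Dict, Union

InfoValue = Union[str, bool]


def parse_info(info: str) -> Dict[str, InfoValue]:
    """Split a VCF INFO column into a dict (hypothetical helper).

    KEY=VALUE entries map to their raw string value; bare flags such as
    BOTHSIDES_SUPPORT contain no '=' and map to True.
    """
    parsed: Dict[str, InfoValue] = {}
    if info == ".":
        # '.' marks an empty INFO column in VCF
        return parsed
    for entry in info.split(";"):
        if not entry:
            continue
        # partition splits on the first '=' only and never raises,
        # so a flag yields sep == "" instead of an unpacking error
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed
```

For example, `parse_info("SVTYPE=INS;END=100;BOTHSIDES_SUPPORT")` returns `{"SVTYPE": "INS", "END": "100", "BOTHSIDES_SUPPORT": True}`. Values are left as strings; type conversion per the header's `##INFO` definitions is out of scope for this sketch.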
The ~1GB compressed VCF is 50GB when decompressed, and the original implementation held the whole file in memory. This version reads compressed input and writes compressed output, keeping nothing in memory except the current line.
The Stage now expects this file as compressed output, which is read by bcftools and bgzip-compressed.
Same always-compressed change is applied to gCNV
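A rough sketch of the line-at-a-time pattern, using only the standard-library `gzip` module. `stream_vcf` and its `transform` callback are hypothetical names, not this PR's code. Note that `gzip` writes plain gzip, not BGZF, so if the output needs to be indexed it would still have to be written with bgzip or a BGZF-aware library; this sketch does not handle that.

```python
import gzip
from typing import Callable


def stream_vcf(in_path: str, out_path: str, transform: Callable[[str], str]) -> None:
    """Rewrite a gzipped VCF one line at a time (hypothetical helper).

    Only the current line is held in memory. Header lines ('#') are copied
    unchanged; every record line is passed through `transform`, which must
    return the full output line including its trailing newline.
    """
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)
            else:
                dst.write(transform(line))
```

Because iteration over the text-mode file object yields one line at a time, peak memory is bounded by the longest line rather than by the 50GB decompressed size.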
Note: the IDs are not always unique. There are a few instances of Manta calling the same insertion on consecutive lines, where one record has BOTHSIDES_SUPPORT and the other does not. I'm assuming the calls are not combined because one has substantially more evidence than the other, so genotyping these specifically may be important.
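To locate such records, a small check like the following could be run over the compressed VCF. It is a hypothetical helper, not part of this PR, and it assumes the ID is the third tab-separated column as in standard VCF. Unlike the streaming rewrite, its memory grows with the number of distinct IDs.

```python
import gzip
from collections import Counter
from typing import Dict


def duplicate_ids(vcf_path: str) -> Dict[str, int]:
    """Return each record ID seen more than once, with its count (hypothetical helper)."""
    counts: Counter = Counter()
    with gzip.open(vcf_path, "rt") as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue
            # columns are CHROM, POS, ID, ...; maxsplit=3 avoids splitting the rest
            record_id = line.split("\t", 3)[2]
            if record_id != ".":  # '.' means no ID, so it is not a duplicate
                counts[record_id] += 1
    return {rid: n for rid, n in counts.items() if n > 1}
```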