Closed mbhall88 closed 3 years ago
If we left-align indels with bcftools norm and trim alleles that have no call in the samples, the proportion of records in which all alleles have the same length increases to 59%. Doing this also shows that 51% of the records in the pandora VCF have no ALT call.
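For reference, a rough sketch of how the "no ALT call" proportion could be counted from an uncompressed VCF. This is not the script used here: the function names are made up for illustration, and it assumes "no ALT call" means that no sample has a non-reference allele index in its GT field.

```python
import re


def has_alt_call(sample_fields, gt_index):
    """Return True if any sample genotype references an ALT allele (index > 0)."""
    if gt_index is None:
        return False
    for sample in sample_fields:
        parts = sample.split(":")
        if gt_index >= len(parts):
            continue
        # Split on '/' or '|' so haploid and diploid genotypes are both handled.
        for allele in re.split(r"[/|]", parts[gt_index]):
            if allele.isdigit() and int(allele) > 0:
                return True
    return False


def proportion_without_alt_call(vcf_lines):
    """Fraction of records in which no sample genotype points at an ALT allele.

    Assumption: records with no GT field are counted as having no ALT call.
    """
    total = 0
    no_alt = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":") if len(fields) > 8 else []
        gt_index = fmt_keys.index("GT") if "GT" in fmt_keys else None
        total += 1
        if not has_alt_call(fields[9:], gt_index):
            no_alt += 1
    return no_alt / total if total else 0.0
```

It can be run over any iterable of VCF lines, e.g. an open file handle. Missing genotypes (`.`) are treated the same as reference calls.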
The results from #62 look pretty good, but it does seem like it might be wise to try different filters for compare VCFs than for map VCFs.
I tried changing the multi-sample FRS filter to 0.75 but it doesn't seem to be as good as 0.9 (as in https://github.com/mbhall88/head_to_head_pipeline/issues/62#issuecomment-783122783)
I will leave the filters the same for both pandoras for now and revisit this if required.
The script for applying filters has been successfully adapted to work seamlessly on single- or multi-sample VCFs.
A stat from the initial run that may be handy in the future is that 48% of records in the sparse VCF have all alleles the same length. This is a crude way of estimating how many SNPs there are in the pandora compare VCF, in case we wanted to try SNP distance.
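A minimal sketch of how the same-length stat could be computed, in case it needs recomputing later. The function names are hypothetical and this is not the code that produced the numbers above. It assumes an uncompressed VCF and counts a record with no ALT allele (`.`) as "same length".

```python
def all_alleles_same_length(ref, alts):
    """Return True if REF and every ALT allele have the same length."""
    # Assumption: a missing ALT ('.') is ignored, so REF-only records count as same length.
    alleles = [ref] + [alt for alt in alts if alt not in (".", "")]
    return len({len(allele) for allele in alleles}) == 1


def proportion_same_length(vcf_lines):
    """Fraction of VCF records in which all alleles share one length."""
    total = 0
    same = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        alts = fields[4].split(",")
        total += 1
        if all_alleles_same_length(ref, alts):
            same += 1
    return same / total if total else 0.0
```

Note that same-length records include MNPs as well as SNPs, which is part of why this is only a crude estimate of the SNP count.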