feat: add binned vaf column for sorting by allele frequency

FelixMoelder commented 1 year ago

As workflows like dna-seq-mtb come with several callsets, each split by low and high allele frequency, the final datavzrd report becomes cluttered. Instead of creating two separate callsets, we could create a single one by adding binned allele frequency columns (binned into low, medium and high). This allows sorting variants by their AF, so that variants with a high frequency appear at the top of the report.

While this PR is just a preparation and the sorting still needs to be defined in the callset configuration, I would like to discuss whether this implementation can be improved. Currently we have a distinct allele frequency for each sample (e.g. tumor and normal), resulting in a corresponding binned AF column per sample. As we use predefined callsets in the dna-seq-mtb workflow, we also need to set the column name for sorting in the default config file, which might be something like tumor: binned vaf. In this case we would assume that a sample called tumor always exists, which might not be the case.

johanneskoester commented 1 year ago

Let us have just one column (called binned_max_vaf), which first takes the max VAF among all samples in a group and then bins it.
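
A minimal sketch of that idea, not the code from this PR: take the maximum VAF across all samples of a group, then bin it into low, medium or high. The function names, the dict input and the bin thresholds (0.1 and 0.5) are assumptions made for illustration only; the actual cutoffs and column handling are whatever the workflow defines.

```python
# Hypothetical bin boundaries; the real workflow may use different cutoffs.
LOW_UPPER = 0.1
MEDIUM_UPPER = 0.5


def bin_vaf(vaf, low_upper=LOW_UPPER, medium_upper=MEDIUM_UPPER):
    """Map a single allele frequency in [0, 1] to a bin label."""
    if vaf < low_upper:
        return "low"
    if vaf < medium_upper:
        return "medium"
    return "high"


def binned_max_vaf(sample_vafs):
    """Take the max VAF over all samples of a group, then bin it.

    sample_vafs maps sample name -> VAF (or None if missing), so no
    particular sample name such as "tumor" has to exist.
    """
    values = [v for v in sample_vafs.values() if v is not None]
    if not values:
        return None  # no usable VAF for this variant
    return bin_vaf(max(values))


# Example: sort variants so that high-frequency ones come first.
BIN_ORDER = {"high": 0, "medium": 1, "low": 2, None: 3}
variants = [
    {"id": "v1", "vafs": {"tumor": 0.04, "normal": 0.01}},
    {"id": "v2", "vafs": {"tumor": 0.62, "normal": 0.03}},
    {"id": "v3", "vafs": {"tumor": 0.25, "normal": None}},
]
for variant in variants:
    variant["binned_max_vaf"] = binned_max_vaf(variant["vafs"])
variants.sort(key=lambda variant: BIN_ORDER[variant["binned_max_vaf"]])
```

Because the column depends only on the group maximum, the sort key in the default config can refer to a single fixed column name regardless of how the samples are called.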

FelixMoelder commented 1 year ago

This should be good to go now.

snakemake-workflows / dna-seq-varlociraptor

feat: add binned vaf column for sorting by allele frequency #240