A Snakemake workflow for calling small and structural variants under any kind of scenario (tumor/normal, tumor/normal/relapse, germline, pedigree, populations) via the unified statistical model of Varlociraptor.
MIT License
feat: add binned vaf column for sorting by allele frequency #240
Workflows like `dna-seq-mtb` come with several callsets, each split into low and high allele frequency, so the final datavzrd report becomes cluttered.
Instead of creating two separate callsets, we could create a single one by adding binned allele frequency columns (binned into `low`, `medium` and `high`). This allows sorting variants by their AF, so that variants with a high frequency appear at the top of the report.
This PR is only a preparation: the sorting still needs to be defined in the callset configuration. I would like to discuss whether this implementation can be improved. Currently we have a distinct allele frequency for each sample (e.g. tumor and normal), and each one gets a corresponding binned AF column.
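To make the discussion concrete, here is a minimal sketch of per-sample binning and sorting in plain Python. It is not the code in this PR: the bin edges (0.1 and 0.5), the input column name `<sample>: allele frequency`, and the helper names are assumptions chosen for illustration. Only the output column name pattern `<sample>: binned vaf` follows the description.

```python
from bisect import bisect_right

# Hypothetical bin edges; the PR description does not state the thresholds.
BIN_EDGES = [0.1, 0.5]
BIN_LABELS = ["low", "medium", "high"]
# Rank used for sorting so that "high" ends up at the top of the table.
SORT_RANK = {"high": 0, "medium": 1, "low": 2}


def bin_vaf(vaf):
    """Map an allele frequency in [0, 1] to a bin label (None stays None)."""
    if vaf is None:
        return None
    return BIN_LABELS[bisect_right(BIN_EDGES, vaf)]


def add_binned_vaf(rows, samples):
    """Add one '<sample>: binned vaf' column per sample to each row."""
    for row in rows:
        for sample in samples:
            # Assumed input column name; adjust to the real table layout.
            vaf = row.get(f"{sample}: allele frequency")
            row[f"{sample}: binned vaf"] = bin_vaf(vaf)
    return rows


def sort_by_binned_vaf(rows, sample):
    """Sort rows so that high-frequency variants of `sample` come first."""
    column = f"{sample}: binned vaf"
    # Rows with a missing bin are placed after all labelled rows.
    return sorted(rows, key=lambda row: SORT_RANK.get(row[column], len(SORT_RANK)))


rows = [
    {"variant": "a", "tumor: allele frequency": 0.05, "normal: allele frequency": 0.0},
    {"variant": "b", "tumor: allele frequency": 0.8, "normal: allele frequency": 0.45},
    {"variant": "c", "tumor: allele frequency": 0.3, "normal: allele frequency": None},
]
rows = add_binned_vaf(rows, ["tumor", "normal"])
ordered = sort_by_binned_vaf(rows, "tumor")
print([row["variant"] for row in ordered])
```

Because every sample gets its own binned column, the sort has to name one specific sample, which is where the configuration question comes from.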
As we use predefined callsets in the `dna-seq-mtb` workflow, we also need to set the column name for sorting in the default config file, which might be something like `tumor: binned vaf`. In this case we would assume that a sample called `tumor` always exists, which might not be the case.
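One option to discuss, sketched here under assumptions, is to derive the sort column from the samples actually present instead of hard-coding `tumor`. The function `resolve_sort_column` and its fallback to the first listed sample are hypothetical and are not part of this PR or of the workflow.

```python
def resolve_sort_column(samples, preferred="tumor"):
    """Return the binned VAF column to sort by for the given samples.

    Uses `preferred` if such a sample exists, otherwise falls back to the
    first sample in the list. Both choices are illustrative assumptions.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    sample = preferred if preferred in samples else samples[0]
    return f"{sample}: binned vaf"


print(resolve_sort_column(["normal", "tumor"]))
print(resolve_sort_column(["index", "mother", "father"]))
```

With a fallback like this, a predefined callset would still sort by some sample in setups that have no `tumor` sample, although whether the first sample is a sensible default depends on the scenario.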