hepcat72 / vcfSampleCompare

Filter and rank variant call files (VCF) based on comparative evidence ratios between groups of samples.
GNU General Public License v3.0

Add algorithm description for sorting & filtering #7

Closed by hepcat72 5 years ago

hepcat72 commented 5 years ago

Add a description to the README and the help output explaining the logic behind the scoring and the sorting:

  1. The default sort is by best_sep_score, with a sub-sort by descending average depth.
  2. best_sep_score is based on the degree of difference in:
    • If genotype is used: the genotype call (trinary: different, the same, or one or more no-calls)
    • If genotype is not used: the separation gap value, i.e. the difference in AO/DP between the two sample groups (where "AO" can be "RO")
  3. The best separation gap is the AO/DP difference between the worst-included samples in the respective sets. This logic is applied to every state (e.g. RO (0) and AO (1, 2, 3, ...)), and the state that produces the best score is the one that's used.
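
To make the description above easier to check against the code, here is a minimal Python sketch of the non-genotype case as I read it. It is not taken from the vcfSampleCompare source: the names `Sample`, `ratio`, `separation_gap`, `average_depth`, and `rank` are hypothetical, and it assumes that the "worst-included" sample in each group is the one whose ratio lies closest to the other group, and that average depth is the mean DP over all samples in both groups.

```python
from typing import NamedTuple, Sequence


class Sample(NamedTuple):
    counts: Sequence[int]  # counts[0] = RO, counts[1:] = AO for each alternate allele
    depth: int             # DP


def ratio(sample, state):
    """Fraction of reads supporting one state (0 = reference, 1+ = alternates)."""
    return sample.counts[state] / sample.depth if sample.depth else 0.0


def separation_gap(group_a, group_b, state):
    """Gap between the two groups' closest (worst) samples for one state.

    Assumption: either group may be the "high" one, so both directions
    are tried and the larger gap is kept. A negative gap means overlap.
    """
    a = [ratio(s, state) for s in group_a]
    b = [ratio(s, state) for s in group_b]
    return max(min(a) - max(b), min(b) - max(a))


def best_sep_score(group_a, group_b):
    """Best gap over all states (RO and every AO)."""
    n_states = len(group_a[0].counts)
    return max(separation_gap(group_a, group_b, s) for s in range(n_states))


def average_depth(group_a, group_b):
    """Mean DP over every sample in both groups (an assumed definition)."""
    samples = list(group_a) + list(group_b)
    return sum(s.depth for s in samples) / len(samples)


def rank(variants):
    """Sort variant names by best_sep_score, then by average depth, both descending."""
    return sorted(
        variants,
        key=lambda name: (
            -best_sep_score(*variants[name]),
            -average_depth(*variants[name]),
        ),
    )


# Hypothetical data: variant name -> (group A samples, group B samples)
variants = {
    "X": ([Sample((0, 8), 8), Sample((2, 6), 8)],
          [Sample((8, 0), 8), Sample((6, 2), 8)]),
    "Y": ([Sample((0, 4), 4)], [Sample((4, 0), 4)]),
    "Z": ([Sample((0, 16), 16)], [Sample((16, 0), 16)]),
}
ranking = rank(variants)
```

In this made-up data, Y and Z are both perfectly separated, so the average-depth sub-sort decides their order, while X scores lower because its worst samples sit closer together.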

Take a look at the code to confirm what I recall above and see if I can word it better.