imgag / megSAP

a Medical Genetics Sequence Analysis Pipeline
GNU General Public License v3.0

change blacklisting of variants from gene-based to region-based #50

Closed marc-sturm closed 3 years ago

marc-sturm commented 4 years ago

Come up with a region-based blacklisting of variants. Right now blacklisting is done per gene, which is too coarse because only a part of a gene might be problematic.
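To illustrate the difference from a per-gene list, here is a minimal sketch of a region-based lookup. It is not megSAP code: the function names `build_index` and `is_blacklisted`, the 0-based half-open coordinates (as in BED files), and the example regions are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Build a per-chromosome lookup from (chrom, start, end) regions.

    Coordinates are assumed to be 0-based, half-open (BED-style).
    Overlapping or touching regions are merged so that a single
    binary search per query is sufficient.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlaps or touches the previous region: extend it.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        starts = [s for s, _ in merged]
        ends = [e for _, e in merged]
        index[chrom] = (starts, ends)
    return index


def is_blacklisted(index, chrom, pos):
    """Return True if the 0-based position lies inside any region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Rightmost region whose start is <= pos.
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


# Hypothetical regions, for illustration only.
example_regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 10, 20)]
example_index = build_index(example_regions)
```

With this layout, a variant in a reliable part of a gene is kept even if another part of the same gene is covered by a blacklisted region.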

We should train a machine-learning regression model that outputs an "artefact probability" for each variant. We should train separate models for SNVs and InDels, and maybe also for exome and genome data.

Test with the following variants:

Test the following properties of a variant as features:

Test the following variant locus metrics as features (with different window sizes, also only left/right of variant):
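A rough sketch of how window-based locus features could be generated for several window sizes, with separate left-only, right-only and two-sided windows. The metric list is not given here, so GC fraction is used purely as a placeholder. The names `gc_fraction` and `locus_features` and the 0-based position convention are assumptions as well.

```python
def gc_fraction(seq):
    """Placeholder metric: fraction of G/C bases (0.0 for an empty window)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def locus_features(ref, pos, sizes, metric=gc_fraction, name="gc"):
    """Compute a metric in windows around a 0-based variant position.

    For each window size, three features are produced: the window to
    the left of the variant, the window to the right, and both sides
    combined. The variant base itself is excluded from all windows.
    Windows are clipped at the ends of the reference sequence.
    """
    features = {}
    for size in sizes:
        left = ref[max(0, pos - size):pos]
        right = ref[pos + 1:pos + 1 + size]
        features[f"{name}_left_{size}"] = metric(left)
        features[f"{name}_right_{size}"] = metric(right)
        features[f"{name}_both_{size}"] = metric(left + right)
    return features


# Toy reference sequence, for illustration only.
example = locus_features("AAAAGGGGCT", pos=4, sizes=[2, 4])
```

Any other per-window metric can be passed in via the `metric` argument, so the same loop can be reused to compare window sizes and sides.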

marc-sturm commented 3 years ago

Implemented low-confidence regions based on gnomAD AC0/RF and in-house trio/twin regions: e7d48334cdebfd1f21ece17237b52dd8bee64652