sigven / cpsr

Cancer Predisposition Sequencing Reporter (CPSR)
https://sigven.github.io/cpsr/

CPSR gnomAD maf cut-off #16

Closed vladsavelyev closed 5 years ago

vladsavelyev commented 5 years ago

Hey Sigve,

Just wondering if there is any reason to have a low population frequency cut-off for germline variants? My first guess would be that predisposition variants can be more common than 5%, but I admit I don't really know.

Vlad

sigven commented 5 years ago

Hey Vlad,

The gnomAD threshold gives you the opportunity to essentially rule out (from the report) any variant that is relatively common in the population. Most pathogenic variants in cancer susceptibility genes (here I am thinking of the ones with high penetrance, i.e. with a P/LP status in ClinVar) have a very low allele frequency, or are even absent from gnomAD. This is supported by standard pathogenicity scoring criteria (ACMG), which give you a score towards "pathogenic" if the allele frequency is very low, and a score towards "benign" (likely benign) if the allele frequency is high. Basically, setting this threshold to a low level in the configuration file will tend to "enrich" your report for the variants that are more likely to be pathogenic. My understanding is also that there may be regulatory concerns about whether relatively frequent variants are eligible for reporting (at least here in Norway).

Note that for variants found through GWAS the story is different: they have a much higher population frequency, but a lower penetrance. The gnomAD allele frequency filtering is not applied to any variant that overlaps with known GWAS hit variants (at least in the dev version (:-)); that should probably be communicated more clearly.

Either way, setting it to a high level will keep all variants in the predisposition genes, but the number of benign variants in the report will tend to rise accordingly. I guess it boils down to finding the threshold that strikes the right balance.

best, Sigve
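
For illustration, the filtering logic described above could be sketched roughly as follows. This is not CPSR's actual implementation: the `Variant` class, the `keep_in_report` function, the `maf_upper_threshold` parameter name, and the 0.05 value are all hypothetical, chosen only to show how a frequency cut-off and a GWAS exemption might interact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    # Hypothetical record; field names are illustrative, not CPSR's.
    name: str
    gnomad_af: Optional[float]  # None when the variant is absent from gnomAD
    is_gwas_hit: bool = False


def keep_in_report(variant: Variant, maf_upper_threshold: float) -> bool:
    """Return True if the variant should stay in the report."""
    # GWAS hits are exempt: they are expected to be common but low-penetrance.
    if variant.is_gwas_hit:
        return True
    # Variants absent from gnomAD are treated as rare here, so they are kept.
    if variant.gnomad_af is None:
        return True
    # Everything else is kept only if it is at or below the cut-off.
    return variant.gnomad_af <= maf_upper_threshold


variants = [
    Variant("rare_candidate", 0.0002),
    Variant("common_variant", 0.31),
    Variant("absent_from_gnomad", None),
    Variant("common_gwas_hit", 0.42, is_gwas_hit=True),
]

# 0.05 is an assumed threshold, used only for this example.
kept = [v.name for v in variants if keep_in_report(v, maf_upper_threshold=0.05)]
print(kept)
```

Lowering the assumed threshold drops more common variants and enriches the list for likely pathogenic ones; raising it keeps everything at the cost of more benign entries.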

vladsavelyev commented 5 years ago

That totally makes sense, really appreciate the detailed explanation! I was trying to hunt down several MHS variants that are missing in hg38 runs compared to GRCh37 runs (e.g. rs1650697). For some reason the gnomAD frequencies are reported differently, e.g. GRCh37:

GLOBAL_AF_GNOMAD 0
NON_CANCER_AF_GLOBAL 0.8565

And hg38:

GLOBAL_AF_GNOMAD 0.857
NON_CANCER_AF_GLOBAL 0.8565

Because GLOBAL_AF_GNOMAD is effectively missing (reported as 0) in GRCh37, the variants are kept there, while in hg38 they are filtered out, even though the representation of the variant in gnomad_cpsr/gnomad_cpsr.vcf looks identical. Probably not a big deal, since in the worst case it just means a bit more clutter, without losing anything important.

Thanks again!
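
To make the discrepancy concrete, here is a small sketch using the two sets of values quoted above. It assumes the filter compares GLOBAL_AF_GNOMAD against an upper cut-off; the `passes_af_cutoff` helper and the 0.05 cut-off are hypothetical and not taken from CPSR.

```python
def passes_af_cutoff(global_af: float, cutoff: float) -> bool:
    # Hypothetical helper: a variant passes when its global gnomAD
    # allele frequency does not exceed the cut-off.
    return global_af <= cutoff


CUTOFF = 0.05  # assumed value, for illustration only

# Annotations for rs1650697 as reported in this thread.
annotations = {
    "GRCh37": {"GLOBAL_AF_GNOMAD": 0.0, "NON_CANCER_AF_GLOBAL": 0.8565},
    "hg38": {"GLOBAL_AF_GNOMAD": 0.857, "NON_CANCER_AF_GLOBAL": 0.8565},
}

results = {
    build: passes_af_cutoff(fields["GLOBAL_AF_GNOMAD"], CUTOFF)
    for build, fields in annotations.items()
}
print(results)  # {'GRCh37': True, 'hg38': False}
```

Under these assumptions, a frequency of 0 looks like a very rare variant and passes any cut-off, which would explain why the same variant shows up in one build and not the other.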

sigven commented 5 years ago

Cutoffs are now only applied to unclassified (non-ClinVar) variants.