mskcc / tempo

CCS research pipeline to process WES and WGS TN pairs

Generate PoN for somatic SVs? #125

Open kpjonsson opened 5 years ago

kpjonsson commented 5 years ago

See #126, but for SVs. This is in addition to filtering against gnomAD.

evanbiederstedt commented 5 years ago

It's something to try, but I'm not sure if we have a very good sense of this:

gnomAD-SV paper: https://www.biorxiv.org/content/10.1101/578674v1

My current thought is that this will require a great deal of investigation by CCS before we implement something.


EDIT:

The larger question is the following: how do you precisely implement such a filter?

Let's consider SNVs. Let's say we're confident (based on the panel of normals) that a variant at position X is a germline SNP and should not be considered a somatic SNV. Ergo, it should be filtered. That's a pretty easy problem.
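To make the "easy problem" concrete, here is a minimal sketch of an exact-match SNV filter. It assumes the PoN is simply a set of (chrom, pos, ref, alt) keys; the `SNV` type, `remove_pon_snvs`, and the example coordinates are hypothetical and not part of Tempo.

```python
from typing import NamedTuple


class SNV(NamedTuple):
    """Hypothetical minimal SNV record; a real pipeline would read these from a VCF/MAF."""
    chrom: str
    pos: int
    ref: str
    alt: str


def remove_pon_snvs(calls: list[SNV], pon: set[SNV]) -> list[SNV]:
    """Drop any call whose exact (chrom, pos, ref, alt) key appears in the PoN."""
    return [call for call in calls if call not in pon]


# Hypothetical PoN entry: a common germline SNP seen in many normals.
pon = {SNV("chr1", 1_000_000, "A", "G")}
calls = [SNV("chr1", 1_000_000, "A", "G"), SNV("chr2", 500, "C", "T")]
somatic = remove_pon_snvs(calls, pon)
print(somatic)  # [SNV(chrom='chr2', pos=500, ref='C', alt='T')]
```

A single-base position plus alleles is an unambiguous key, so a set lookup is all the filter needs. That is exactly what breaks down for SVs.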

Now let's consider SVs. SVs are denoted by two breakpoints and an event annotation. Based on a hypothetical "SV PoN", let's say we were 100% confident that an SV which is an INS at bkpt1=X and bkpt2=Y is a common germline INS, and therefore should not be categorized as a somatic SV.

The outputs from SV callers will almost never show exactly that. The variant could be mislabeled, or the breakpoints could be at bkpt1=X+1 and bkpt2=Y-1. Even though the variant has a different label and a different location, it could very well be the same event. Would that mean creating filters which look for events within +/- a given number of basepairs? Then you might remove real signal. That's no good. Flag it instead? That might be better, but given the quality of SV callers, virtually all events would need to be flagged.
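A rough sketch of the tolerance-window idea, for discussion only: a call matches a PoN entry if both breakpoints fall within +/- `window` bp, optionally ignoring the event label, and matches are flagged rather than dropped. The `SV` type, `matches_pon`, `flag_pon_svs`, the `window` and `require_same_type` parameters, and the coordinates are all hypothetical; breakpoint-order swaps and confidence intervals are deliberately not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical SV record: two breakpoints plus an event label."""
    chrom1: str
    bkpt1: int
    chrom2: str
    bkpt2: int
    svtype: str  # e.g. "INS", "DEL", "DUP", "INV", "TRA"


def matches_pon(call: SV, entry: SV, window: int, require_same_type: bool) -> bool:
    """True if both breakpoints lie within +/- window bp of the PoN entry's."""
    if (call.chrom1, call.chrom2) != (entry.chrom1, entry.chrom2):
        return False
    if require_same_type and call.svtype != entry.svtype:
        return False
    return (abs(call.bkpt1 - entry.bkpt1) <= window
            and abs(call.bkpt2 - entry.bkpt2) <= window)


def flag_pon_svs(calls: list[SV], pon: list[SV], window: int = 0,
                 require_same_type: bool = True) -> list[tuple[SV, bool]]:
    """Return (call, flagged) pairs instead of removing calls outright."""
    return [
        (call, any(matches_pon(call, entry, window, require_same_type) for entry in pon))
        for call in calls
    ]


# Hypothetical germline INS in the PoN, and a call that is shifted by 1 bp
# at each breakpoint and mislabeled as a DUP.
pon = [SV("chr1", 10_000, "chr1", 10_300, "INS")]
call = SV("chr1", 10_001, "chr1", 10_299, "DUP")

exact = flag_pon_svs([call], pon)
tolerant = flag_pon_svs([call], pon, window=5, require_same_type=False)
print(exact[0][1], tolerant[0][1])  # False True
```

With `window=0` and the label required, the shifted, mislabeled call is never matched. Widening the window and ignoring the label catches it, but the same relaxation also makes unrelated events near common germline SVs match, which is the signal-loss trade-off described above.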

But then the question is: how do you account for the inherent "awfulness" of standard short-read SV callers? If you create filters which match exactly on events such as "INS at bkpt1=X and bkpt2=Y", I'm pretty sure you'll find nothing. The best case would be a few correctly filtered events; the worst case would be a massive mess.

I'm also not convinced the world has analyzed a large enough number of samples to say that a given SV is a "standard" SV (with the exception of a handful of cases), but that's another kettle of fish**.

**kettle of fish for the fiskgryta and fisksoppa, of course

evanbiederstedt commented 5 years ago

We'll need to devote science time to this, @kpjonsson. I'll de-prioritize it for now.