hammerlab / guacamole

Spark-based variant calling, with experimental support for multi-sample somatic calling (including RNA) and local assembly
Apache License 2.0

joint caller: implement filters #385

Open timodonnell opened 8 years ago

timodonnell commented 8 years ago

Some filters we should probably have, based on artifacts I've observed in real data:

Some of these are already implemented in https://github.com/hammerlab/guacamole/tree/master/src/main/scala/org/hammerlab/guacamole/filters and can be adapted for the joint caller.

For anything that directly involves base qualities, mapping qualities, or read depths, I'd like to try integrating it into the likelihood computation where possible rather than applying it as a filter. For example, instead of a threshold filter on read depth, we may be able to parameterize our prior on there being a somatic variant in a more intuitive way that involves a minimum read depth.
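To make the contrast concrete, here is a minimal Python sketch (Python is used only for illustration; guacamole itself is Scala). The function names, the logistic ramp, and all default values are hypothetical assumptions, not something this issue specifies. A hard depth filter discards a call below `min_depth`, while the soft version folds depth into the log prior so that low-depth sites are penalized gradually.

```python
import math


def hard_depth_filter(depth: int, min_depth: int = 10) -> bool:
    """Threshold-style filter: keep the call only if depth >= min_depth."""
    return depth >= min_depth


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def depth_aware_log_prior(
    depth: int,
    base_log_prior: float = math.log(1e-6),  # hypothetical somatic prior
    min_depth: int = 10,                     # hypothetical ramp midpoint
    steepness: float = 1.0,                  # hypothetical ramp steepness
) -> float:
    """Hypothetical soft alternative to hard_depth_filter.

    Shifts the somatic log prior by the log of a logistic ramp centered at
    min_depth: the prior is halved at depth == min_depth, shrinks quickly
    below it, and approaches base_log_prior as depth grows.
    """
    return base_log_prior + log_sigmoid(steepness * (depth - min_depth))


if __name__ == "__main__":
    for d in (2, 10, 30):
        print(d, hard_depth_filter(d), round(depth_aware_log_prior(d), 2))
```

The soft version never drops a call outright. A low-depth site simply needs stronger evidence in the likelihood to overcome the reduced prior.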

Here are (some of) the filters Mutect supports: http://www.nature.com/nbt/journal/v31/n3/fig_tab/nbt.2514_T1.html

Since we're joint calling, one question is which samples these filters should run on. I think the simplest approach is to run them on the pooled sequencing data. Another option worth experimenting with is to run them on each sample that triggers a call, and keep the call only if at least one triggering sample isn't filtered out.
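A minimal sketch of the two options above follows, assuming each sample is summarized by its depth and alt-read count at the site. `SampleEvidence`, its `triggers_call` flag, and the example `min_depth_filter` are hypothetical names for illustration, not guacamole types.

```python
from dataclasses import dataclass
from typing import Callable, List

# A filter takes (depth, alt_reads) and returns True if the site passes.
Filter = Callable[[int, int], bool]


@dataclass
class SampleEvidence:
    name: str
    depth: int
    alt_reads: int
    triggers_call: bool  # hypothetical: this sample's evidence supports the call


def passes_pooled(samples: List[SampleEvidence], passes: Filter) -> bool:
    """Option 1: run the filter once on reads pooled across all samples."""
    depth = sum(s.depth for s in samples)
    alt = sum(s.alt_reads for s in samples)
    return passes(depth, alt)


def passes_per_sample(samples: List[SampleEvidence], passes: Filter) -> bool:
    """Option 2: keep the call if any triggering sample passes on its own."""
    return any(s.triggers_call and passes(s.depth, s.alt_reads) for s in samples)


def min_depth_filter(depth: int, alt_reads: int, min_depth: int = 20) -> bool:
    """Example filter with a hypothetical threshold; alt_reads is unused."""
    return depth >= min_depth


if __name__ == "__main__":
    samples = [
        SampleEvidence("tumor_dna", depth=12, alt_reads=5, triggers_call=True),
        SampleEvidence("tumor_rna", depth=15, alt_reads=6, triggers_call=True),
        SampleEvidence("normal", depth=30, alt_reads=0, triggers_call=False),
    ]
    print(passes_pooled(samples, min_depth_filter))      # True: pooled depth 57 >= 20
    print(passes_per_sample(samples, min_depth_filter))  # False: no triggering sample reaches 20
```

In this example the two strategies disagree. Pooling the normal's reads lifts the site over the depth threshold, even though neither sample that actually triggered the call has enough coverage on its own.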

Any filtered calls should be written out to a VCF with the FILTER field set to explain why it was filtered.
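Per the VCF specification, the FILTER column holds `PASS` when all filters pass, a semicolon-separated list of the failing filter IDs otherwise, or `.` when no filters were applied. Each ID should also be declared in a `##FILTER` header line. The minimal sketch below follows that convention. The filter IDs and descriptions are hypothetical placeholders, not names guacamole defines.

```python
from typing import Dict, List, Optional

# Hypothetical filter IDs; real names would come from the implemented filters.
FILTER_DESCRIPTIONS: Dict[str, str] = {
    "LowDepth": "Read depth below the minimum",
    "NormalEvidence": "Alt allele seen in the normal more often than chance explains",
}


def filter_header_lines(descriptions: Dict[str, str]) -> List[str]:
    """One ##FILTER meta-information line per filter ID."""
    return [f'##FILTER=<ID={fid},Description="{desc}">' for fid, desc in descriptions.items()]


def filter_column(failed: List[str], filters_applied: bool = True) -> str:
    """Build the FILTER column value for one record."""
    if not filters_applied:
        return "."
    if not failed:
        return "PASS"
    return ";".join(failed)


def vcf_record(chrom: str, pos: int, ref: str, alt: str,
               failed: List[str], qual: Optional[float] = None) -> str:
    """The eight fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO."""
    return "\t".join([
        chrom,
        str(pos),
        ".",                                     # ID
        ref,
        alt,
        "." if qual is None else f"{qual:.1f}",  # QUAL
        filter_column(failed),                   # FILTER
        ".",                                     # INFO
    ])


if __name__ == "__main__":
    print("\n".join(filter_header_lines(FILTER_DESCRIPTIONS)))
    print(vcf_record("chr1", 12345, "A", "T", []))
    print(vcf_record("chr1", 67890, "G", "C", ["LowDepth", "NormalEvidence"]))
```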

@e5c is down to take a stab at this once #384 is merged and running on the cluster; assigning to her.

timodonnell commented 8 years ago

One of the most important filters to have: filter out somatic calls that have evidence in the normal, i.e. cases where the variant is present in the normal at too low a fraction to produce a germline call but too high a fraction to be due to chance alone. A possible extension would be tolerating a parameterized amount of tumor contamination in the normal, although we can also leave that for later.
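One simple way to decide "too high a fraction to be due to chance" is a one-sided binomial test on the normal's alt-read count against a sequencing error rate. The contamination extension then raises the expected rate by roughly `contamination * tumor_vaf`. This is a swapped-in technique for illustration, not MuTect's exact normal test. The function names and defaults (`error_rate`, `contamination`, `alpha`) are hypothetical.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def normal_evidence_filter(
    normal_alt: int,
    normal_depth: int,
    tumor_vaf: float,
    error_rate: float = 0.001,   # hypothetical per-base error rate
    contamination: float = 0.0,  # hypothetical tumor fraction in the normal
    alpha: float = 0.01,         # hypothetical significance level
) -> bool:
    """Return True if the somatic call should be FILTERED.

    The call is filtered when the normal's alt-read count is unlikely under
    sequencing error alone, optionally plus tumor-in-normal contamination,
    approximated as contributing alt reads at contamination * tumor_vaf.
    """
    expected_rate = min(1.0, error_rate + contamination * tumor_vaf)
    p_value = binomial_upper_tail(normal_alt, normal_depth, expected_rate)
    return p_value < alpha
```

With these defaults, a single alt read in a 100x normal is tolerated (P(X >= 1) is about 0.095 at a 0.1% error rate), while five alt reads trigger the filter. Allowing 10% contamination at a tumor VAF of 0.4 raises the expected rate to 4.1%, and five reads no longer trigger it.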

jstjohn commented 8 years ago

My PR https://github.com/hammerlab/guacamole/pull/394 has the filters you mentioned in the mutect-like somatic caller. Eventually these filters need to be tested and refactored out so that the joint caller can use them easily.

timodonnell commented 8 years ago

Mutect filters / annotations we need (much of this is based on here)

Other annotations

[Figure: somewhat helpful diagram from the talk above]