Add DREAM synth1 benchmark and variants from VCF

arahuja commented 8 years ago

This PR has a few pieces:

- a benchmark based on DREAM synth1
- loading 'known' variants from a VCF file
- a gcloud configuration (any ideas on how to deal with a reference path here? https://github.com/hammerlab/guacamole/issues/512 could be a fix)
- some py3 fixes (iteritems and 'w' mode in tmpfiles)
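
For readers unfamiliar with the VCF loading and py3 items above, here is a minimal sketch of the general idea, not the code in this PR. The function name `load_known_variants`, the choice of key, and the two-record VCF are hypothetical, made up for illustration. It assumes a POSIX system, where a `NamedTemporaryFile` can be reopened by name while still open.

```python
import tempfile

# Hypothetical two-record VCF, used only for this illustration.
VCF_TEXT = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t12345\t.\tA\tT\t.\tPASS\t.\n"
    "2\t67890\t.\tG\tC,A\t.\tPASS\t.\n"
)


def load_known_variants(path):
    """Return a dict mapping (chrom, pos, ref, alt) to the FILTER column.

    Hypothetical helper: multi-allelic records are split into one key per
    alternate allele.
    """
    variants = {}
    with open(path) as handle:
        for line in handle:
            # Skip meta-information lines, the header line, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos, ref = fields[0], int(fields[1]), fields[3]
            alts, filt = fields[4], fields[6]
            for alt in alts.split(","):
                variants[(chrom, pos, ref, alt)] = filt
    return variants


# py3 fix 1: NamedTemporaryFile defaults to binary mode ('w+b'), so writing
# a str requires an explicit text mode such as mode="w".
with tempfile.NamedTemporaryFile(mode="w", suffix=".vcf") as tmp:
    tmp.write(VCF_TEXT)
    tmp.flush()
    known = load_known_variants(tmp.name)

# py3 fix 2: dict.iteritems() no longer exists; dict.items() works in both
# Python 2 and Python 3.
for (chrom, pos, ref, alt), filt in sorted(known.items()):
    print(chrom, pos, ref, alt, filt)
```

A plain tuple key keeps the lookup cheap when comparing calls against the known set, though a real loader would likely go through a VCF library rather than splitting on tabs.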

arahuja commented 8 years ago

I'm not sure the results make sense here, since I didn't set any parameters, but the summary file gives:

stat,comparison_dataset,numerator,denominator,percent
calls,,1377,,
calls before filtering,,302336,,
calls,published,3537,,
recall with filters,published,50,3537.0,1.413627367825841
precision with filters,published,50,3537.0,1.413627367825841
recall from pooled calling only without filters,published,3508,3537.0,99.18009612666101
recall individual calling only without filters,published,3508,3537.0,99.18009612666101
precision without filters,published,3508,302336.0,1.1602984758679085
precision both pooled and individual triggers firing,published,3508,302335.0,1.1603023136586899
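
As a sanity check on the table, the percentages can be recomputed from the counts it reports. This is only a sketch: the variable names are mine, and the last value assumes that precision with filters should be divided by the 1377 filtered calls rather than the 3537 published calls that the summary uses for both the recall and precision rows. That assumption is not confirmed by the benchmark code.

```python
def percent(numerator, denominator):
    """Express numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


# Counts copied from the summary above.
published_calls = 3537            # "calls,published"
calls_after_filtering = 1377      # "calls"
calls_before_filtering = 302336   # "calls before filtering"
matched_without_filters = 3508    # numerator of the unfiltered rows
matched_with_filters = 50         # numerator of the filtered rows

# These reproduce the reported rows.
recall_without_filters = percent(matched_without_filters, published_calls)
precision_without_filters = percent(matched_without_filters, calls_before_filtering)
recall_with_filters = percent(matched_with_filters, published_calls)

# Assumption: precision is normally matched calls over calls made, which
# would use the filtered call count as the denominator here.
precision_with_filters_assumed = percent(matched_with_filters, calls_after_filtering)

print(round(recall_without_filters, 2))
print(round(precision_without_filters, 2))
print(round(recall_with_filters, 2))
print(round(precision_with_filters_assumed, 2))
```

If that assumption holds, the identical recall and precision rows with filters may point at a denominator mix-up in the summary rather than at the caller itself.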

timodonnell commented 8 years ago

Thanks @arahuja. I made a few comments, but otherwise this LGTM. From your summary it does look like we're way overcalling here. It will be good to dig into what's happening; this may be good motivation for writing a few more filters in the joint caller.

hammerlab / variant-calling-benchmarks

Add DREAM synth1 benchmark and variants from VCF #26