Closed MattWellie closed 2 years ago
Hail Query was previously found to struggle badly when computing compound hets, even on a small number of variants. Splitting the current Category 4 into two groups could greatly increase the number of variants available at that point in the analysis process:
Category_Support
to better resemble its value in analysis
Instead of this, more de novo logic should be completed in Hail: https://hail.is/docs/0.2/methods/genetics.html#hail.methods.de_novo
Full mode-of-inheritance (MOI) validation should be completed in Hail, with Category 4 variants labelled only where the MOI has been confirmed.
PLAN! Hail implements a de novo function, which will make 2 new inputs mandatory:
The procedure should then be:
Note: the AF table doesn't need to be added as a separate input. Once the annotations have been added to the MT, the allele frequencies can be read directly from the MT itself.
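A rough sketch of how that call might look is below. `hl.read_matrix_table`, `hl.Pedigree.read` and `hl.de_novo` are existing Hail 0.2 calls, but everything else is an assumption for illustration: the function name, the paths, the idea that the pedigree arrives as a PLINK `.fam` file, and the row field `gnomad_af` (assumed to be a float64 annotation already present on the MT).

```python
def find_de_novo_calls(mt_path, fam_path, af_field="gnomad_af"):
    """Return Hail's table of putative de novo calls for the trios in fam_path.

    af_field is a hypothetical name for a float64 row annotation holding the
    population allele frequency; substitute whatever the annotation step adds.
    """
    # Imported inside the function so this module loads without Hail installed.
    import hail as hl

    mt = hl.read_matrix_table(mt_path)

    # Trios are defined by a PLINK-style .fam pedigree file.
    pedigree = hl.Pedigree.read(fam_path)

    # The prior is read straight off the annotated MT, so no separate AF
    # table is passed in. hl.de_novo also expects the entry fields
    # GT, AD, DP, GQ and PL to be present on the MT.
    return hl.de_novo(mt, pedigree, pop_frequency_prior=mt[af_field])
```

The result is a Hail Table of candidate calls, which could then be joined back onto the MT to drive the Category 4 label.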
Note: this PR is currently blocked by what appears to be a bug in Hail Query (https://hail.zulipchat.com/#narrow/stream/223457-Hail-Batch-support/topic/OutOfMemoryError.20in.20ServiceBackend.2ElowerDistributedSort/near/281636886). We are waiting for a response on this from the core Hail team.
Currently, Category 4 is designed to cover variants with high-impact in silico predictions as well as de novo variants. This category should be split in the following way:
How will this work?
How will this be substantially different from the current Monoallelic check? It won't be... yet, but de novo and monoallelic will differ in terms of permitted penetrance.
That is, the monoallelic logic can be adjusted to allow ClinVar pathogenic variants to pass despite inheritance from a parent, whereas parental inheritance is always disqualifying for a de novo check.
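The distinction can be sketched in plain Python. This is a deliberately simplified illustration, not the project's actual MOI code: `TrioCall`, its fields, both function names and the `allow_inherited_pathogenic` switch are hypothetical, and real checks would also consider parental affection status, genotype quality and so on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioCall:
    """Hypothetical, simplified view of one variant across a trio."""
    proband_has_alt: bool
    mother_has_alt: bool
    father_has_alt: bool
    clinvar_pathogenic: bool = False


def passes_de_novo(call):
    # Any parental inheritance is disqualifying, whatever the annotations say.
    inherited = call.mother_has_alt or call.father_has_alt
    return call.proband_has_alt and not inherited


def passes_monoallelic(call, allow_inherited_pathogenic=True):
    if not call.proband_has_alt:
        return False
    inherited = call.mother_has_alt or call.father_has_alt
    if not inherited:
        return True
    # Relaxed penetrance: an inherited variant may still pass when it is
    # ClinVar pathogenic and the relaxation is switched on.
    return allow_inherited_pathogenic and call.clinvar_pathogenic
```

With the relaxation switched off and no pathogenic annotations, the two checks agree in this sketch, which matches the "It won't be... yet" position; they only diverge once inherited pathogenic variants are allowed through the monoallelic check.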