populationgenomics / automated-interpretation-pipeline

Rare Disease variant prioritisation MVP
MIT License

During Dev, run as both family- and singleton-analysis #32

Closed MattWellie closed 1 year ago

MattWellie commented 2 years ago

During this development period, it's important to gauge the tool's performance on both family and singleton cohorts, as both are anticipated use cases once deployed.

e.g. singleton analysis will create more false positives, due to the lack of control family members, but de novo variants can only be found in a trio/family cohort.

To simulate this, run each test cohort (here just Acute Care) both ways, by manipulating the pedigree (PED) file used.

#31 contains a script for generating a PED from the Sample Metadata API, including a flag to set all members to unrelated singletons.
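As a rough illustration of what "set all members to unrelated singletons" could mean for a standard six-column PED (family, sample, father, mother, sex, phenotype), here is a minimal sketch. It is not the script referenced above: the function name `ped_to_singletons` is hypothetical, and it assumes each sample becomes its own family with both parent columns set to `0`.

```python
def ped_to_singletons(ped_text: str) -> str:
    """Rewrite a six-column PED so every sample is an unrelated singleton.

    Hypothetical helper: assumes whitespace-delimited PED rows with the
    columns family, sample, father, mother, sex, phenotype.
    """
    rows = []
    for line in ped_text.splitlines():
        # Skip blank lines and comment/header lines
        if not line.strip() or line.startswith("#"):
            continue
        _family, sample, _father, _mother, sex, phenotype = line.split()[:6]
        # Each sample becomes its own family, with no recorded parents
        rows.append("\t".join([sample, sample, "0", "0", sex, phenotype]))
    return "".join(row + "\n" for row in rows)
```

Sex and phenotype are carried over unchanged, so affected status is still available to the MOI tests in singleton mode.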

Alterations required (non-mandatory, so original flow is unaffected):

  1. Allow for a second PED to be supplied, representing Singletons only
  2. If supplied, run the Labelling process twice, once with each PED
    • should not need to manually limit any flagging method, e.g. de novo can run but will be ineffective
  3. Run the analysis process twice, each time using one PED
  4. Label results of these twin processes appropriately
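
The alterations listed above could be wired together roughly as follows. This is only a sketch under assumptions: `run_modes` and the `analyse` callable are hypothetical stand-ins for the real labelling/analysis steps, and the `"family"` / `"singleton"` labels are illustrative.

```python
from typing import Callable, Dict, Optional


def run_modes(
    family_ped: str,
    singleton_ped: Optional[str],
    analyse: Callable[[str], dict],
) -> Dict[str, dict]:
    """Run the analysis once per supplied PED and label each result.

    The second PED is optional, so the original single-PED flow is unchanged.
    """
    peds = {"family": family_ped}
    if singleton_ped is not None:
        peds["singleton"] = singleton_ped
    # One run per PED, keyed by the mode it represents
    return {mode: analyse(ped) for mode, ped in peds.items()}
```

With only the family PED supplied, the result holds a single `"family"` entry, so downstream code written for the original flow needs no special casing.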

MattWellie commented 2 years ago

Note: this can be done now, but it will make more sense (and be a cleaner base to build on) once the de novo branch is merged. That is currently held up by a memory leak in Hail Query, which we are looking to mitigate.

MattWellie commented 2 years ago

Hail Query issue solved, de novo branch is available for review. The HTML report generation is stacked on top of that.

The only difference in Hail between running as family and singleton is the presence/absence of de novo results. That doesn't seem like enough reason to duplicate the intermediate files (and the expense of generating/storing them). Instead, I will create a flag in the final processing to remove the de novo field for all variants if we choose to run as singleton analysis.

We will still need both a family and a singleton version of the PED/FAM file to ensure the MOI tests are run appropriately.
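
A minimal sketch of the final-processing flag described above, assuming each variant is held as a plain dict and that the de novo annotation lives under a key called `de_novo` (both the representation and the key name are assumptions for illustration, not the pipeline's actual data model):

```python
from typing import Dict, List


def strip_de_novo(variants: List[Dict], singletons: bool) -> List[Dict]:
    """Drop the (hypothetical) 'de_novo' field when running as singletons.

    In family mode the variants are returned untouched, so the shared
    intermediate files can serve both modes.
    """
    if not singletons:
        return variants
    # Build new dicts rather than mutating the shared input records
    return [
        {key: value for key, value in variant.items() if key != "de_novo"}
        for variant in variants
    ]
```

Because the input records are copied rather than edited in place, the same labelled variants can be passed through once with `singletons=False` and once with `singletons=True`.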

MattWellie commented 1 year ago

If two PED files are supplied, this will run in both singleton and family mode.