Closed MattWellie closed 1 year ago
Kicking this back until we decide on whether to use Jinja templating etc.
Re-opening! We can approximate this by grouping on family rather than individual ID. The Family ID is annotated onto the individual variant JSON blobs, so it should be accessible to the templating.
Currently AIP generates per-sample results, where each affected individual across the whole cohort can be assessed independently (MOI tests are conducted per-family, but the operation is individual-centric). The output format is a dictionary, keyed by the sample IDs at the top level.
This has not been a problem so far, as each family in the development dataset is a trio with a single affected participant. If a single family has multiple affected members, we would expect the results to be repeated for each of them (under a complete-penetrance model, if the MOI fits for one affected participant, it must fit for all affected participants).
In the JSON and HTML results each sample has a separate variant table, and the cohort-level stats count the number of variants per individual. This is potentially misleading, as we would be double-counting for every family with multiple affected persons.
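To make the double-counting concrete, here is a minimal sketch on made-up data. The sample IDs, family IDs, and variant IDs are hypothetical and are not taken from AIP output; the point is only that counting per individual inflates the total whenever two affected relatives share a variant.

```python
# Hypothetical (sample_id, family_id, variant_id) rows; not real AIP output.
rows = [
    ("SAM1", "FAM1", "1-100-A-T"),  # affected sibling 1
    ("SAM2", "FAM1", "1-100-A-T"),  # affected sibling 2, same variant
    ("SAM3", "FAM2", "2-200-G-C"),  # singleton affected in another family
]

# Counting per individual: the shared FAM1 variant is counted twice.
per_individual_count = len({(sample, variant) for sample, _, variant in rows})

# Counting per family: the shared variant is counted once.
per_family_count = len({(family, variant) for _, family, variant in rows})

print(per_individual_count, per_family_count)
```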
The generated Seqr links are per-family, so the same variant appearing against multiple family members duplicates the exact same link.
Proposal:
Using the Pedigree, aggregate the results for all members of a family when 'simplifying' the results (removing redundant entries). Instead of presenting per-sample, we should present results per-family.
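A rough sketch of what that aggregation could look like, assuming the per-sample output is a dictionary of `{sample_id: [variant, ...]}` and that each variant blob carries its Family ID. The key names `family_id` and `var_id` and the function `group_by_family` are placeholders for illustration, not the actual AIP schema or API.

```python
from collections import defaultdict


def group_by_family(per_sample):
    """Collapse {sample_id: [variant, ...]} into {family_id: [variant, ...]}.

    Each family-level entry records which samples the variant was
    reported against, so no per-sample information is lost.
    """
    per_family = defaultdict(dict)
    for sample_id, variants in per_sample.items():
        for variant in variants:
            family = variant["family_id"]  # assumed key name
            key = variant["var_id"]        # assumed key name
            entry = per_family[family].setdefault(
                key, {"var_id": key, "family_id": family, "samples": []}
            )
            if sample_id not in entry["samples"]:
                entry["samples"].append(sample_id)
    # Convert the inner dicts to lists for a JSON-friendly structure.
    return {fam: list(entries.values()) for fam, entries in per_family.items()}


# Hypothetical input: two affected siblings in FAM1 share one variant.
per_sample = {
    "SAM1": [{"var_id": "1-100-A-T", "family_id": "FAM1"}],
    "SAM2": [{"var_id": "1-100-A-T", "family_id": "FAM1"}],
    "SAM3": [{"var_id": "2-200-G-C", "family_id": "FAM2"}],
}
per_family = group_by_family(per_sample)
```

Keeping the list of contributing samples on each family-level entry means a per-sample view could still be reconstructed from the per-family one, which may help when adapting the comparison process.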
This requires a bit more thought, so as not to break the interface with the comparison process.