populationgenomics / automated-interpretation-pipeline

Rare Disease variant prioritisation MVP
MIT License

Potential SV Bug #386

Closed MattWellie closed 1 week ago

MattWellie commented 1 week ago

Category labelling: SV1 = predicted LOF in a green gene.

We aren't currently filtering to only green genes in this analysis. We may be selecting a variant because it has a consequence in one green gene (plus 100 other genes), then splitting it out into one row per gene. Each of those rows, including the non-green ones, may then be picked up during the MOI testing phase.

This is probably not an issue, but it's worth checking.
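
To make the concern concrete, here is a minimal Python sketch of the suspect ordering (select on "any green LOF consequence", then split). It is not the pipeline's code: the gene IDs, the `lof` flag, the `consequences` key and the `select_then_split` helper are all hypothetical, chosen only to show how non-green rows could reach MOI testing.

```python
# Hypothetical data: gene IDs and the green set are made up for illustration.
GREEN_GENES = {"ENSG_GREEN"}

variant = {
    "locus": "chr1:1000",
    "consequences": [
        {"gene_id": "ENSG_GREEN", "lof": True},
        {"gene_id": "ENSG_OTHER_1", "lof": True},
        {"gene_id": "ENSG_OTHER_2", "lof": True},
    ],
}


def select_then_split(variant, green_genes):
    """Keep the variant if ANY consequence is LOF in a green gene,
    then emit one row per consequence (the ordering under suspicion)."""
    has_green_lof = any(
        c["lof"] and c["gene_id"] in green_genes
        for c in variant["consequences"]
    )
    if not has_green_lof:
        return []
    # Every consequence becomes a row, green or not.
    return [
        {"locus": variant["locus"], "gene_id": c["gene_id"]}
        for c in variant["consequences"]
    ]


rows = select_then_split(variant, GREEN_GENES)
# Rows whose gene is not green, but which would still be passed to MOI tests.
non_green = [r["gene_id"] for r in rows if r["gene_id"] not in GREEN_GENES]
```

In this toy case `non_green` is non-empty, which is the behaviour the issue asks to rule out.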

MattWellie commented 1 week ago

I think I've handled this appropriately:

  1. first split sortedTranscriptConsequences so each gene and consequence is on a separate row
  2. each row's gene_id is the ID from the single transcript consequence
  3. then apply SV1 (transcript consequence = LOF, gene_id = Green)

No VCF rows exist unless the gene is LOF and Green. The variant's predicted_lof field still contains all gene symbols from the annotation stage, even those that weren't Green; these are presented in the results/HTML but have no influence on MOI decisions.
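
The ordering described above (split first, then label) can be sketched in plain Python as follows. This is an illustration under assumptions, not the pipeline's implementation: `sortedTranscriptConsequences`, `gene_id`, `predicted_lof` and `SV1` come from this thread, while the `lof` flag, `gene_symbol`, `locus`, the gene names and the `split_then_label` helper are hypothetical.

```python
# Hypothetical green-gene set and variant record, for illustration only.
GREEN_GENES = {"GENE_A"}

variant = {
    "locus": "chr1:1000",
    "sortedTranscriptConsequences": [
        {"gene_id": "GENE_A", "gene_symbol": "SYM_A", "lof": True},
        {"gene_id": "GENE_B", "gene_symbol": "SYM_B", "lof": True},
        {"gene_id": "GENE_C", "gene_symbol": "SYM_C", "lof": False},
    ],
}


def split_then_label(variant, green_genes):
    """Split consequences into rows, then keep only LOF + Green rows as SV1."""
    consequences = variant["sortedTranscriptConsequences"]
    # All LOF gene symbols from annotation, Green or not (display only).
    predicted_lof = sorted({c["gene_symbol"] for c in consequences if c["lof"]})

    rows = []
    for cons in consequences:  # step 1: one row per gene/consequence
        row = {
            "locus": variant["locus"],
            "gene_id": cons["gene_id"],  # step 2: ID from this consequence only
            "predicted_lof": predicted_lof,
        }
        # step 3: SV1 requires LOF in this consequence AND a Green gene_id
        if cons["lof"] and row["gene_id"] in green_genes:
            row["category"] = "SV1"
            rows.append(row)
    return rows


rows = split_then_label(variant, GREEN_GENES)
```

With this toy input only the `GENE_A` row survives, while its `predicted_lof` list still carries the non-Green symbol `SYM_B` for reporting.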