populationgenomics / automated-interpretation-pipeline

Rare Disease variant prioritisation MVP

Splice AI! #44

Closed · MattWellie closed 1 year ago

MattWellie commented 1 year ago

We would like to include splice prediction as one of the facets when determining if a variant is plausibly relevant to diagnosis. This will be a little more complicated than just adding a new category, as the current filters will remove variants without high-consequence impacts annotated against them. Discussion:

  1. How to implement

    • we could add a SpliceAI Delta threshold to the existing Support category labelling. This would treat the annotations like those from the other in silico tools, and passing the threshold would mean that variants could support other 'full category' variants in compound het. formations.
    • we could add SpliceAI as a completely new class of 'interesting', i.e. category_5. This would be a category whose sole purpose is to highlight variants with compelling in silico splice prediction results. This would be treated as a full category, i.e. het. variants with this category assigned could make it into the report.
    • we could run each of these implementations, and see whether there is a huge impact in terms of final results.
    • we could use both of these implementations, e.g. running with a strict threshold as a primary category and a more lenient threshold as a supporting category
  2. Considerations

    • how many variants will be added? This should be the subject of some exploratory work on a representative cohort. We don't want to add a vast number of variants to the final report, but we also don't want to set filters so strictly that we miss plausible results
    • impact of these variants may be hard to confirm, so should we always assess them? We could have a toggle whereby the configuration sets whether we assess SpliceAI results, or we simply default the SpliceAI category flag to always be 0 (see the sketch after this list). Downstream nothing would change; it would be as if we had never run the test. This toggle could be used if a cohort doesn't have any RNAseq data to accompany the results, if the cohort's clinicians have no interest in the results, if splicing is not an expected MOP (Mode Of Pathogenicity?) for the genes being assessed, or if SpliceAI results are not supplied by the relevant annotation source
  3. Thresholds

    • happy to open this one up to the floor. To a certain extent the threshold used will be dictated by the number of results produced, but any literature/experience indicating a high-specificity threshold would be appreciated
  4. Re-Working

    • most variants flagged only by their SpliceAI scores will currently be filtered out of the analysis, as we heavily filter for Green genes, then for high-impact consequences relevant to those genes. To implement a VEP-consequence-independent category, we would have to alter the workflow:
      1. perform some hard filtering (e.g. pop. AF)
      2. run annotation of splice category assignment for each variant
      3. each subsequent filtering step would keep variants satisfying either the existing criteria or the assigned SpliceAI flag
    • are we still filtering to genes on a strict list? (yes, currently)
    • can we remove some of the current filtering steps by making the category assignment more strict? E.g. instead of filtering out low-consequence annotations, running category assignment, then removing variants without assigned categories, can we just make the category assignment more specific, taking the consequences into account as a firmer positive selection step? We currently have several layers of filtering to reduce the search space for the category assignment. I have not proven that this benefits the run-time at all, and removing those pre-filtration steps could improve performance overall, though the category logic would need to be updated to take that into account
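
A minimal sketch of how the toggle in point 2 could work, assuming a dict-style configuration; every name here (`use_spliceai`, `spliceai_threshold`, the 0/1 flag convention) is hypothetical, not the current AIP code:

```python
# Hypothetical config-gated SpliceAI flag assignment; names are illustrative.

def assign_spliceai_flag(delta_score: float | None, config: dict) -> int:
    """
    Return 1 if the variant's SpliceAI delta passes the configured
    threshold, otherwise 0. With the toggle off (e.g. no RNAseq data
    available for follow-up), default the flag to 0 so downstream logic
    behaves as if the test was never run.
    """
    if not config.get('use_spliceai', False):
        return 0
    if delta_score is None:  # no SpliceAI annotation on this variant
        return 0
    return int(delta_score >= config.get('spliceai_threshold', 0.5))
```
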
SamBryen commented 1 year ago
  1. How to implement: I quite like the idea of running both using different thresholds, e.g. anything above 0.5 would be category_5 and anything between 0.2 and 0.5 could be support, perhaps tweaking those thresholds if needed (a sketch follows after this list).

  2. Considerations: I'm not sure that I know enough context to fully understand your second point. Splicing will be a possible MOP for any gene, because a splicing variant may result in a frameshift, the insertion of a premature termination codon (PTC), and/or in-frame indels with similar outcomes to missense changes. You're right, though, that you cannot be sure which of these impacts will result from a splicing variant without performing functional studies first. I have a few examples I can show you where functional studies revealed splicing variants that result in multiple different outcomes, with the impact being incredibly difficult to interpret. For that reason, I don't think the AIP should try to assign an impact, but I think the result should still be reported so it can be followed up. E.g. my previous group, run by Sandra Cooper, invites clinicians and scientists to submit candidate splicing variants to aid in the interpretation of these variants, and they are getting better at predicting the impact. Also, with enough other lines of evidence you can sometimes get these splicing variants over the line without functional studies, so reporting regardless is, I think, useful.

  3. Thresholds: From experience I want to say 0.2, or even 0.1, would be a low threshold, with 0.5-0.8 as a high threshold, though I feel like you'd miss a lot of variants with a threshold of 0.8. From memory, 0.2, 0.5, and 0.8 are the thresholds recommended in the original paper, which is why everyone generally refers to one of those three? Happy to help go through example datasets and see how many variants they return.

  4. Re-Working: Not entirely sure I understand all the steps currently involved here. What do you mean by "run annotation of splice category assignment for each variant"?
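
If the dual-threshold idea from point 1 were adopted, the assignment might look something like this minimal sketch; the 0.5/0.2 values are the suggestions above, and the function and category names are illustrative only:

```python
def splice_category(delta: float | None,
                    full_threshold: float = 0.5,
                    support_threshold: float = 0.2) -> str | None:
    """
    Dual-threshold assignment as suggested above:
      delta >= 0.5       -> 'category_5', reportable in its own right
      0.2 <= delta < 0.5 -> 'support', can only partner a full category
      otherwise          -> no splice category assigned
    """
    if delta is None:
        return None
    if delta >= full_threshold:
        return 'category_5'
    if delta >= support_threshold:
        return 'support'
    return None
```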

cassimons commented 1 year ago

For an initial implementation I suggest we add a distinct test category, starting with a conservative, configurable threshold of, say, 0.5. I also like the idea of adding a more permissive SpliceAI threshold to the Support category, but that should probably be extracted into a separate conversation/evaluation. For the moment we should focus on identifying very high signal-to-noise predictions that are likely to result in LoF.

Depending on what we see when we run this at scale, we may need to investigate some additional transcript-based filtering (i.e. do we need to limit this category to a more strictly defined set of transcripts to control noise?) and/or complement it with predicted-consequence filters, but let's deal with that if and when we see the problem.

MattWellie commented 1 year ago

@SamBryen Not very well explained on my part; I was dumping thoughts without fully explaining!

WRT #2, PanelApp results have a slot where a specific MOI can be documented. I was thinking there may be a situation where the MOI for a specific disease could be 'loss-of-function doesn't cause this', but splice variants could equally cause activation or LoF, so that's probably not a fruitful discussion...

WRT #4, this is slightly more technical, so happy to discuss in person. The current algorithm does a lot of progressive filtering, removing all variants without high-impact consequences, removing all annotations not on 'Green' genes, etc., and only then positively assigns categories in a smaller search space. That isn't compatible with splice variants, as we're not limited to exonic, or even genic, regions (depending on how good SpliceAI is at finding promoter/enhancer sites?). The whole flow will need alteration to retain variants with these new types of consequence. It's one thing to miss pathogenic variants because they don't pass the tests, but it looks pretty rubbish to have missed them entirely because they were blindly filtered out 🙃

One remedy would be to run splice consequence testing earlier, before those genic/consequence filters; all downstream filtering would then change from 'keep everything with high VEP consequences' to 'keep high VEP consequences or high SpliceAI' (see the sketch below). This point was thrown in to flag that this won't actually be a simple modification 😬.
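
A hedged sketch of that reordering, assuming the data sits in a Hail MatrixTable; the annotation paths (`info.gnomad_af`, `info.splice_ai_delta`, `vep_high_impact`) are placeholders for whatever the real schema provides, not actual AIP fields:

```python
import hail as hl

def reordered_filters(mt: hl.MatrixTable,
                      af_max: float = 0.01,
                      spliceai_min: float = 0.5) -> hl.MatrixTable:
    # 1. hard population-frequency filter first, which keeps splice
    #    candidates alive regardless of VEP consequence
    mt = mt.filter_rows(mt.info.gnomad_af < af_max)

    # 2. flag compelling splice predictions before any consequence
    #    filtering; treat a missing SpliceAI score as a failed test
    mt = mt.annotate_rows(
        splice_flag=hl.or_else(mt.info.splice_ai_delta >= spliceai_min, False)
    )

    # 3. downstream filters keep variants passing EITHER the existing
    #    consequence criteria OR the new splice flag
    return mt.filter_rows(mt.vep_high_impact | mt.splice_flag)
```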

SamBryen commented 1 year ago

Ah right, that makes sense. Yes, you would want to run it before the consequence filters to grab synonymous changes and extended splice-site variants.

MattWellie commented 1 year ago

OK! There's a draft currently on #50

SpliceAI annotations as currently applied consist of a String consequence corresponding to the highest Delta score (if any; values can be No Consequence, or Acceptor/Donor Gain/Loss) and the corresponding Delta scores. Threshold values here relate to the Delta score annotation (see the sketch below).
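
For illustration, the 'highest Delta plus consequence string' annotation described above could be derived from SpliceAI's four per-variant delta scores (acceptor/donor gain/loss, per the standard SpliceAI output); this function is a hypothetical sketch, not the draft in #50:

```python
def worst_splice_consequence(ds_ag: float, ds_al: float,
                             ds_dg: float, ds_dl: float) -> tuple[str, float]:
    """Pick the consequence with the highest delta score; all-zero scores
    collapse to 'No Consequence'. Thresholds are applied to the delta."""
    deltas = {
        'Acceptor Gain': ds_ag,
        'Acceptor Loss': ds_al,
        'Donor Gain': ds_dg,
        'Donor Loss': ds_dl,
    }
    consequence, delta = max(deltas.items(), key=lambda kv: kv[1])
    return ('No Consequence', 0.0) if delta == 0 else (consequence, delta)
```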

Changes (that I can remember off hand):