opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

L2G widget on credible set page #3603

Open buniello opened 3 weeks ago

buniello commented 3 weeks ago

The API is now ready for FE to build a small L2G (Locus2Gene) widget on the credible set page.

The MVP version of the widget will look like this:

[Image: mock-up of the MVP widget]

L2G widget
Subtext: Genes prioritised by the L2G pipelines within this credible set

HEADER: Gene: target - approvedSymbol, hyperlinked to the Platform target page (target id).

HEADER: L2G score: score (sortable) - Tooltip: Overall evidence linking a gene to this credible set using all features. Score range [0,1]

  1. Discussed with Yakov - we could add a 3rd column PROTEIN IDs: showing a list of proteinIds within a drawer
  2. Visualisation for L2G score column: @chinmehta suggested a small slider [0-1] e.g. progress bar
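
A minimal sketch of the progress-bar idea for the score column, written in Python only to show the mapping from a score in [0,1] to a bar fill. The function name `score_bar` and the `width` parameter are hypothetical; the real widget would be a front-end component.

```python
def score_bar(score: float, width: int = 20) -> str:
    """Render an L2G score as a fixed-width text progress bar.

    Scores are expected in [0, 1]; anything outside is clamped (an assumption,
    the issue only states the range).
    """
    clamped = min(max(score, 0.0), 1.0)
    filled = round(clamped * width)  # number of filled cells
    return "[" + "#" * filled + "-" * (width - filled) + f"] {clamped:.2f}"


if __name__ == "__main__":
    print(score_bar(0.8002441662818742))
```

Clamping keeps the bar well-formed if an out-of-range value ever reaches the widget.
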
Example API L2G query for a given credible set

```
query l2gQuery {
  credibleSets(studyLocusIds: ["ce063fb868ae77d2cac6b6aaac295c3c"]) {
    l2Gpredictions {
      target {
        proteinIds {
          id
        }
        approvedSymbol
        id
      }
      score
    }
  }
}
```

Example API L2G response for a given credible set

```
{
  "data": {
    "credibleSets": [
      {
        "l2Gpredictions": [
          { "target": { "proteinIds": [ { "id": "P68543" }, { "id": "A8K577" }, { "id": "B7ZKP8" }, { "id": "Q569G8" } ], "approvedSymbol": "UBXN2A", "id": "ENSG00000173960" }, "score": 0.8002441662818742 },
          { "target": { "proteinIds": [ { "id": "Q9ULI0" }, { "id": "C9J1G9" }, { "id": "C9JG15" }, { "id": "H7BYF1" }, { "id": "B9ZVQ5" }, { "id": "Q6ZNA6" }, { "id": "Q8N9E7" } ], "approvedSymbol": "ATAD2B", "id": "ENSG00000119778" }, "score": 0.35512701343604075 },
          { "target": { "proteinIds": [ { "id": "A6NFX1" }, { "id": "A0A2I3JL00" }, { "id": "A0A590UK14" }, { "id": "H7BZN4" }, { "id": "B5MC32" }, { "id": "J3KNU6" } ], "approvedSymbol": "MFSD2B", "id": "ENSG00000205639" }, "score": 0.09205302255863242 },
          { "target": { "proteinIds": [], "approvedSymbol": "RN7SL610P", "id": "ENSG00000243847" }, "score": 0.017041815222473916 },
          { "target": { "proteinIds": [], "approvedSymbol": "RPS13P4", "id": "ENSG00000238111" }, "score": 0.013550726495908579 },
          { "target": { "proteinIds": [], "approvedSymbol": "SDHCP3", "id": "ENSG00000234946" }, "score": 0.013319309036865352 },
          { "target": { "proteinIds": [ { "id": "P68106" }, { "id": "F8W6G9" }, { "id": "G5E9U6" }, { "id": "H7C0Y3" }, { "id": "Q13664" }, { "id": "Q16645" }, { "id": "Q53TM2" }, { "id": "Q9BQ40" } ], "approvedSymbol": "FKBP1B", "id": "ENSG00000119782" }, "score": 0.013161329213832501 },
          { "target": { "proteinIds": [], "approvedSymbol": "RNU6-370P", "id": "ENSG00000222940" }, "score": 0.010099420713992903 },
          { "target": { "proteinIds": [ { "id": "Q8NHR9" }, { "id": "A0A1D5RMN8" }, { "id": "Q53TL9" } ], "approvedSymbol": "PFN4", "id": "ENSG00000176732" }, "score": 0.010004050859147785 },
          { "target": { "proteinIds": [ { "id": "Q53FA7" }, { "id": "H7BZH6" }, { "id": "D6W533" }, { "id": "O14679" }, { "id": "O14685" }, { "id": "Q38G78" }, { "id": "Q6JLE7" }, { "id": "Q9BWB8" } ], "approvedSymbol": "TP53I3", "id": "ENSG00000115129" }, "score": 0.009841475098633501 },
          { "target": { "proteinIds": [ { "id": "Q9Y3B4" }, { "id": "Q53TM1" } ], "approvedSymbol": "SF3B6", "id": "ENSG00000115128" }, "score": 0.009138952821579635 },
          { "target": { "proteinIds": [], "approvedSymbol": "PGAM1P6", "id": "ENSG00000224464" }, "score": 0.009072731781533904 },
          { "target": { "proteinIds": [ { "id": "Q9H6R7" }, { "id": "C9JYC1" }, { "id": "ENSP00000295148" }, { "id": "ENSP00000385816" }, { "id": "ENSP00000413426" }, { "id": "D6W532" }, { "id": "Q8IYK0" }, { "id": "Q9HBP5" } ], "approvedSymbol": "WDCP", "id": "ENSG00000163026" }, "score": 0.008726395203073277 },
          { "target": { "proteinIds": [ { "id": "P0C875" }, { "id": "A0A087WTY8" }, { "id": "A0A087WVX1" }, { "id": "A0A087WZM6" }, { "id": "A0A087WZN6" }, { "id": "A0A087WZA1" } ], "approvedSymbol": "FAM228B", "id": "ENSG00000219626" }, "score": 0.005988276186022336 },
          { "target": { "proteinIds": [ { "id": "Q86W67" }, { "id": "F2Z3J0" }, { "id": "H7C3M9" }, { "id": "H7C4B8" }, { "id": "ENSP00000295150" }, { "id": "ENSP00000401257" }, { "id": "ENSP00000412833" }, { "id": "ENSP00000416595" } ], "approvedSymbol": "FAM228A", "id": "ENSG00000186453" }, "score": 0.004579425400353232 },
          { "target": { "proteinIds": [], "approvedSymbol": "ENSG00000232642", "id": "ENSG00000232642" }, "score": 0.004579425400353232 },
          { "target": { "proteinIds": [], "approvedSymbol": "ENSG00000283031", "id": "ENSG00000283031" }, "score": 0.004060324114176166 },
          { "target": { "proteinIds": [ { "id": "Q96CT2" }, { "id": "H0Y2P5" }, { "id": "Q8N388" }, { "id": "Q96BF0" }, { "id": "Q96PW7" } ], "approvedSymbol": "KLHL29", "id": "ENSG00000119771" }, "score": 0.004049973191037439 },
          { "target": { "proteinIds": [ { "id": "Q9NZM3" }, { "id": "E7EPJ2" }, { "id": "F8W719" }, { "id": "H7BZD4" }, { "id": "H7C0L8" }, { "id": "H7C3E2" }, { "id": "O95062" }, { "id": "Q15812" }, { "id": "Q9HAK4" }, { "id": "Q9NXE6" }, { "id": "Q9NYG0" }, { "id": "Q9NZM2" }, { "id": "Q9ULG4" } ], "approvedSymbol": "ITSN2", "id": "ENSG00000198399" }, "score": 0.003406089786462902 },
          { "target": { "proteinIds": [], "approvedSymbol": "ENSG00000223754", "id": "ENSG00000223754" }, "score": 0.0033325171349199616 },
          { "target": { "proteinIds": [], "approvedSymbol": "ENSG00000242628", "id": "ENSG00000242628" }, "score": 0.0027245465640546875 },
          { "target": { "proteinIds": [ { "id": "Q13277" }, { "id": "A0A0C4DGE4" }, { "id": "A0A0J9YW33" }, { "id": "A0A286YF28" }, { "id": "E9PN33" }, { "id": "E9PQJ8" }, { "id": "Q53YE2" }, { "id": "B4DME0" }, { "id": "O43750" }, { "id": "O43751" }, { "id": "Q15360" } ], "approvedSymbol": "STX3", "id": "ENSG00000166900" }, "score": 0.0012349169387285788 },
          { "target": { "proteinIds": [], "approvedSymbol": "ENSG00000251805", "id": "ENSG00000251805" }, "score": 0.0010625768387053007 },
          { "target": { "proteinIds": [], "approvedSymbol": "ENSG00000233714", "id": "ENSG00000233714" }, "score": 0.0010529482402069192 },
          { "target": { "proteinIds": [], "approvedSymbol": "ENSG00000286829", "id": "ENSG00000286829" }, "score": 0.0010529482402069192 },
          { "target": { "proteinIds": [], "approvedSymbol": "LINC02923", "id": "ENSG00000235497" }, "score": 0.0010529482402069192 },
          { "target": { "proteinIds": [], "approvedSymbol": "ENSG00000224361", "id": "ENSG00000224361" }, "score": 0.0010529482402069192 },
          { "target": { "proteinIds": [], "approvedSymbol": "ENSG00000223530", "id": "ENSG00000223530" }, "score": 0.0010529482402069192 },
          { "target": { "proteinIds": [], "approvedSymbol": "ENSG00000223634", "id": "ENSG00000223634" }, "score": 0.0010529482402069192 },
          { "target": { "proteinIds": [], "approvedSymbol": "HMGN2P20", "id": "ENSG00000232963" }, "score": 0.0010529482402069192 },
          { "target": { "proteinIds": [], "approvedSymbol": "ENSG00000279526", "id": "ENSG00000279526" }, "score": 0.0010529482402069192 },
          { "target": { "proteinIds": [], "approvedSymbol": "RPL36AP13", "id": "ENSG00000233747" }, "score": 0.0010529482402069192 },
          { "target": { "proteinIds": [ { "id": "Q15788" }, { "id": "B5MCN7" }, { "id": "O00150" }, { "id": "O43792" }, { "id": "O43793" }, { "id": "Q13071" }, { "id": "Q13420" }, { "id": "Q2T9G5" }, { "id": "Q53SX3" }, { "id": "Q6GVI5" }, { "id": "Q7KYV3" } ], "approvedSymbol": "NCOA1", "id": "ENSG00000084676" }, "score": 0.0010529482402069192 }
        ]
      }
    ]
  }
}
```
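
A rough sketch of how a response of this shape could be flattened into the two widget columns (plus the optional protein ID list), in Python for illustration. The helper name `l2g_rows` and the `TARGET_URL` pattern are assumptions, not something confirmed in this issue.

```python
# Assumed URL pattern for Platform target pages; check before relying on it.
TARGET_URL = "https://platform.opentargets.org/target/{}"


def l2g_rows(response: dict) -> list:
    """Flatten an l2gQuery response into table rows sorted by score, descending."""
    rows = []
    for credible_set in response["data"]["credibleSets"]:
        for prediction in credible_set["l2Gpredictions"]:
            target = prediction["target"]
            rows.append(
                {
                    "gene": target["approvedSymbol"],
                    "url": TARGET_URL.format(target["id"]),
                    "score": prediction["score"],
                    # Candidate content for the optional PROTEIN IDs drawer.
                    "proteinIds": [p["id"] for p in target["proteinIds"]],
                }
            )
    return sorted(rows, key=lambda row: row["score"], reverse=True)
```

Sorting descending by default puts the top-scoring gene first, which matches the order of the example response.
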
buniello commented 3 weeks ago

Visualisation for L2G score column: @chinmehta suggested a small slider [0-1] e.g. progress bar

d0choa commented 2 weeks ago

The data necessary to build the above specification is available in the current API using the following query:

```
query VariantsQuery($studyLocusIds: [String!]!) {
  credibleSets(studyLocusIds: $studyLocusIds) {
    studyLocusId
    l2Gpredictions {
      target {
        id
      }
      score
    }
  }
}
```

In the query above, a new column named locusToGeneFeatures will soon be added to l2Gpredictions and contain key-value pairs. @jdhayhurst is adding the API column, but the data has already been loaded.
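
For reference, a minimal sketch of sending the `VariantsQuery` above with variables, using only the Python standard library. `API_URL` is an assumption (the public Platform GraphQL endpoint); whether `credibleSets` is served there yet is not confirmed in this thread. `build_request` and `fetch_predictions` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint; swap in the dev API if credibleSets is not yet public.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

QUERY = """
query VariantsQuery($studyLocusIds: [String!]!) {
  credibleSets(studyLocusIds: $studyLocusIds) {
    studyLocusId
    l2Gpredictions {
      target {
        id
      }
      score
    }
  }
}
"""


def build_request(study_locus_ids):
    """Build a GraphQL POST request carrying the query and its variables."""
    payload = json.dumps(
        {"query": QUERY, "variables": {"studyLocusIds": list(study_locus_ids)}}
    )
    return urllib.request.Request(
        API_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_predictions(study_locus_ids):
    """Send the request and return the decoded JSON response (needs network)."""
    with urllib.request.urlopen(build_request(study_locus_ids)) as resp:
        return json.load(resp)
```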

In today's scoping session, @carcruz, @addramir, and I discussed adding features to inform the L2G widget. The overall idea would be to have a simplified heatmap-like representation of the features (derived from the feature matrix). Because the list of features is too long (~30) and represents groups of potentially related features, we would like to have some groupings that would be easier for the user to understand.

The list of groupings suggested by @addramir is:

| top category | aggregation | features |
| --- | --- | --- |
| Protein coding | max | isProteinCoding |
| Distance to footprint | max | distanceFootprintMean, distanceSentinelFootprint |
| Variant effect prediction | max | vepMaximum, vepMean |
| Colocalisation | max | eQtlColocClppMaximum, eQtlColocH4Maximum, pQtlColocClppMaximum, pQtlColocH4Maximum, sQtlColocClppMaximum, sQtlColocH4Maximum |
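
The grouping above could be applied roughly as follows. This is a sketch that assumes `locusToGeneFeatures` can be read as a plain name-to-value mapping, which is not settled since the field is not exposed yet; `aggregate_features` is a hypothetical helper.

```python
# Top categories and their member features, as listed in the table above.
FEATURE_GROUPS = {
    "Protein coding": ["isProteinCoding"],
    "Distance to footprint": ["distanceFootprintMean", "distanceSentinelFootprint"],
    "Variant effect prediction": ["vepMaximum", "vepMean"],
    "Colocalisation": [
        "eQtlColocClppMaximum",
        "eQtlColocH4Maximum",
        "pQtlColocClppMaximum",
        "pQtlColocH4Maximum",
        "sQtlColocClppMaximum",
        "sQtlColocH4Maximum",
    ],
}


def aggregate_features(features: dict) -> dict:
    """Collapse per-feature values into one value per top category using max.

    Categories with no features present in the input map to None.
    """
    aggregated = {}
    for group, names in FEATURE_GROUPS.items():
        values = [features[name] for name in names if name in features]
        aggregated[group] = max(values) if values else None
    return aggregated
```

Returning `None` for an empty category lets the heatmap draw a "no data" cell rather than a misleading zero.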

At the moment, all features are:

A nice-to-have would be to show a tooltip on hover with each of the features in each grouping and their color/score.

Two personal thoughts:

@ireneisdoomed / @buniello we might eventually need your thoughts

ireneisdoomed commented 2 weeks ago

Since features are heavily normalised to make sense for the purpose of the model, their actual values are not so informative.

In my view, it'd be cool to have a visualisation of how that score has been constructed - although it is more complex to implement. I tinkered with the possibilities provided by SHAP here: https://github.com/opentargets/gentropy/pull/886#issuecomment-2450184422 Shapley values are theoretically additive, so we could aggregate them into the proposed feature groups and visualise them.
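
To illustrate the additivity argument: summing per-feature Shapley values within each group leaves the total contribution unchanged, so a grouped waterfall still adds up to the same prediction. A minimal sketch in plain Python; `group_shap` and the "Other" bucket are hypothetical, and the input is assumed to be a feature-to-SHAP-value mapping already computed elsewhere.

```python
def group_shap(shap_values: dict, groups: dict) -> dict:
    """Sum per-feature Shapley values into feature groups.

    shap_values: feature name -> Shapley value for one prediction.
    groups: group name -> list of feature names.
    Features not assigned to any group are collected under "Other".
    """
    feature_to_group = {
        feature: group for group, features in groups.items() for feature in features
    }
    grouped = {group: 0.0 for group in groups}
    grouped["Other"] = 0.0
    for feature, value in shap_values.items():
        grouped[feature_to_group.get(feature, "Other")] += value
    return grouped
```

Because every feature lands in exactly one bucket, the grouped values sum to the same total as the ungrouped ones.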

d0choa commented 2 weeks ago

I also think we are off with the specification. My main concern is that a given feature value (e.g. 0.5) says nothing about its predictive power and might mislead the user. Let's think a bit more. The waterfall looks amazing