Open buniello opened 6 months ago
The widgets above have undergone a series of changes during the implementation process, including removal of the `gene` and `DoE on trait` columns and revision of the sub-header text.
UniProt Variant widget new task: `reported protein` column and add the relevant link to the summary sub-header (linkout from the `source` link).

Next in line for implementation are:
In silico predictors widget — First draft based on this schema shared in channel:
├───inSilicoPredictors: array
│ ├───element: struct
│ │ ├───method : string
│ │ ├───assessment : string
│ │ ├───flag : string
│ │ ├───score : float
and sample dataset:
"inSilicoPredictors": [
{
"method": "alphaMissense",
"score": 0.077,
"assessment": "likely_benign"
},
{
"method": "phred scaled CADD",
"score": 7.293
},
{
"method": "sift max",
"score": 0.2,
"assessment": "MODERATE"
},
{
"method": "polyphen max",
"score": 0.069,
"assessment": "tolerated"
},
{
"method": "loftee",
"assessment": "high-confidence LoF variant",
"flag": "PHYLOCSF_WEAK"
}
]
Column 1: `method` e.g. `alphaMissense` -- COLUMN HEADER: Method -- Tooltip: method description (tbd)? (sort the method column alphabetically)
Column 2: `assessment` e.g. `likely_benign` -- COLUMN HEADER: Prediction -- Tooltip: `flag` e.g. `PHYLOCSF_WEAK` (most severe?)
Column 3: `score` e.g. `0.077` -- COLUMN HEADER: Score
NOTE for FE: we could use a colour code for the assessments (VarSome uses a traffic light code). We have already done something similar with the pharmacogenetics widget (`Confidence level` column) and we could use the same palette.
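
A minimal sketch of how the three columns above could be derived from the `inSilicoPredictors` array, assuming that `score`, `assessment` and `flag` may each be missing. The helper names (`tidy_label`, `predictor_rows`) and the label tidying rule (lower case, underscores to spaces) are hypothetical and only illustrate one possible display convention.

```python
from typing import Optional, TypedDict


class PredictorRow(TypedDict):
    method: str
    prediction: Optional[str]
    flag: Optional[str]
    score: Optional[float]


def tidy_label(raw: Optional[str]) -> Optional[str]:
    """Lower-case a label and replace underscores with spaces (assumed display rule)."""
    if raw is None:
        return None
    return raw.replace("_", " ").lower()


def predictor_rows(predictors: list[dict]) -> list[PredictorRow]:
    """Build one table row per predictor, sorted alphabetically by method."""
    rows = [
        PredictorRow(
            method=p["method"],
            prediction=tidy_label(p.get("assessment")),
            flag=p.get("flag"),      # only shown as a tooltip on the prediction
            score=p.get("score"),
        )
        for p in predictors
    ]
    # Case-insensitive sort so "alphaMissense" and "loftee" order naturally.
    return sorted(rows, key=lambda r: r["method"].lower())


# Three entries taken from the sample dataset above.
sample = [
    {"method": "alphaMissense", "score": 0.077, "assessment": "likely_benign"},
    {"method": "phred scaled CADD", "score": 7.293},
    {"method": "loftee", "assessment": "high-confidence LoF variant", "flag": "PHYLOCSF_WEAK"},
]
rows = predictor_rows(sample)
```

Missing values are kept as `None` so the front end can decide how to render an empty cell.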
Some points on the in silico predictors widget discussed when @DSuveges was away:
- `flag` field bringing crucial info from that single method (loftee?). What if we dropped it, even from the tooltip?
- `assessment` data e.g. all strings in lower case, no underscores etc.

I've picked a few variants in genes known to play a role in drug responses. I think the variant page will be an interesting entry point for doctors/researchers that have observed a specific variant in a patient, so potentially they come without prior knowledge. The intention is to test the UX of these well-known variants and see that it's not difficult to interact with the data and get insights.
variant | gene | evidence count |
---|---|---|
rs3892097 | TPMT | 4 |
rs9923231 | VKORC1 | 27 |
rs67376798 | DPYD | 13 |
rs3892097 | CYP2D6 | 3 |
rs4149056 | SLCO1B1 | 159 |
rs4244285 | CYP2C19 | 30 |
In this first iteration, we want to reproduce the current PGx widget without some of the variant metadata columns.
A toy dataset with all evidence (236) for the above variants is here gs://ot-team/irene/variant_page/pgx_30-05-2024.json
root
|-- datasourceId: string (nullable = true)
|-- drugs: array (nullable = true)
| |-- element: array (containsNull = true)
| | |-- element: string (containsNull = true)
|-- evidenceLevel: string (nullable = true)
|-- genotypeAnnotationText: string (nullable = true)
|-- genotypeId: string (nullable = true)
|-- isDirectTarget: boolean (nullable = true)
|-- literature: array (nullable = true)
| |-- element: string (containsNull = true)
|-- pgxCategory: string (nullable = true)
|-- phenotypeFromSourceId: string (nullable = true)
|-- phenotypeText: string (nullable = true)
|-- studyId: string (nullable = true)
An important consideration with this data is that the evidence is not indexed by `variantId`, unlike other sources. Here we have more granularity, with genotype identifiers. So in order to choose which PGx evidence to show in the variant page, we will match the chromosome and position of the variant ID with the chromosome and position of the genotype ID. For example, for the variant `16_31096368_C_T`, we display all evidence where `genotypeId` starts with `16_31096368`.
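
The matching rule above could look roughly like this. It is a sketch, not the API implementation: the function names are hypothetical, and it assumes both ID styles begin with `chr_pos_`. Comparing the parsed chromosome and position (rather than a raw string prefix) avoids accidentally matching a longer position such as `16_310963680`.

```python
def locus_key(identifier: str) -> tuple[str, int]:
    """Return (chromosome, position) from a variant or genotype ID.

    Both ID styles are assumed to start with `chr_pos_`, e.g.
    "16_31096368_C_T" (variant) or "12_21178615_T_T,T" (genotype).
    """
    chromosome, position, *_ = identifier.split("_")
    return chromosome, int(position)


def pgx_evidence_for_variant(variant_id: str, evidence: list[dict]) -> list[dict]:
    """Keep evidence whose genotypeId shares chromosome and position with the variant."""
    target = locus_key(variant_id)
    return [e for e in evidence if locus_key(e["genotypeId"]) == target]


# Made-up genotype IDs for illustration only (not taken from the dataset).
evidence = [
    {"genotypeId": "16_31096368_C_T,T"},
    {"genotypeId": "16_310963680_C_C,T"},  # same string prefix, different position
    {"genotypeId": "12_21178615_T_T,T"},
]
matches = pgx_evidence_for_variant("16_31096368_C_T", evidence)
```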
In terms of sorting, we want to prioritise the most confident evidence (`evidenceLevel`) and, if it is not too complicated, I'd suggest showing evidence that reports toxicity first (`pgxCategory`).
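
One way to express this ordering is a two-part sort key. The sketch below assumes the PharmGKB clinical annotation levels rank as 1A, 1B, 2A, 2B, 3, 4 (most to least confident) and that unknown levels go last; `LEVEL_ORDER` and `pgx_sort_key` are hypothetical names.

```python
# Assumed ranking of PharmGKB clinical annotation levels, most confident first.
LEVEL_ORDER = ["1A", "1B", "2A", "2B", "3", "4"]


def pgx_sort_key(record: dict) -> tuple[int, int]:
    """Sort by evidence level first, then put toxicity before other categories."""
    level = record.get("evidenceLevel")
    level_rank = LEVEL_ORDER.index(level) if level in LEVEL_ORDER else len(LEVEL_ORDER)
    toxicity_rank = 0 if record.get("pgxCategory") == "toxicity" else 1
    return level_rank, toxicity_rank


# Fields trimmed from the three sample records in this thread.
records = [
    {"studyId": "1444704359", "evidenceLevel": "3", "pgxCategory": "metabolism/pk"},
    {"studyId": "1451244700", "evidenceLevel": "1A", "pgxCategory": "metabolism/pk"},
    {"studyId": "1451465324", "evidenceLevel": "1A", "pgxCategory": "toxicity"},
]
ordered = sorted(records, key=pgx_sort_key)
```

Because Python's sort is stable, records that tie on both parts keep their original relative order.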
The approach I used to generate the sample data above was not good: I used the files instead of just exporting the response of the API query.
I've generated a very similar dataset extracted from the API: pharmacogenomics_sample.json
🚨Something important: the API for this widget is going to change once the work in #3205 is finished
@gjmcn
Pharmacogenetics widget sample data (see here for ref)
{"genotypeId": "12_21178615_T_T,T", "isDirectTarget": false, "drugFromSource": "fluvastatin", "drugId": "CHEMBL2220442", "phenotypeFromSourceId": null, "genotypeAnnotationText": "Patients with the rs4149056 TT genotype may have decreased concentrations of fluvastatin as compared to patients with the CC or CT genotypes. However, conflicting evidence has been reported. Other genetic and clinical factors may also affect fluvastatin concentrations. This annotation only covers the pharmacokinetic relationship between rs4149056 and fluvastatin and does not include evidence about clinical outcomes.", "phenotypeText": "decreased concentrations of fluvastatin", "pgxCategory": "metabolism/pk", "evidenceLevel": "1A", "datasourceId": "pharmgkb", "studyId": "1451244700", "literature": ["17015053", "30989645"]}
{"genotypeId": "12_21178615_T_C,T", "isDirectTarget": false, "drugFromSource": "lovastatin", "drugId": "CHEMBL503", "phenotypeFromSourceId": null, "genotypeAnnotationText": "TPatients with the rs4149056 CT genotype may have an increased risk of lovastatin-related myopathy when treated with lovastatin as compared to patients with the TT genotype. Other genetic and clinical factors may also influence risk of toxicity to lovastatin.", "phenotypeText": "increased risk of lovastatin-related myopathy", "pgxCategory": "toxicity", "evidenceLevel": "1A", "datasourceId": "pharmgkb", "studyId": "1451465324", "literature": ["34114646"]}
{"genotypeId": "12_21178615_T_T,T", "isDirectTarget": false, "drugFromSource": "lopinavir", "drugId": "CHEMBL729", "phenotypeFromSourceId": null, "genotypeAnnotationText": "Patients with HIV and the TT genotype may have decreased plasma levels of lopinavir as compared to patients with the CC genotype. However, one study failed to find this association. Other genetic and clinical factors may also influence lopinavir concentrations in a patients. This annotation only covers the pharmacokinetic relationship between rs4149056 and lopinavir and does not include evidence about clinical outcomes.", "phenotypeText": "decreased plasma levels of lopinavir", "pgxCategory": "metabolism/pk", "evidenceLevel": "3", "datasourceId": "pharmgkb", "studyId": "1444704359", "literature": ["20051929", "20078617", "21743379", "23503447", "32022294", "27142945", "28718515"]}
Column 1: `genotypeId` e.g. `12_21178615_T_T,T` -- COLUMN HEADER: Genotype ID -- Tooltip on header: [VCF-style (chr_pos_ref_allele1,allele2). See here for more details.]
Column 2: `drugFromSource` e.g. `fluvastatin` (hyperlinked to `drugId` e.g. https://platform.opentargets.org/drug/drugId) -- COLUMN HEADER: Drug(s)
Column 3: `phenotypeText` [with tooltip: `genotypeAnnotationText`] e.g. `decreased concentrations of fluvastatin` -- COLUMN HEADER: Drug Response Phenotype
Column 4: `pgxCategory` e.g. `metabolism/pk` -- COLUMN HEADER: Drug Response Category
Column 5: `isDirectTarget` e.g. `false` -- COLUMN HEADER: Direct Drug Target -- see visualisation for this column in current widget
Column 6: `evidenceLevel` e.g. `1A` -- COLUMN HEADER: Confidence Level (colour coded) -- Tooltip: As defined by PharmGKB ClinAnn Levels [column with sorting arrow]
Column 7: `datasourceId` e.g. `pharmgkb` (hyperlinked to `studyId` e.g. https://www.pharmgkb.org/clinicalAnnotation/1451244700) -- COLUMN HEADER: Source
Column 8: `literature` e.g. [`17015053`, `30989645`] -- COLUMN HEADER: Literature
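
As a rough illustration of the column mapping above, the sketch below turns one evidence record into a display row with the two link-outs. The URL patterns are the ones quoted in the column descriptions; the function name, the row shape (`text`/`href`/`tooltip` keys) and the handling of missing IDs are assumptions.

```python
DRUG_URL = "https://platform.opentargets.org/drug/{}"
PHARMGKB_URL = "https://www.pharmgkb.org/clinicalAnnotation/{}"


def pgx_row(record: dict) -> dict:
    """Map one PGx evidence record onto the eight table columns."""
    drug_id = record.get("drugId")
    study_id = record.get("studyId")
    return {
        "Genotype ID": record["genotypeId"],
        "Drug(s)": {
            "text": record.get("drugFromSource"),
            # No link when the drug has no identifier.
            "href": DRUG_URL.format(drug_id) if drug_id else None,
        },
        "Drug Response Phenotype": {
            "text": record.get("phenotypeText"),
            "tooltip": record.get("genotypeAnnotationText"),
        },
        "Drug Response Category": record.get("pgxCategory"),
        "Direct Drug Target": record.get("isDirectTarget"),
        "Confidence Level": record.get("evidenceLevel"),
        "Source": {
            "text": record.get("datasourceId"),
            "href": PHARMGKB_URL.format(study_id) if study_id else None,
        },
        "Literature": record.get("literature") or [],
    }


# First sample record above, with the long annotation text omitted.
row = pgx_row({
    "genotypeId": "12_21178615_T_T,T",
    "isDirectTarget": False,
    "drugFromSource": "fluvastatin",
    "drugId": "CHEMBL2220442",
    "phenotypeText": "decreased concentrations of fluvastatin",
    "pgxCategory": "metabolism/pk",
    "evidenceLevel": "1A",
    "datasourceId": "pharmgkb",
    "studyId": "1451244700",
    "literature": ["17015053", "30989645"],
})
```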
@gjmcn - let me know if there are questions on this!
Credible sets Widget
Sample dataset used for the table:
{
"variantId": "10_100315722_G_A",
"study": {
"id": "GCST001217",
"traitFromSource": "Metabolic traits",
"disease": {
"id": "EFO_0004725",
"name": "Metabolic traits"
}
},
"pValueMantissa": 3.0,
"pValueExponent": -57,
"beta": 0.124,
"ldPopulationStructure": [
{
"ldPopulation": "nfe",
"relativeSampleSize": 1.0
}
],
"finemappingMethod": "pics",
"l2g": {
"score": 0.36516955494880676,
"target": {
"id": "ENSG00000107593",
"approvedSymbol": "PKD2L1"
}
},
"locus": [
{
"variantId": "10_100315722_G_A",
"r2Overall": 1.0000000000000049,
"posteriorProbability": 1.0,
"standardError": 0.9999989208874888,
"is95CredibleSet": true,
"is99CredibleSet": true
}
]
}
Credible Sets
Column 1: `variantId` e.g. `10_100315722_G_A` -- Column Header: Lead Variant
NOTE:
- If `variantId` equals the fixed variant, it will be followed by (self) (small chip?) without a link
- If `variantId` is not the fixed variant, the variant will be hyperlinked to its own variant page

Column 2: From "disease": `name` e.g. `Metabolic traits` hyperlinked to `id` e.g. https://platform.opentargets.org/disease/`EFO_0004725` -- Column Header: Trait
Column 3: From "study": `id` e.g. `GCST001217` hyperlinked to https://www.ebi.ac.uk/gwas/studies/id -- Column Header: Study
NOTE: this row will also open a study metadata drawer in a future iteration [metadata drawer including PMID, ancestry, sample size, author name etc. tbd]
Column 4: `pValueMantissa` & `pValueExponent` e.g. `3.0e-57` -- Column Header: P-Value (sorting arrow)
NOTE: table will be sorted by this value
Column 5: `beta` e.g. `0.124` -- Column Header: Beta -- Tooltip: Beta with respect to the ALT allele
Column 6: From "locus": `r2Overall` e.g. `1.00` (two decimal places) -- Column Header: LD (r2) -- Tooltip: Linkage disequilibrium with the queried variant
Column 7: `finemappingMethod` e.g. `pics` -- Column Header: Finemapping method
Column 8: From "l2g - target": `approvedSymbol` e.g. `PKD2L1` hyperlinked to https://platform.opentargets.org/target/id -- Column Header: Top L2G -- Tooltip: Top gene prioritised by our locus-to-gene model
Column 9: From "l2g": `score` e.g. `0.365` (three decimal places) -- Column Header: L2G score (sorting arrow)
Column 10: From "locus": number of `variantId` fields within the `locus` object e.g. `1` for the example used in this table -- Column Header: Credible Set Size
NOTE: this row will also open a drawer in a future iteration [locus drawer with PIP, variants in set, LD etc. tbd]
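
The derived columns (P-Value and Credible Set Size) could be computed along these lines. This is a sketch with hypothetical helper names. Sorting uses the (exponent, mantissa) pair instead of multiplying them out, so extremely small p-values cannot underflow to zero; it assumes the mantissa is normalised to the range [1, 10).

```python
def format_p_value(mantissa: float, exponent: int) -> str:
    """Render mantissa and exponent as a single string, e.g. 3.0 and -57 -> '3.0e-57'."""
    return f"{mantissa}e{exponent}"


def p_value_sort_key(row: dict) -> tuple[int, float]:
    """Order by exponent, then mantissa (assumes 1 <= mantissa < 10)."""
    return row["pValueExponent"], row["pValueMantissa"]


def credible_set_size(row: dict) -> int:
    """Number of entries in the locus array."""
    return len(row.get("locus") or [])


# Values from the two sample objects in this thread; locus arrays are cut
# down to a single entry for illustration.
gwas = {"pValueMantissa": 3.0, "pValueExponent": -57,
        "locus": [{"variantId": "10_100315722_G_A"}]}
qtl = {"pValueMantissa": 2.359, "pValueExponent": -8,
       "locus": [{"variantId": "2_8300216_T_C"}]}

by_significance = sorted([qtl, gwas], key=p_value_sort_key)
```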
Json file for sample data: test_variant_page7.json
Just adding here a screenshot from the relevant widget in current OTG variant page (for reference)
Looks awesome already. The 2 columns that have a little bit of magic in my opinion are Column 1 and Column 6. I will give a slightly longer explanation in case there is any confusion, but I think @buniello's description is already good.

Column 1. A credible set (row) will show up on the fixed variant page any time the variant is one of the (potentially) many members of the `locus` array. That means that it might or might not be the lead variant of a given credible set. In Column 1 ("Lead variant"), we want to highlight the lead variant of the credible set, so the content will always be `variantId` from the object. But depending on whether the `variantId` matches the fixed variant at the top of the page, we want to show the 2 different behaviours described above. The logic is the same as in the genetics portal so it can be copied from there, pending further improvements to the UI to be discussed (e.g. [self] chip).

Column 6. `r2Overall` represents a pairwise relationship between the `variantId` (lead) and `locus.variantId` (tag). Because the `locus` array might contain many objects, we only want to display the `r2Overall` value from the entry where fixed variant == `locus.variantId`. Out of the many objects we can have in `locus`, we are only interested in one. No need for additional logic; this is just for you to understand the data. In the cases where fixed variant == `variantId`, `r2Overall` will represent the correlation of the fixed variant with itself. In these cases, the value will always be `1.0`.
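
The Column 1 and Column 6 rules described above can be sketched as two small helpers. The function names are hypothetical and the second locus entry in the example is made up to show the tag case.

```python
from typing import Optional


def is_self(credible_set: dict, fixed_variant_id: str) -> bool:
    """True when the lead variant is the page's fixed variant (show the (self) chip, no link)."""
    return credible_set["variantId"] == fixed_variant_id


def ld_with_fixed_variant(credible_set: dict, fixed_variant_id: str) -> Optional[float]:
    """Return r2Overall from the single locus entry that matches the fixed variant."""
    for entry in credible_set.get("locus") or []:
        if entry.get("variantId") == fixed_variant_id:
            return entry.get("r2Overall")  # None if the source has no r2 values
    return None


credible_set = {
    "variantId": "10_100315722_G_A",
    "locus": [
        {"variantId": "10_100315722_G_A", "r2Overall": 1.0000000000000049},
        {"variantId": "10_100320000_C_T", "r2Overall": 0.85},  # made-up tag variant
    ],
}

lead_r2 = ld_with_fixed_variant(credible_set, "10_100315722_G_A")
tag_r2 = ld_with_fixed_variant(credible_set, "10_100320000_C_T")
lead_label = f"{lead_r2:.2f}"  # two decimal places, as requested for Column 6
```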
@buniello for the next iteration we could decide if we want to collapse Column 8 and Column 9. Let's see how it looks now, but I can see some width savings there.
Here's the updated joining process. I thought I should post it since the whole variant index can be annotated this way. I did it for the GWAS Catalog curated PICS results, since those are not going to change anymore going forwards.
I start by generating the credible set + l2g dataframe by joining together the credible_set, locus_to_gene_predictions, study_index and gene_index.
Then I join the variant index to the dataframe above, looking for whenever a variant is found within a locus, and if so, extract its associated p-value and posterior probability.
There's a minor inconvenience at the moment where the pValueExponent, pValueMantissa, and beta columns are not populated for the locus object. This makes sense for the tag SNPs in the PICS output (they didn't have them to start with), but it means there's an extra step to check for the lead and annotate it with the stats, or else fill with null.
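
To make the lead-versus-tag step concrete, here is a plain-Python sketch over dictionaries. It is not the pipeline code: the function name and output field names are hypothetical, and it assumes the lead variant also appears as an entry in its own `locus` array.

```python
def annotate_variant_index(variant_ids: list[str], credible_sets: list[dict]) -> list[dict]:
    """One output row per (variant, credible set) pair where the variant is in the locus."""
    wanted = set(variant_ids)
    rows = []
    for cs in credible_sets:
        for entry in cs.get("locus") or []:
            if entry["variantId"] not in wanted:
                continue
            # Association stats only exist at credible-set level, i.e. for the lead.
            is_lead = entry["variantId"] == cs["variantId"]
            rows.append({
                "variantId": entry["variantId"],
                "leadVariantId": cs["variantId"],
                "studyId": cs["study"]["id"],
                "posteriorProbability": entry.get("posteriorProbability"),
                "pValueMantissa": cs.get("pValueMantissa") if is_lead else None,
                "pValueExponent": cs.get("pValueExponent") if is_lead else None,
                "beta": cs.get("beta") if is_lead else None,
            })
    return rows


# Illustrative credible set: one lead entry and one made-up tag entry.
credible_sets = [{
    "variantId": "10_100315722_G_A",
    "study": {"id": "GCST001217"},
    "pValueMantissa": 3.0,
    "pValueExponent": -57,
    "beta": 0.124,
    "locus": [
        {"variantId": "10_100315722_G_A", "posteriorProbability": 0.9},
        {"variantId": "10_100320000_C_T", "posteriorProbability": 0.1},
    ],
}]
annotated = annotate_variant_index(
    ["10_100315722_G_A", "10_100320000_C_T"], credible_sets
)
```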
To address this comment from @d0choa: "ideally a variant that is sometimes a lead and sometimes a tag. That would help FE consider all cases"
I tried to get SNPs which matched this description in the GWAS sumstats PICS outputs, which again did not have the p-value fields populated for the tags, so I've switched to the FinnGen SuSiE outputs. I took the first SNP I saw which matched these criteria:
And ~250 other SNPs in case you need something bigger:
I anticipate that we'll need to go back and forth a few times to refine this, but here's the initial version of the widget. I've made an effort to match the input SNPs to those listed in "test_variant_page9.json". This way, you can create the test variant page incorporating both the credible set widget and the QTL widget.
Also, this file only contains SNPs which are both a lead and sometimes a tag.
{
"variantId": "2_8302417_G_A",
"study": {
"id": "GTEx_brain_putamen_ENST00000668369",
"studyType": "eqtl",
"projectId": "GTEx"
},
"pValueMantissa": 2.359,
"pValueExponent": -8,
"beta": 0.694055,
"posteriorProbability": 0.0248072063095605,
"tissueFromSourceId": "UBERON_0001874",
"target": {
"id": "ENSG00000236790",
"approvedSymbol": "LINC00299"
},
"finemappingMethod": "SuSie",
"locus": [
{
"variantId": "2_8300216_T_C",
"posteriorProbability": 0.0711130529884503,
"pValueMantissa": 1.124,
"pValueExponent": -8,
"logBF": 17.6040572828625,
"beta": 0.642905,
"standardError": 0.106521,
"is95CredibleSet": true,
"is99CredibleSet": true
},
...
}
@xyg123 For the GWAS credible sets widget, I just switched to `test_single_variant_page.json` for testing the widget but we seem to have lost the `r2Overall` property from the `locus` entries?
This is a gap in the current data we are generating. @gjmcn we need to think about this, because it's not trivial to generate this column in some contexts. You can skip the column for now, until we figure out what to do.
@xyg123, @addramir we should think about this. I can see different scenarios. We might not have the R^2 because:
It is because I am using the FinnGen data for this. The alternative was to use the GWAS Catalog PICS output, in which case we would lose the p-value and beta fields for tag SNPs. Happy to generate that if you would prefer.
Also from the GWAS credible sets data change: `study.disease.name`, which we used for the trait column, has gone. Can we use `study.traitFromSource` or `study.disease.traitFromSource` for the trait column now?
The studies are expected in the same data structure as the GWAS credible sets. @xyg123 is it easy to use the same object?
Just to clarify, my comment about `study.disease.name` disappearing is about the GWAS credible sets - it is a result of @xyg123 using a new approach to process the data.
Sorry, it is just a matter of renaming the column from `study.disease.traitFromSource` to `study.disease.name`. Here you go (same SNPs as QTL widget):
test_credible_set.json
The issue with the r2Overall is due to the different data source (Finngen instead of GWAScatalog), and Finngen doesn't provide the r2 values. I am still processing the data with the same approach.
Addressing @buniello 's request to have tissue labels mapped to the qtl widget test set, there were 32 entries in the test data that didn't match an uberon id:
{
"variantId": "2_8302417_G_A",
"study": {
"id": "GTEx_brain_putamen_ENST00000668369",
"projectId": "GTEx",
"studyType": "eqtl"
},
"pValueMantissa": 2.359,
"pValueExponent": -8,
"beta": 0.694055,
"posteriorProbability": 0.0248072063095605,
"tissue": {
"id": "UBERON_0001874",
"label": "putamen",
"organs": ["brain"],
"anatomicalSystems": ["nervous system"]
},
"target": {
"approvedSymbol": "LINC00299",
"id": "ENSG00000236790"
},
"finemappingMethod": "SuSie",
"locus": [
{
"variantId": "2_8300216_T_C",
"posteriorProbability": 0.0711130529884503,
"pValueMantissa": 1.124,
"pValueExponent": -8,
"logBF": 17.6040572828625,
"beta": 0.642905,
"standardError": 0.106521,
"is95CredibleSet": true,
"is99CredibleSet": true
}, ...
}
In case it's useful for future reference, this is the file that the Platform uses to build the tissue metadata gs://open-targets-data-releases/24.03/input/expression-inputs/tissue-translation-map.json
@gjmcn as discussed in the office, the QTL credible sets widget will almost be a clone of the GWAS credible sets one. Below are the main differences (which we can discuss tomorrow):
- `projectId` (string) hyperlinked to this page in all cases: https://www.ebi.ac.uk/eqtl/Studies/. The study metadata card will add more context later.

Additional columns
- After STUDY column: `studyType` - Column header: TYPE
- After TYPE column: from "tissue" object `label` hyperlinked to `id` - Column header: TISSUE
- Replacing TOP L2G column (for now) with a GENE column (no tooltip) - from "target" object `approvedSymbol` hyperlinked to https://platform.opentargets.org/target/`id`

General notes: shall we run/display top L2G with QTLs? Shall we display `logBF` anywhere?
Not sure about displaying the logBF, but we should definitely make it accessible somewhere, users will need it to run colocalisation.
@gjmcn
Discussed changes to the current version of the QTL credible sets widget:
- `Gene` column after `type` column (before `tissue`)
- `tissue` header to Tissue/Cell
@gjmcn - discussed today
- the `most severe consequence` field of the metadata section on the variant page will display the `mostSevereConsequence` label hyperlinked to http://purl.obolibrary.org/obo/`mostSevereConsequenceId` --- please use identifiers to build the right link (see comment below)

Please try to use identifiers.org to build the link. You should find the same logic in the Open Targets Genetics or ClinVar widgets.
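
A possible shape for the identifiers.org link, assuming `mostSevereConsequenceId` is a Sequence Ontology ID written with an underscore (e.g. `SO_0001583`) and that identifiers.org expects the compact `SO:0001583` form. Both are assumptions, and the function name is hypothetical; the existing Open Targets Genetics or ClinVar widget logic should take precedence if it differs.

```python
def consequence_link(most_severe_consequence_id: str) -> str:
    """Build an identifiers.org URL from an ontology ID such as 'SO_0001583'."""
    # Convert the first underscore to a colon to get a compact identifier (CURIE).
    curie = most_severe_consequence_id.replace("_", ":", 1)
    return f"https://identifiers.org/{curie}"


link = consequence_link("SO_0001583")
```

IDs that already use a colon are passed through unchanged.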
this actually reminds me that we could re-use the VEP chip with variant consequence from the ClinVar widget in VEP widget
@gjmcn: Please note that the new variant index API field `hgvsId` should be shown in the variant page sub-header together with `rsIds` and `dbXrefs`.
Some updates on the credible set schema:
As part of the Variant Page effort, we have discussed starting development of the first two widgets (sample data has been shared on Slack):