buniello closed 2 years ago
@buniello and I made another pass over the spec to fill some gaps.
Here, we have used the production API and queries extracted from the production FE to explain the changes that we might need. To complete all the UI changes specified above, we need to expose the new data that @JarrodBaker has processed.
Every variant page issues a query that retrieves metadata about the studies (independently of the variant queried). This static content is currently served through the API. The information required seems to be loaded from v2g_display_labels.json.
API query:
query VariantPageQuery {
  genesForVariantSchema {
    qtls {
      id
      sourceId
      sourceLabel
      sourceDescriptionOverview
      sourceDescriptionBreakdown
      pmid
      tissues {
        id
        name
      }
    }
    intervals {
      sourceId
      sourceLabel
      sourceDescriptionOverview
      sourceDescriptionBreakdown
      pmid
      tissues {
        id
        name
      }
    }
    functionalPredictions {
      id
      sourceId
      sourceLabel
      sourceDescriptionOverview
      sourceDescriptionBreakdown
      pmid
      tissues {
        id
        name
      }
    }
    distances {
      id
      sourceId
      sourceLabel
      sourceDescriptionOverview
      sourceDescriptionBreakdown
      pmid
      tissues {
        id
        name
      }
    }
  }
}
API response:
{
  "data": {
    "genesForVariantSchema": {
      "qtls": [
        {
          "id": "pqtl",
          "sourceId": "pqtl",
          "sourceLabel": "pQTL (Sun, 2018)",
          "sourceDescriptionOverview": "Summary of evidence linking this variant to protein abundance in blood plasma",
          "sourceDescriptionBreakdown": "Evidence linking this variant to protein abundance in Sun *et al.* (2018) pQTL data",
          "pmid": "PMID:29875488",
          "tissues": [
            {
              "id": "FOLKERSEN_2020-UBERON_0001969",
              "name": "Folkersen 2020-uberon 0001969"
            },
            ...
We need to find and complete this data. It's likely to be an input of the data joining step. It might have been completed already.
Where we think the metadata of the studies is stored: https://github.com/opentargets/genetics-api/blob/master/resources/v2g_display_labels.json
The query for the sQTLs seems no different from the one for other QTLs. We expect the data to flow without any extra API changes. For an example with eQTL data, see:
query VariantPageQuery {
  genesForVariant(variantId: "1_154453788_C_T") {
    gene {
      id
      symbol
    }
    overallScore
    qtls {
      sourceId
      aggregatedScore
      tissues {
        tissue {
          id
          name
        }
        quantile
        beta
        pval
      }
    }
    intervals {
      sourceId
      aggregatedScore
      tissues {
        tissue {
          id
          name
        }
        quantile
        score
      }
    }
    functionalPredictions {
      sourceId
      aggregatedScore
      tissues {
        tissue {
          id
          name
        }
        maxEffectLabel
        maxEffectScore
      }
    }
    distances {
      typeId
      sourceId
      aggregatedScore
      tissues {
        tissue {
          id
          name
        }
        distance
        score
        quantile
      }
    }
  }
}
In the response to this query, IL6R has eQTL information. We expect the sQTL data to arrive in a similar format, which would unblock the UI development.
I think this info should be enough to start the process but you might need to ping @JarrodBaker and/or @carcruz to resolve specific issues.
Added a new record with the metadata about the sQTLs in v2g_display_labels.json - this includes the text for the tooltip.
Discussing with @remo87:
For a fixed Gene/ENSEMBLID, the sQTL results should be aggregated in the same row for different chr_localisation_clusters (curly bracket) -- similar to aggregating for eQTLs in row n1 below:
NB: The API returns separate queries for each Chr_cluster_gene endpoint - the aggregation happens in FE.
"phenotypeId": "chr1^54605209^54607134^clu_47022^ENSG00000162390"
![Screenshot 2022-07-27 at 10.09.49.png](https://images.zenhubusercontent.com/5ef1da283f096f8317c9ca44/1a7caa1f-6881-45ba-82b5-5ad8a38c561d)
Latest update on this task: @xyg123 is currently investigating whether the prototype shown above covers all use cases for sQTLs, e.g. does each cluster map to only one gene in our dataset? Can one gene host multiple clusters? If so, how do we display these odd datasets?
From @xyg123: Here are the results for sQTL merging. I think merging should be fine for now, although we should revisit this data when we add additional sources in the future.
Showing: only best/most significant cluster within same junction.
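A minimal sketch of that "keep only the best cluster" rule, for discussion only. It assumes that "most significant" means the lowest `pval`, and that each record is a plain dict; the field names `geneId` and `spliceId`, the helper name, and all values in the usage example are hypothetical. The grouping key is left to the caller because it is not settled here whether rows are grouped by gene or by junction.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List

Record = Dict[str, Any]


def keep_most_significant(
    records: Iterable[Record],
    group_key: Callable[[Record], Hashable],
) -> List[Record]:
    """Keep one record per group: the one with the smallest p-value.

    Assumption: every record carries a numeric "pval" field.
    """
    best: Dict[Hashable, Record] = {}
    for rec in records:
        key = group_key(rec)
        # Replace the stored record only when the new one is more significant.
        if key not in best or rec["pval"] < best[key]["pval"]:
            best[key] = rec
    return list(best.values())


# Usage with made-up p-values and a hypothetical second cluster/gene.
rows = [
    {"geneId": "ENSG00000162390", "spliceId": "clu_47022", "pval": 1e-5},
    {"geneId": "ENSG00000162390", "spliceId": "clu_00001", "pval": 1e-8},
    {"geneId": "ENSG_EXAMPLE", "spliceId": "clu_00002", "pval": 1e-3},
]
best_rows = keep_most_significant(rows, group_key=lambda r: r["geneId"])
```

Grouping by gene here mirrors the "same row per gene" display discussed above; swapping the `group_key` is all that would be needed if the grouping turns out to be per junction instead.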
Discussed with the team:
The API will be slightly modified so that the sQTL data can conform to the schema.
This means that the current phenotypeId field will be split into two columns:
phenotypeId: ENSGID
spliceId: chr1^54605209^54607134^clu_47022
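The split described above could be sketched roughly as follows. This is only an illustration, not the actual API change: it assumes the gene ID is always the last `^`-separated token and starts with `ENSG`, as in the example value in this thread. The function and type names are hypothetical.

```python
from typing import NamedTuple


class SplitPhenotype(NamedTuple):
    phenotype_id: str  # Ensembl gene ID, e.g. ENSG00000162390
    splice_id: str     # remaining prefix: chr^start^end^cluster


def split_phenotype_id(raw: str) -> SplitPhenotype:
    """Split 'chr^start^end^cluster^ENSGID' at the last '^'.

    Assumption: the gene ID is the final token and begins with 'ENSG'.
    """
    splice_id, sep, gene_id = raw.rpartition("^")
    if not sep or not gene_id.startswith("ENSG"):
        raise ValueError(f"unexpected sQTL phenotypeId: {raw!r}")
    return SplitPhenotype(phenotype_id=gene_id, splice_id=splice_id)


parts = split_phenotype_id("chr1^54605209^54607134^clu_47022^ENSG00000162390")
```

Splitting at the last separator rather than counting tokens keeps the sketch working even if the cluster part ever contains extra `^` characters.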
Hovering text: log2(H4/H3):, H3:, H4:, QTLbeta:, spliceId
@buniello Just a very minor opinion as a user. What would you think if the tooltip showed the metrics in separate rows instead of having them separated by commas? Btw, for eQTLs and pQTLs the metrics are just separated by a space.
@ireneisdoomed this is a good observation! - You suggest having spliceId in a separate row, right?
We have commas also for eQTLs and pQTLs, I think? A hovering example from other QTLs is in one of the screenshots above.
@buniello No, I was proposing for the hovering text to be:
log2(H4/H3): {value}
H3: {value}
H4: {value}
QTL beta: {value}
spliceId: {value}
My main argument is that the splice ID is a fairly long string. However, I don't know if with a longer hover text we would frequently obstruct the visibility of other circles.
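To make the one-metric-per-row proposal concrete, here is a small sketch of how the hover text could be assembled. The labels come from this thread; the function name, parameter names, and the numbers in the usage example are made up for illustration, and the real tooltip lives in the FE rather than in Python.

```python
def format_sqtl_tooltip(log2_h4_h3, h3, h4, qtl_beta, splice_id) -> str:
    """Build the hover text with one 'label: value' pair per row."""
    rows = [
        ("log2(H4/H3)", log2_h4_h3),
        ("H3", h3),
        ("H4", h4),
        ("QTL beta", qtl_beta),
        ("spliceId", splice_id),  # long string, so it gets its own row
    ]
    return "\n".join(f"{label}: {value}" for label, value in rows)


# Usage with made-up metric values.
tooltip = format_sqtl_tooltip(
    1.5, 0.2, 0.7, -0.3, "chr1^54605209^54607134^clu_47022"
)
```

Keeping the rows in a single list makes it easy to compare the height of this tooltip against the comma-separated variant when checking whether it obstructs neighbouring circles.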
Discussed with @remo87 already:
Example below of locus page/gene prioritisation coloc table:
As a developer/owner, I should pin down the best way to integrate the new splice QTL datasets into the Genetics Portal and display them to users.
The sQTLs contribute to the new V2G pipeline.
sQTLs in the Gene Page (from Jeremy's ppt)
sQTLs in the Variant Page (from Jeremy's ppt)
[ ] Add as a column in the assigned genes summary
[ ] Add as a tab in the assigned genes
sQTLs in the Study-Locus page (from Jeremy's ppt)
Acceptance tests
How do we know the task is complete?