Closed DSuveges closed 1 year ago
Exploring the STRING website as of 18 July 2023, the new version is available via the UI, the API and the downloadable files. However, the version endpoint on the old and new API URLs returns the same data (v.11.5) [1]. This is not consistent with the other endpoints [2], where the returned data differs.
URLs to get the version:
https://version-12-0.string-db.org/api/json/version
https://string-db.org/api/json/version
[
  {
    "string_version": "11.5",
    "stable_address": "https://version-11-5.string-db.org"
  }
]
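The version check can be scripted so it is easy to repeat once the STRING team replies. This is a minimal sketch that assumes only the two URLs listed above and the response shape shown; `parse_version`, `fetch_version` and `endpoints_agree` are hypothetical helper names, not part of any STRING client.

```python
import json
from urllib.request import urlopen

# The two version endpoints listed above.
VERSION_URLS = {
    "v12.0": "https://version-12-0.string-db.org/api/json/version",
    "current": "https://string-db.org/api/json/version",
}


def parse_version(payload: str) -> str:
    """Extract `string_version` from the JSON list the endpoint returns."""
    records = json.loads(payload)
    return records[0]["string_version"]


def fetch_version(url: str, timeout: float = 10.0) -> str:
    """Download one version endpoint and return the version it reports."""
    with urlopen(url, timeout=timeout) as response:
        return parse_version(response.read().decode("utf-8"))


def endpoints_agree(versions: dict) -> bool:
    """True when every endpoint reports the same version string."""
    return len(set(versions.values())) == 1
```

Calling `fetch_version` on each entry of `VERSION_URLS` and passing the resulting mapping to `endpoints_agree` reproduces the comparison; on 18 July 2023 both endpoints reported 11.5, so the result would have been `True`.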
Top interaction partners for TP53:
https://string-db.org/api/json/interaction_partners?identifiers=TP53&limit=100
https://version-12-0.string-db.org/api/json/interaction_partners?identifiers=TP53&limit=100
I have let the STRING team know about this inconsistency and am waiting for their reply.
What is slightly more concerning is that the current data shown on the platform is not consistent with either the new or the old STRING data:
+--------------------+--------------------+---------+---------+--------------+
| stringId_A| stringId_B|score_old|score_new|score_platform|
+--------------------+--------------------+---------+---------+--------------+
|9606.ENSP00000269305|9606.ENSP00000340989| 0.999| 0.999| 0.986|
|9606.ENSP00000269305|9606.ENSP00000263253| 0.999| 0.999| 0.999|
|9606.ENSP00000269305|9606.ENSP00000437955| 0.999| 0.999| 0.977|
|9606.ENSP00000269305|9606.ENSP00000362649| 0.999| 0.999| 0.99|
|9606.ENSP00000269305|9606.ENSP00000335153| 0.999| 0.999| 0.99|
|9606.ENSP00000269305|9606.ENSP00000278616| 0.999| 0.999| 0.999|
|9606.ENSP00000269305|9606.ENSP00000356150| 0.999| 0.999| 0.999|
|9606.ENSP00000269305|9606.ENSP00000372023| 0.998| 0.999| 0.998|
|9606.ENSP00000269305|9606.ENSP00000381185| 0.995| 0.999| 0.984|
|9606.ENSP00000269305|9606.ENSP00000365230| NaN| 0.999| NaN|
|9606.ENSP00000269305|9606.ENSP00000354218| NaN| 0.999| NaN|
|9606.ENSP00000269305|9606.ENSP00000266000| NaN| 0.999| NaN|
|9606.ENSP00000269305|9606.ENSP00000212015| 0.999| 0.999| 0.996|
|9606.ENSP00000269305|9606.ENSP00000418960| 0.999| 0.999| 0.996|
|9606.ENSP00000269305|9606.ENSP00000384849| 0.999| 0.999| 0.999|
|9606.ENSP00000269305|9606.ENSP00000341957| 0.999| 0.999| 0.998|
|9606.ENSP00000269305|9606.ENSP00000497594| NaN| 0.999| NaN|
|9606.ENSP00000269305|9606.ENSP00000343535| 0.999| 0.999| 0.987|
|9606.ENSP00000269305|9606.ENSP00000262367| 0.999| 0.999| 0.999|
|9606.ENSP00000269305|9606.ENSP00000258149| 0.999| 0.999| 0.999|
+--------------------+--------------------+---------+---------+--------------+
only showing top 20 rows
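The comparison behind this table can be sketched in plain Python. The table itself looks like Spark output, so this is not the code that produced it; it only illustrates the outer join and the check for platform scores that match neither STRING version. The function names and the partner-to-score dictionaries are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Scores = Dict[str, float]
Row = Tuple[str, Optional[float], Optional[float], Optional[float]]


def compare_scores(old: Scores, new: Scores, platform: Scores) -> List[Row]:
    """Outer-join three partner -> score mappings; missing entries become None."""
    partners = sorted(set(old) | set(new) | set(platform))
    return [(p, old.get(p), new.get(p), platform.get(p)) for p in partners]


def platform_mismatches(rows: List[Row], tolerance: float = 1e-9) -> List[Row]:
    """Rows where the platform score matches neither the old nor the new score."""

    def same(a: Optional[float], b: Optional[float]) -> bool:
        if a is None or b is None:
            return a is b  # only equal when both are missing
        return abs(a - b) <= tolerance

    return [r for r in rows if not (same(r[3], r[1]) or same(r[3], r[2]))]


# Three rows taken from the table above (partners of 9606.ENSP00000269305).
old = {"9606.ENSP00000340989": 0.999, "9606.ENSP00000372023": 0.998}
new = {
    "9606.ENSP00000340989": 0.999,
    "9606.ENSP00000372023": 0.999,
    "9606.ENSP00000365230": 0.999,
}
platform = {"9606.ENSP00000340989": 0.986, "9606.ENSP00000372023": 0.998}

rows = compare_scores(old, new, platform)
mismatches = platform_mismatches(rows)
```

With these three rows, only the `9606.ENSP00000340989` pair is flagged: its platform score (0.986) matches neither 0.999 value, whereas the other two rows agree with the old data.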
Looking at the old ticket about the STRING update (#1509), it is apparent that the dataset was not updated at that time:
detailed_url = 'https://stringdb-static.org/download/protein.links.detailed.v11.0/9606.protein.links.detailed.v11.0.txt.gz'
full_url = 'https://stringdb-static.org/download/protein.links.full.v11.0/9606.protein.links.full.v11.0.txt.gz'
## Joining the two dataset:
merged_df = detailed_df.merge(full_df, on=['protein1', 'protein2'], how='left')
# Saving data:
merged_df.to_csv('9606.protein.links.full_w_homology.v11.5.txt.gz', sep=' ', index=False, compression='infer')
So, somehow the v.11.0 data became v.11.5. I could confirm this by working with the actual v.11.5 data. As I could recapitulate the API response, I could also confirm that the logic is sound, so we can assume that it will work fine once the new v.12.0 dataset is plugged in.
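One way to avoid this kind of mislabelling is to derive the output file name from the input URLs instead of typing the version by hand. This is a minimal sketch, assuming the download URLs keep the `.vX.Y.txt.gz` suffix seen in the snippet above; `version_from_url` and `output_filename` are hypothetical helpers, not part of the existing script.

```python
import re

# Matches the version suffix of STRING download files, e.g. ".v11.0.txt.gz".
VERSION_PATTERN = re.compile(r"\.v(\d+\.\d+)\.txt\.gz$")


def version_from_url(url: str) -> str:
    """Return the version encoded in a STRING download URL, e.g. '11.0'."""
    match = VERSION_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No version found in {url!r}")
    return match.group(1)


def output_filename(detailed_url: str, full_url: str) -> str:
    """Build the merged file name, refusing inputs from different versions."""
    versions = {version_from_url(detailed_url), version_from_url(full_url)}
    if len(versions) != 1:
        raise ValueError(f"Input files disagree on version: {sorted(versions)}")
    return f"9606.protein.links.full_w_homology.v{versions.pop()}.txt.gz"
```

With the two v11.0 URLs from the old ticket, `output_filename` would yield a `v11.0` file name rather than `v11.5`.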
The new dataset for STRING v.12.0 is available here:
gs://ot-team/dsuveges/9606.protein.links.full_w_homology.v12.0.txt.gz
╰─ gsutil cat gs://ot-team/dsuveges/9606.protein.links.full_w_homology.v12.0.txt.gz | gzcat | head -n5 | column -t
protein1 protein2 neighborhood fusion cooccurence coexpression experimental database textmining combined_score homology
9606.ENSP00000000233 9606.ENSP00000356607 0 0 0 45 134 0 0 173 0
9606.ENSP00000000233 9606.ENSP00000427567 0 0 0 0 128 0 0 154 0
9606.ENSP00000000233 9606.ENSP00000253413 0 0 0 118 49 0 0 151 0
9606.ENSP00000000233 9606.ENSP00000493357 0 0 0 56 53 0 433 471 0
vs. v11.5:
╰─ gsutil cat gs://open-targets-data-releases/22.11/input/interactions-inputs/9606.protein.links.full_w_homology.v11.5.txt.gz | gzcat | head -n5 | column -t
protein1 protein2 neighborhood fusion cooccurence coexpression experimental database textmining combined_score homology
9606.ENSP00000000233 9606.ENSP00000272298 0 0 332 62 181 0 125 490 0
9606.ENSP00000000233 9606.ENSP00000253401 0 0 0 0 186 0 56 198 0
9606.ENSP00000000233 9606.ENSP00000401445 0 0 0 0 159 0 0 159 0
9606.ENSP00000000233 9606.ENSP00000418915 0 0 0 61 158 0 542 606 0
protein1 protein2 neighborhood fusion cooccurence coexpression experimental database textmining combined_score homology
9606.ENSP00000269305 9606.ENSP00000340989 0 0 0 0 981 750 859 999 0
{
  "stringId_A": "9606.ENSP00000269305",
  "stringId_B": "9606.ENSP00000340989",
  "preferredName_A": "TP53",
  "preferredName_B": "SFN",
  "ncbiTaxonId": 9606,
  "score": 0.999,
  "nscore": 0,
  "fscore": 0,
  "pscore": 0,
  "ascore": 0,
  "escore": 0.981,
  "dscore": 0.75,
  "tscore": 0.859
},
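The file row and the API record above differ only in scale: the flat file stores integers (e.g. 981) where the API returns fractions (0.981). A minimal sketch of that check follows. The column-to-field mapping is an assumption: only `experimental`, `database`, `textmining` and `combined_score` can be confirmed from this example, while the zero-valued channels are mapped by the order of the API fields.

```python
# Assumed mapping from flat-file columns to API score fields. Only the last four
# entries are confirmed by the TP53/SFN example; the rest are all zero there.
COLUMN_TO_API_FIELD = {
    "neighborhood": "nscore",
    "fusion": "fscore",
    "cooccurence": "pscore",
    "coexpression": "ascore",
    "experimental": "escore",
    "database": "dscore",
    "textmining": "tscore",
    "combined_score": "score",
}


def file_row_to_api_scores(row: dict) -> dict:
    """Rescale the integer file scores to the 0-1 fractions the API returns."""
    return {api: int(row[col]) / 1000 for col, api in COLUMN_TO_API_FIELD.items()}


def scores_match(row: dict, api_record: dict, tolerance: float = 1e-9) -> bool:
    """True when every rescaled file score equals the matching API field."""
    expected = file_row_to_api_scores(row)
    return all(abs(expected[f] - api_record[f]) <= tolerance for f in expected)
```

Feeding the TP53/SFN file row and the API record shown above into `scores_match` returns `True`, which is the sense in which the API response could be recapitulated from the file.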
The size of the dataset grew from 11.7M to 13.7M.
A PR is opened containing the following changes:
string-interactions.json.gz is removed as it is not picked up by the ETL.
A small commit to the ETL was directly pushed, reflecting the renamed output file.
The STRING team could resolve the API version discrepancy:
Thanks a lot for the feedback. It's only the "version" API that returns the
same output as v11.5 This is because it is used by R-package, which has to be
checked for compatibility, before full release.
All the rest of the APIs return correct v12 results. I will change it in the
next few days, when I update the R-package.
Important for @HelenaCornu @buniello to bear this in mind for comms.
Thanks @ireneisdoomed! Can I check whether I have understood:
There was an issue which meant that we have been showing an outdated version of STRING data. However, with this update, we are now in line with the latest data, and the size of the dataset has increased by 2M.
@HelenaCornu I would omit the issue and inform that we have integrated the latest data from STRING.
I don't see any release notes on their site, but they have recently published a manuscript based on the latest version. @DSuveges, is there anything specific we want to highlight? If not, I can help prepare some notes based on their publication, depending on how comprehensive we want to be about it.
Hi @LucaFumis, will you be able to update the STRING version from 11 to 12 in the Molecular Interactions widget on the platform?
Hi @prashantuniyal02, the information in the 4 tabs headers actually comes from our API through this query: https://github.com/opentargets/ot-ui-apps/blob/main/apps/platform/src/sections/target/MolecularInteractions/InteractionsStats.gql#L3
query InteractionsSectionQuery($ensgId: String!) {
  interactionResources {
    databaseVersion
    sourceDatabase
  }
  target(ensemblId: $ensgId) {
    id
    intact: interactions(sourceDatabase: "intact") {
      count
    }
    signor: interactions(sourceDatabase: "signor") {
      count
    }
    reactome: interactions(sourceDatabase: "reactome") {
      count
    }
    string: interactions(sourceDatabase: "string") {
      count
    }
  }
}
For TSLP, for example, the response is:
{
  "interactionResources": [
    {
      "databaseVersion": "11",
      "sourceDatabase": "string",
      "__typename": "InteractionResources"
    },
    {
      "databaseVersion": "243",
      "sourceDatabase": "intact",
      "__typename": "InteractionResources"
    },
    {
      "databaseVersion": "81",
      "sourceDatabase": "reactome",
      "__typename": "InteractionResources"
    },
    {
      "databaseVersion": "Not Available",
      "sourceDatabase": "signor",
      "__typename": "InteractionResources"
    }
  ],
  "target": {
    "id": "ENSG00000145777",
    "intact": {
      "count": 2,
      "__typename": "Interactions"
    },
    "signor": {
      "count": 1,
      "__typename": "Interactions"
    },
    "reactome": {
      "count": 3,
      "__typename": "Interactions"
    },
    "string": {
      "count": 615,
      "__typename": "Interactions"
    },
    "__typename": "Target"
  }
}
We would need to update that at the API level, unless we want to hardcode it in the front end (which I guess we don't).
@HelenaCornu, @ireneisdoomed I wouldn't go crazy with the comms for this update, given we are kind of underutilizing the network data (it's just an annotation at the moment). Moreover, we are quite underutilizing the STRING database itself, as we are only extracting binary interactions with the scores. We can highlight some of the key points from the publication: thanks to the improved analytical methods, not only did the number of binary interactions grow, but the scores also got better.
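Since the header text comes from `interactionResources`, a stale version can be spotted from the response itself. This is a minimal sketch that works on an already-parsed response shaped like the one above; `database_versions`, `stale_sources` and the `expected` mapping are hypothetical names used for illustration, and no API call is made here.

```python
def database_versions(response: dict) -> dict:
    """Map each sourceDatabase to the databaseVersion reported by the API."""
    return {
        resource["sourceDatabase"]: resource["databaseVersion"]
        for resource in response["interactionResources"]
    }


def stale_sources(response: dict, expected: dict) -> dict:
    """Return {source: (reported, expected)} for each source whose version differs."""
    reported = database_versions(response)
    return {
        source: (reported.get(source), version)
        for source, version in expected.items()
        if reported.get(source) != version
    }
```

For the TSLP response above, `stale_sources(response, {"string": "12.0"})` would report `("11", "12.0")` for `string`, which is the value that has to change at the API level.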
Updated the STRING database citation in the documentation, and the BE team has updated the STRING table header to version 12.0.
It has been advertised on [string-db]() that the new v12.0 version of the resource is available. We need to look into how this update could be automated and
Tasks
v11 -> v12.0. (This info has been off on our website forever.)
Acceptance tests
There's a set of steps to complete to close this ticket: