Closed: andrewhercules closed this issue 3 years ago
Any progress wrt understanding the cause for this bug?
Hi @sigven!
Our data and technical teams have investigated the issue. Both the data shown in the user interface and the associations datasets available for download are correct and valid. The difference between them arises because the two are computed with slightly different algorithms, which differ in their normalisation and harmonic sum strategy. We expect the rankings in the user interface and in the datasets to be broadly similar, but individual scores will differ.
We will be harmonising our approach in our next release, 21.06, scheduled for the end of June. From that release onwards, the user interface and the datasets will provide the same data.
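To illustrate how a change in normalisation alone can shift a harmonic sum score, here is a minimal sketch. It is not the platform's actual implementation: the function names, the two normalisation choices, and the example evidence scores are all assumptions made for illustration only.

```python
import math
from typing import Sequence


def harmonic_sum(scores: Sequence[float]) -> float:
    """Sum the scores in descending order, dividing each by its squared rank."""
    ordered = sorted(scores, reverse=True)
    return sum(s / (rank ** 2) for rank, s in enumerate(ordered, start=1))


def normalised_by_series_limit(scores: Sequence[float]) -> float:
    """Assumed strategy A: divide by pi^2 / 6, the limit of sum(1 / i^2)."""
    return harmonic_sum(scores) / (math.pi ** 2 / 6)


def normalised_by_count(scores: Sequence[float]) -> float:
    """Assumed strategy B: divide by the maximum reachable with len(scores) scores of 1.0."""
    if not scores:
        return 0.0
    maximum = sum(1 / (rank ** 2) for rank in range(1, len(scores) + 1))
    return harmonic_sum(scores) / maximum


# Hypothetical evidence scores, not taken from any real association.
example = [0.9, 0.8, 0.5]
print(normalised_by_series_limit(example))
print(normalised_by_count(example))
```

Both functions rank the same inputs in the same order, but strategy B always yields a score at least as high as strategy A for the same evidence, which is the kind of systematic offset that could explain broadly similar rankings with different absolute values.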
Great @andrewhercules! Thanks for looking into this, highly appreciated. I will use the 21.04 data meanwhile, looking forward to the 21.06 release.
regards, Sigve
Ticket closed as the bug has been resolved: new associations files have been generated and made available via FTP and BigQuery.
A user has reported that the scores in the `associationByOverallDirect` JSON file do not match the scores available in the API and presented on the associations page. For example, the overall association score returned by the API for BRAF and Noonan syndrome is 0.85, but the `overallDatasourceHarmonicScore` in the JSON file is 0.9781107755829519.
part-00186-ecc3d41f-c4e5-42c5-a5a4-b4de41f749d4-c000.json:
Can we please investigate the difference in the score returned by the API and the score available in the data downloads file?
In the meantime, I will let the user know that we are investigating the discrepancy.
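As a starting point for the investigation, a check along these lines could flag records whose file score differs from the API score. This is only a sketch: it assumes the part file holds one JSON object per line, and the `targetId` and `diseaseId` key names, the identifiers in the usage example, and the `find_mismatches` helper are hypothetical. Only `overallDatasourceHarmonicScore` and the two score values are taken from the report above.

```python
import json
from typing import Dict, Iterable, List, Tuple

SCORE_FIELD = "overallDatasourceHarmonicScore"  # field name from the report
TARGET_FIELD = "targetId"    # hypothetical key name, the real schema may differ
DISEASE_FIELD = "diseaseId"  # hypothetical key name, the real schema may differ

Pair = Tuple[str, str]


def find_mismatches(
    lines: Iterable[str],
    api_scores: Dict[Pair, float],
    tolerance: float = 0.01,
) -> List[Tuple[Pair, float, float]]:
    """Return (pair, api_score, file_score) for pairs that differ by more than tolerance."""
    mismatches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        pair = (record.get(TARGET_FIELD), record.get(DISEASE_FIELD))
        if pair not in api_scores:
            continue  # no API score collected for this pair
        file_score = record[SCORE_FIELD]
        if abs(file_score - api_scores[pair]) > tolerance:
            mismatches.append((pair, api_scores[pair], file_score))
    return mismatches


# Usage with placeholder identifiers and the two scores quoted in the report.
sample_lines = [
    '{"targetId": "TARGET_A", "diseaseId": "DISEASE_X", '
    '"overallDatasourceHarmonicScore": 0.9781107755829519}',
]
api = {("TARGET_A", "DISEASE_X"): 0.85}
for pair, api_score, file_score in find_mismatches(sample_lines, api):
    print(pair, api_score, file_score)
```

For a real file, `lines` could simply be an open file handle, since iterating over it yields one line at a time.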