opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Modulate ChEMBL scoring #1906

Closed ireneisdoomed closed 2 years ago

ireneisdoomed commented 2 years ago

As a result of the work described in #1878, we want to change the scoring of ChEMBL evidence so that it captures more of the information the evidence already carries.

Current scoring is based on the phase of the clinical trial.

The new scoring will modulate that phase-based score according to the studyStopReasonCategories.

| Class | Superclass | Proposed weight |
| --- | --- | --- |
| Insufficient Enrollment | Neutral | 1 |
| Business or administrative | Possibly_Negative | 1 |
| Negative | Negative | 0.5 |
| Logistics or resources | Neutral | 1 |
| Study design | Neutral | 1 |
| Invalid reason | Invalid_Reason | 1 |
| Study staff moved | Neutral | 1 |
| COVID-19 | Neutral | 1 |
| Another study | Neutral | 1 |
| No context | Invalid_Reason | 1 |
| Safety or side effects | Safety_Sideeffects | 0.5 |
| Regulatory | Neutral | 1 |
| Interim analysis | Neutral | 1 |
| Success | Success | 1 |
| Met endpoint | Neutral | 1 |
| Ethical reason | Neutral | 1 |
| Insufficient data | Neutral | 1 |
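To make the proposal concrete, here is a minimal Python sketch of the weighting for a record with a single stop-reason category. The weights are copied from the table above; the names `STOP_REASON_WEIGHTS` and `modulate_score` are hypothetical, the phase-based score is passed in because its values are not listed in this issue, and treating an unlisted category as weight 1 is an assumption. The actual ETL logic may look quite different.

```python
# Weights copied from the proposed table; everything else is illustrative.
STOP_REASON_WEIGHTS = {
    "Insufficient Enrollment": 1.0,
    "Business or administrative": 1.0,
    "Negative": 0.5,
    "Logistics or resources": 1.0,
    "Study design": 1.0,
    "Invalid reason": 1.0,
    "Study staff moved": 1.0,
    "COVID-19": 1.0,
    "Another study": 1.0,
    "No context": 1.0,
    "Safety or side effects": 0.5,
    "Regulatory": 1.0,
    "Interim analysis": 1.0,
    "Success": 1.0,
    "Met endpoint": 1.0,
    "Ethical reason": 1.0,
    "Insufficient data": 1.0,
}


def modulate_score(phase_score: float, category: str) -> float:
    """Scale a phase-based score by the weight of one stop-reason category."""
    # Assumption: a category missing from the table leaves the score unchanged.
    weight = STOP_REASON_WEIGHTS.get(category, 1.0)
    return phase_score * weight
```

With these weights, only the Negative and Safety or side effects categories change a score; every other category leaves it as it is.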

The impact

[Image: image.png] *Success is a very small group, mostly evidence from phase 2 (183) and phase 3 (152).

Tasks

JarrodBaker commented 2 years ago

I can confirm that we can implement the logic in the ETL, but I can't confirm that we can do it in the 22.02 release. The pipeline freeze is today, according to release planning.

d0choa commented 2 years ago

@ireneisdoomed I think Success should be a 1 as well. The meaning of the stop reason Success is that the study has stopped because of a successful result. Very often this is done to save time and conclude things early, in order to advance in the clinical pipeline.

I see no reason to prioritise the Success stop reason over a completed study with no stop reason. It also adds risk, because you could start having scores above 1 (e.g. Phase III + Success).

d0choa commented 2 years ago

@JarrodBaker happy to move this to the COULD bucket in the release intentions. Not urgent. cc @ktsirigos

ireneisdoomed commented 2 years ago

@d0choa A completed phase III can fail if the efficacy has not been proven. For example, we are exposing NCT01870778 to account for the relationship between Serelaxin and heart failure. In the results you will see that the p values for the main endpoints do not show efficacy.

This is the rationale behind the distinction.

In any case, you make a very good point: there will be evidence with a score > 1. @JarrodBaker and I just had a chat about it, and in the expression you can indicate that the score is capped at a maximum value, as we do for Project Score.
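As a rough illustration of the capping idea, the sketch below clamps a weighted score to a maximum. The function name `capped_score`, the default cap of 1.0, and the upweighting factor of 1.5 in the usage note are all hypothetical; none of them comes from the ETL or from the Project Score configuration.

```python
MAX_SCORE = 1.0  # assumed cap, chosen here only for illustration


def capped_score(phase_score: float, weight: float, max_score: float = MAX_SCORE) -> float:
    """Apply a weight to a phase-based score, never exceeding max_score."""
    return min(phase_score * weight, max_score)
```

For example, `capped_score(1.0, 1.5)` stays at the cap instead of going above 1, while a downweighted score such as `capped_score(1.0, 0.5)` is unaffected by the cap.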

Tbf as I said the impact is minimal. I won't object to keeping it like it is if you think it overcomplicates things.

ireneisdoomed commented 2 years ago

After discussing it with @d0choa, we won't upweight the Success records. This is because ChEMBL lacks many records of Phase IV trials, so if we were to upweight these trials, we would give more importance to a "successful" phase III trial than to one that was followed up by a phase IV trial that we have no record of.

I've updated the table above to reflect this.

JarrodBaker commented 2 years ago

@ireneisdoomed In the case of the following record, would the expected score be weighted by a factor of 0.25 or 0.5?

 clinicalStatus            | Terminated                                                                          
 datasourceId              | chembl                                                                              
 datatypeId                | known_drug                                                                          
 diseaseFromSource         | Castrate-Resistant Prostate Cancer                                                  
 diseaseFromSourceMappedId | MONDO_0008315                                                                       
 drugId                    | CHEMBL92                                                                            
 studyStartDate            | 2010-03-01                                                                          
 studyStopReason           | Unable to enroll due to criteria for stable baseline pain                           
 studyStopReasonCategories | [Safety or side effects, Negative]                                                  
 targetFromSource          | CHEMBL2095182                                                                       
 targetFromSourceId        | Q13509                                                                              
 urls                      | [{ClinicalTrials, https://clinicaltrials.gov/search?id=%22NCT01083615%22}]          
 size                      | 2                                                                                   
 s                         | 0.175                                                                               
 e                         | 0.25   

That is, do I just take a minimum of the mapped studyStopReasonCategories or calculate their product to find a weight?

ireneisdoomed commented 2 years ago

@JarrodBaker We just want to use the minimum, so evidence will never be downweighted by more than half.
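A small Python sketch of the two options, applied to the categories of the record above. The two weights come from the proposed table; `WEIGHTS`, `weight_min` and `weight_product` are hypothetical names, and defaulting an unlisted or empty category list to 1 is an assumption.

```python
from math import prod

# Only the two categories of the example record; weights from the table.
WEIGHTS = {"Safety or side effects": 0.5, "Negative": 0.5}


def weight_min(categories):
    """Chosen rule: take the smallest weight among the mapped categories."""
    # Assumption: no categories (or unlisted ones) means no downweighting.
    return min((WEIGHTS.get(c, 1.0) for c in categories), default=1.0)


def weight_product(categories):
    """Rejected alternative: multiply the weights of all categories."""
    return prod(WEIGHTS.get(c, 1.0) for c in categories)


categories = ["Safety or side effects", "Negative"]
print(weight_min(categories))      # 0.5
print(weight_product(categories))  # 0.25
```

Taking the minimum keeps the weight at 0.5 however many downweighted categories a record has, whereas the product would keep shrinking with each additional one.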

JarrodBaker commented 2 years ago

Thanks!