Modulate ChEMBL scoring

ireneisdoomed commented 2 years ago

As a result of the work described in #1878, we want to change the scoring of ChEMBL evidence so that it captures more of the information the evidence already contains.

Current scoring is based on the phase of the clinical trial:

Phase 0: 0,09
Phase 1: 0,1
Phase 2: 0,2
Phase 3: 0,7
Phase 4: 1

New scoring will modulate the scoring above based on the `studyStopReasonCategories`

Class	Superclass	Proposed weight
Insufficient Enrollment	Neutral	1
Business or administrative	Possibly_Negative	1
Negative	Negative	0,5
Logistics or resources	Neutral	1
Study design	Neutral	1
Invalid reason	Invalid_Reason	1
Study staff moved	Neutral	1
COVID-19	Neutral	1
Another study	Neutral	1
No context	Invalid_Reason	1
Safety or side effects	Safety_Sideeffects	0,5
Regulatory	Neutral	1
Interim analysis	Neutral	1
Success	Success	1
Met endpoint	Neutral	1
Ethical reason	Neutral	1
Insufficient data	Neutral	1

As discussed with the data team, we only want to penalise the evidence whose CT has been stopped early due to Negative or Safety reasons.
For the case of those tagged with Success, we want to upweight them to make the distinction between a CT that has been completed without being confident of whether the results have been positive or negative, and those for which we are certain about their success.
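
A minimal sketch of how the modulation could work for a single stop reason category, using the phase scores and the weights from the table above as they currently stand. The names `PHASE_SCORE`, `STOP_REASON_WEIGHT` and `modulated_score` are hypothetical and only for illustration; the real logic would live in the ETL expression.

```python
# Illustrative only: these names are assumptions, not the ETL's identifiers.
PHASE_SCORE = {0: 0.09, 1: 0.1, 2: 0.2, 3: 0.7, 4: 1.0}

# Only the categories whose weight differs from 1 are listed;
# every other category in the table falls back to a weight of 1.
STOP_REASON_WEIGHT = {
    "Negative": 0.5,
    "Safety or side effects": 0.5,
}


def modulated_score(phase, stop_reason_category=None):
    """Return the phase score multiplied by the stop reason weight."""
    weight = STOP_REASON_WEIGHT.get(stop_reason_category, 1.0)
    return PHASE_SCORE[phase] * weight


print(modulated_score(3))                 # no stop reason: score unchanged
print(modulated_score(3, "Negative"))     # penalised: half the phase score
print(modulated_score(2, "COVID-19"))     # neutral category: score unchanged
```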

The impact

Success is a very small group: mostly evidence in phase 2 (183) and phase 3 (152).

3905 associations (of 69960) will be affected by the downweighting (Negative or Safety effects).
- The mean score for these is ~0,28, which will be halved. This makes sense since most of them are Phase 2, so they are not of great relevance for the prioritisation.
369 associations (of 69960) will be affected by the upweighting (Success).
- The mean score for these is 0,48, which will be multiplied by 2. The change is significant, although the affected evidence is minimal.

Tasks

[ ] Update the expression in the ETL config file to reflect this change. @JarrodBaker, can you confirm that this can be done in the ETL with a UDF?

JarrodBaker commented 2 years ago

I can confirm that we can implement the logic in the ETL, but I can't confirm that we can do it in the 22.02 release. Pipeline freeze is today according to release planning.

d0choa commented 2 years ago

@ireneisdoomed I think Success should be a 1 as well. The meaning of the stop reason Success is that the study has stopped because of a successful result. Very often this is done to save time and conclude things early, in order to advance in the clinical pipeline.

I see no reason to prioritise the Success stop reason over a completed study with no stop reason. It also adds risk, because you could start having scores above 1 (e.g. Phase III + Success).

d0choa commented 2 years ago

@JarrodBaker happy to move this to the COULD bucket in the release intentions. Not urgent cc @ktsirigos

ireneisdoomed commented 2 years ago

@d0choa A completed phase III can fail if the efficacy has not been proven. For example, we are exposing NCT01870778 to account for the relationship between Serelaxin and heart failure. In the results you will see that the p values for the main endpoints do not show efficacy.

This is the rationale behind the distinction.

In any case, you make a very good point: there will be evidence with a score > 1. @JarrodBaker and I just had a chat about it, and in the expression you can indicate that the score is capped to a maximum value, as we do for Project Score.

Tbf as I said the impact is minimal. I won't object to keeping it like it is if you think it overcomplicates things.

ireneisdoomed commented 2 years ago

After discussing it with @d0choa, we won't upweight the success records. This is because ChEMBL lacks many records of Phase IV trials, so if we were to upweight these trials, we would give more importance to a "successful" phase III trial than to one that was followed up to a phase IV trial that we have no record of.

I've updated the table above to reflect this.

JarrodBaker commented 2 years ago

@ireneisdoomed In the case of the following record, would the expected score be weighted by a factor of 0.25 or 0.5?

 clinicalStatus            | Terminated                                                                          
 datasourceId              | chembl                                                                              
 datatypeId                | known_drug                                                                          
 diseaseFromSource         | Castrate-Resistant Prostate Cancer                                                  
 diseaseFromSourceMappedId | MONDO_0008315                                                                       
 drugId                    | CHEMBL92                                                                            
 studyStartDate            | 2010-03-01                                                                          
 studyStopReason           | Unable to enroll due to criteria for stable baseline pain                           
 studyStopReasonCategories | [Safety or side effects, Negative]                                                  
 targetFromSource          | CHEMBL2095182                                                                       
 targetFromSourceId        | Q13509                                                                              
 urls                      | [{ClinicalTrials, https://clinicaltrials.gov/search?id=%22NCT01083615%22}]          
 size                      | 2                                                                                   
 s                         | 0.175                                                                               
 e                         | 0.25

That is, do I just take a minimum of the mapped studyStopReasonCategories or calculate their product to find a weight?

ireneisdoomed commented 2 years ago

@JarrodBaker We just want to use the minimum, so evidence will be downweighted by at most half.
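
To make the difference concrete, here is a small sketch that compares the minimum with the product for the two categories in the record above. The names `WEIGHT` and `stop_reason_weight` are hypothetical, and the Phase 3 base score of 0.7 is assumed purely for illustration.

```python
from math import prod

# Illustrative only: categories not listed here are assumed to weigh 1.
WEIGHT = {"Negative": 0.5, "Safety or side effects": 0.5}


def stop_reason_weight(categories):
    """Take the minimum weight across all categories (1.0 if there are none)."""
    return min((WEIGHT.get(c, 1.0) for c in categories), default=1.0)


categories = ["Safety or side effects", "Negative"]

min_weight = stop_reason_weight(categories)                    # minimum rule
product_weight = prod(WEIGHT.get(c, 1.0) for c in categories)  # rejected alternative

base_score = 0.7  # assumed Phase 3 score, for illustration
print(min_weight, base_score * min_weight)
print(product_weight, base_score * product_weight)
```

With the minimum, a record tagged with both penalising categories is still only halved; with the product it would be quartered.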

JarrodBaker commented 2 years ago

Thanks!
