Difference between the Clinical Trials Pipeline and ChEMBL #3230

Open inessmit opened 9 months ago

inessmit commented 9 months ago

See the discussion below:

Original issue

Describe the bug

The documentation about Clinical Precedence on the OT Platform does not agree with the ChEMBL phases, particularly phase -1: https://platform-docs.opentargets.org/drug/clinical-precedence

There have been some recent changes to the ChEMBL phases as described here: https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_32/chembl_32_release_notes.txt

Observed behaviour

The OT documentation says phase -1 is Preclinical. This is incorrect: in ChEMBL, -1 means 'Unknown'. https://chembl.gitbook.io/chembl-interface-documentation/frequently-asked-questions/drug-and-compound-questions#what-is-max-phase

Looking at the data, the platform does contain such 'Unknown' compounds, e.g. https://platform.opentargets.org/drug/CHEMBL2104462, and in the data download this compound has ind_max_phase -1. So the data download agrees with ChEMBL; only the documentation disagrees.

I have not checked compounds with phase 'NULL' (preclinical); I am not aware of any being in the platform.
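To make the phase codes concrete, here is a minimal Python sketch of how ChEMBL `max_phase` values could be translated into labels. The dictionary and the function name `max_phase_label` are our own, and the values follow our reading of the ChEMBL 32 release notes and FAQ linked above (including 0.5 for Early Phase 1, which this thread does not mention), so check them against the current ChEMBL documentation before relying on them.

```python
from typing import Optional

# Assumed mapping of ChEMBL max_phase codes (ChEMBL 32 onwards), based on the
# release notes linked above; verify against the current ChEMBL documentation.
CHEMBL_MAX_PHASE_LABELS = {
    4.0: "Approved",
    3.0: "Phase 3",
    2.0: "Phase 2",
    1.0: "Phase 1",
    0.5: "Early Phase 1",
    -1.0: "Unknown",  # not "Preclinical", which is the point of this issue
}


def max_phase_label(max_phase: Optional[float]) -> str:
    """Translate a ChEMBL max_phase value into a human-readable label."""
    if max_phase is None:
        # A NULL max_phase marks preclinical compounds in ChEMBL
        return "Preclinical"
    label = CHEMBL_MAX_PHASE_LABELS.get(float(max_phase))
    if label is None:
        raise ValueError(f"unexpected max_phase value: {max_phase!r}")
    return label


# CHEMBL2104462 has ind_max_phase -1 in the data download
print(max_phase_label(-1))  # Unknown
```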

Expected behaviour

[x] Documentation to match the data and ChEMBL documentation.

inessmit commented 8 months ago

Related to the above, there is no section on the 'Indications' widget in the documentation.

On the drug page (https://platform.opentargets.org/drug/CHEMBL672) there are sections on 'Indications' and 'Clinical Precedence', which contain different data (ChEMBL vs. the Clinical Trials Pipeline). The difference is not explained in the documentation, which only includes a section on Clinical Precedence (https://platform-docs.opentargets.org/drug/clinical-precedence).

Try searching for Hypertriglyceridemia in the two different widgets on the https://platform.opentargets.org/drug/CHEMBL672 page and compare the confusing results.

ireneisdoomed commented 8 months ago

The inconsistency arises because the data comes from 2 different sources and is aggregated at 2 different levels.

Clinical precedence. This table is derived from the target/disease evidence set that ChEMBL collects from clinical trials. It is a more detailed view because it shows trial-specific information, aggregating on the condition, the clinical phase, and the status of the study. This data was introduced earlier, before we downloaded data from ChEMBL directly.
Indications. This mimics the Drug Indications table in ChEMBL. It aggregates data by condition and highlights the most advanced clinical trial phase per indication. This data was introduced when we expanded the drug page.
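The difference between the two aggregation levels can be sketched in Python. The `TrialRecord` type, its fields and the sample records are hypothetical simplifications, not the actual Open Targets or ChEMBL schemas; the point is only that one view keeps a count per (condition, phase, status), while the other keeps a single maximum phase per condition.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class TrialRecord(NamedTuple):
    # Hypothetical, simplified record; not the real Open Targets schema
    disease: str
    phase: float
    status: str
    trial_id: str


def clinical_precedence_view(records: Iterable[TrialRecord]) -> Counter:
    """Count trials per (condition, phase, status), like Clinical Precedence."""
    return Counter((r.disease, r.phase, r.status) for r in records)


def indications_view(records: Iterable[TrialRecord]) -> Dict[str, float]:
    """Keep only the most advanced phase per condition, like Indications."""
    max_phase: Dict[str, float] = {}
    for r in records:
        max_phase[r.disease] = max(max_phase.get(r.disease, r.phase), r.phase)
    return max_phase


# Invented example records, for illustration only
records = [
    TrialRecord("disease A", 2, "Completed", "trial-1"),
    TrialRecord("disease A", 3, "Recruiting", "trial-2"),
    TrialRecord("disease A", 3, "Recruiting", "trial-3"),
    TrialRecord("disease B", 1, "Terminated", "trial-4"),
]
print(indications_view(records))  # {'disease A': 3, 'disease B': 1}
print(clinical_precedence_view(records)[("disease A", 3, "Recruiting")])  # 2
```

Going from the first view to the second discards the per-trial breakdown, which is why one widget cannot simply be derived from the other in reverse.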

Given the similarities between the 2, I think we all agree that having these 2 streams is really confusing. @buniello @d0choa, I vote for deprecating the Clinical Precedence view. What do you think? If you agree, I can create a separate ticket to better scope the task.

d0choa commented 8 months ago

It's not that simple. @prashantuniyal02 has some examples of inconsistencies between the 2 that I'm not able to explain. We might need some feedback from ChEMBL.

I agree having the 2 streams is confusing (particularly on the drug/molecule page). But Clinical Precedence contains the breakdown of clinical trials (e.g. by phase), which we don't have in Indications, where only the max phase is reported. The best option would be access to a 3rd dataset that corresponds to the indication data plus the clinical trial data before the mechanism of action is considered. This was @polrus's point in the previous monthly meeting.

inessmit commented 8 months ago

Fiona told me the Clinical Trials Pipeline includes some records that are not in ChEMBL, namely clinical trials of approved drugs being tested for indications other than the approved one. These are marked as Phase IV on ClinicalTrials.gov (because the drug is approved), but ChEMBL doesn't include them at all, because marking them as Phase IV is confusing when the listed indication is not approved. However, she told me OT has wanted this information in its entirety. She said ChEMBL is considering including these records and is thinking of a mechanism to do so without labelling them as Phase IV, so it is definitely worth coordinating with Fiona.

ireneisdoomed commented 8 months ago

The best would be access to a 3rd dataset that corresponds to the indication data + clinical trial data before the mechanism of action is considered.

@d0choa Correct me if I'm wrong, but we already have this information. The indications dataset covers all drug/indication relationships, not just those for drugs whose target is known. See Comirnaty, for example.

The problem with using only this dataset for both widgets is that it is incomplete. Part of the reason is what @inessmit mentions: the Clinical Trials Pipeline includes extra drug/disease associations that are not covered in the indications. 20% of drug/disease pairs from clinical trials are not covered in the indications dataset. And this is not only about Phase IV trials for conditions for which the drug hasn't been approved: the fenofibrate example shows that some Phase IV CTs are missing from the indications dataset too.
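A coverage figure like the 20% above is a set difference over drug/disease pairs. The sketch below assumes each source has already been reduced to a set of (drug ID, disease ID) tuples; the function name `uncovered_fraction` and the placeholder pairs are invented for illustration and do not come from a real data release.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (drug ID, disease ID)


def uncovered_fraction(trial_pairs: Set[Pair], indication_pairs: Set[Pair]) -> float:
    """Fraction of drug/disease pairs from trials that are absent from indications."""
    if not trial_pairs:
        return 0.0
    missing = trial_pairs - indication_pairs
    return len(missing) / len(trial_pairs)


# Invented placeholder pairs, for illustration only
trials = {("drug-1", "disease-A"), ("drug-1", "disease-B"),
          ("drug-2", "disease-C"), ("drug-2", "disease-D"),
          ("drug-3", "disease-E")}
indications = {("drug-1", "disease-A"), ("drug-2", "disease-C"),
               ("drug-2", "disease-D"), ("drug-3", "disease-E")}
print(uncovered_fraction(trials, indications))  # 0.2
```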

In my opinion, I'd rather have missing information (coverage will gradually increase) than incoherent information. And putting the two next to each other is not helpful either :D

I understand that there is a difference in the aggregation, but I'm not sure we need 2 widgets, so I suggest reviewing this. We could have a single table called Clinical Precedence that is very similar to the Indications one, except that clicking on Source shows the breakdown per clinical trial.

ireneisdoomed commented 8 months ago

She said they're considering including these records though and thinking of a mechanism to do so while not labelling them as phase IV, so definitely worth coordinating with @FionaEBI.

Thank you @inessmit. We'll discuss the details with Fiona. In my opinion, I'd avoid encoding extra meaning into a clinical phase, because phases are something everyone understands. If there are cases where a Phase IV trial doesn't mean approval for that indication, I would create a specific isApproved field rather than relying on the maximum phase. I think the 2 fields in combination would make the clinical status of a drug easier to understand.
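A minimal Python sketch of the two-field idea, assuming a hypothetical `DrugIndication` record: `is_approved` stands in for the proposed isApproved field, and nothing here is an existing Open Targets or ChEMBL schema. The OXCARBAZEPINE identifiers are taken from the examples Fiona gives in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrugIndication:
    """Hypothetical record combining a clinical phase with an approval flag."""
    drug_id: str
    disease_id: str
    max_phase: Optional[float]  # highest clinical trial phase seen for this pair
    is_approved: bool           # regulatory approval for this specific indication

    def clinical_status(self) -> str:
        if self.is_approved:
            return "approved"
        if self.max_phase == 4:
            # Phase IV trial of a drug approved for a different indication
            return "phase IV trial, not approved for this indication"
        return "not approved"


# OXCARBAZEPINE: approved for epilepsy, Phase IV trials for bipolar disorder
epilepsy = DrugIndication("CHEMBL1068", "EFO:0000474", 4, True)
bipolar = DrugIndication("CHEMBL1068", "EFO:0000289", 4, False)
print(epilepsy.clinical_status())  # approved
print(bipolar.clinical_status())   # phase IV trial, not approved for this indication
```

Keeping the phase as reported and adding a separate approval flag means neither field has to carry two meanings.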

FionaEBI commented 8 months ago

Hi Irene, yes, this issue has been discussed several times over the past few years, e.g. at the OT-ChEMBL catch-up meeting. I’m hoping to include the Phase IV trials in ChEMBL_36 (i.e. Spring 2025), but we first need to have internal discussions within ChEMBL to agree on what should be shown and how.

Ines gave one example (FENOFIBRATE), but there are many others that are in Phase IV clinical trials for an indication different from the one for which the drug has been approved. Some work that Paula did last spring to examine the issue showed that around 9000 drug-indication pairs from Phase IV clinical trials are currently excluded from ChEMBL (but are delivered to OT via the Clinical Trials Pipeline).

For example, FENOFIBRATE has been approved (i.e. it has an FDA approval and therefore a DailyMed medicinal product label) for the following indications; see the Drug_Indications section of https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL672/

Dyslipidemias
Hypothyroidism
Diabetes Mellitus
Cardiovascular Diseases
Hypercholesterolemia
Diabetes Mellitus, Type 2

BUT it is in Phase IV clinical trials for these indications (https://platform.opentargets.org/drug/CHEMBL672):

burn
coronary artery disease
diabetic retinopathy
Hypertriglyceridemia
Disorder of lipid metabolism etc

Equally, OXCARBAZEPINE is approved for Epilepsy (EFO:0000474) and Seizure (HP:0001250); see https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1068/

But OXCARBAZEPINE (https://platform.opentargets.org/drug/CHEMBL1068) has Phase IV clinical trials for:

- MONDO:0007295 (childhood epilepsy with centrotemporal spikes) NCT03490487
- EFO:0000289 (bipolar disorder) NCT03567681
- EFO:0000289 (bipolar disorder) NCT00154323
- EFO:0000289 (bipolar disorder) NCT01893229
- EFO:0000289 (bipolar disorder) NCT02456896
- EFO:0004216 (conduct disorder) NCT00154362
- EFO:1001219 (trigeminal neuralgia) NCT04996199
- MONDO:0004979 (asthma) NCT00142025
- EFO:1001219 (trigeminal neuralgia) NCT03374709

My understanding is that in the OT Platform, the Indications section comes via the ChEMBL API, whereas the Clinical Precedence section comes via the Clinical Trials Pipeline that ChEMBL delivers to OT for each OT release. Hence the differences: the Clinical Trials Pipeline captures Phase IV clinical trials, whereas ChEMBL excludes Phase IV clinical trial data unless it is for an approved indication.
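The exclusion rule described above can be sketched as a filter. This is a deliberately simplified Python illustration: ChEMBL's indication data is curated from drug labels rather than computed by a rule like this, and `chembl_like_filter` and `TrialIndication` are hypothetical names. The example uses OXCARBAZEPINE identifiers from the list above.

```python
from typing import Iterable, List, NamedTuple, Set


class TrialIndication(NamedTuple):
    disease_id: str
    phase: float
    trial_id: str


def chembl_like_filter(trials: Iterable[TrialIndication],
                       approved: Set[str]) -> List[TrialIndication]:
    """Drop Phase IV trials whose indication is not an approved one (simplified)."""
    return [t for t in trials if t.phase != 4 or t.disease_id in approved]


# OXCARBAZEPINE, using identifiers from the examples above
approved = {"EFO:0000474", "HP:0001250"}  # Epilepsy, Seizure
pipeline_trials = [
    TrialIndication("MONDO:0007295", 4, "NCT03490487"),
    TrialIndication("EFO:0000289", 4, "NCT03567681"),
    TrialIndication("EFO:1001219", 4, "NCT04996199"),
    TrialIndication("MONDO:0004979", 4, "NCT00142025"),
]
print(len(pipeline_trials), len(chembl_like_filter(pipeline_trials, approved)))  # 4 0
```

All four trials remain in the pipeline view and none in the filtered view, which mirrors the mismatch between the two widgets on the drug page.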

ireneisdoomed commented 8 months ago

Thank you for the examples @FionaEBI !

Regarding the first example, I don't know where the list of indications is coming from, but fibrates are one of the main treatments for hypertriglyceridemia so I think we should have it as an indication for fenofibrate.

The oxcarbazepine example is trickier. For the case of asthma, I don't really understand why there is only a Phase IV study with 55 patients without having proved the efficacy first. The bipolar disorder example is similar, so I'm missing something.

In any case, since the indications dataset is not constrained to approved indications, is there a problem if the indications dataset included these Phase IV records?

FionaEBI commented 8 months ago


Regarding the first example, I don't know where the list of indications is coming from, but fibrates are one of the main treatments for hypertriglyceridemia so I think we should have it as an indication for fenofibrate.

The indications are typically manually curated based on the information in a drug label; you can consult the linked drug label, which is cited as evidence. However, there are many drug labels for each approved drug, and the indication was typically curated when the drug was first included in ChEMBL. Note that a drug label may later be revised, and additional drug labels are created for each new medicinal product containing the same active ingredient. So I agree that there may be missing indications that could be curated. A further point is that each indication is mapped to a MeSH term, and the MeSH terms are mapped to an EFO ID, so if no MeSH term was available at the appropriate granularity, the indication may not have been mapped.
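The two-step mapping described above (drug-label indication to MeSH term, then MeSH term to EFO ID) can be sketched as a chained lookup in which a gap at either step leaves the indication unmapped. The function name and the dictionaries below are hypothetical, and the identifiers are invented placeholders rather than real MeSH or EFO entries.

```python
from typing import Dict, Optional


def map_indication_to_efo(indication: str,
                          indication_to_mesh: Dict[str, str],
                          mesh_to_efo: Dict[str, str]) -> Optional[str]:
    """Return the EFO ID for a curated indication, or None if any step is missing."""
    mesh_id = indication_to_mesh.get(indication)
    if mesh_id is None:
        # No MeSH term at a suitable granularity: the indication stays unmapped
        return None
    return mesh_to_efo.get(mesh_id)


# Invented placeholder mappings, for illustration only
indication_to_mesh = {"condition X": "MESH:placeholder-1"}
mesh_to_efo = {"MESH:placeholder-1": "EFO:placeholder-1"}
print(map_indication_to_efo("condition X", indication_to_mesh, mesh_to_efo))  # EFO:placeholder-1
print(map_indication_to_efo("condition Y", indication_to_mesh, mesh_to_efo))  # None
```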

The oxcarbazepine example is trickier. For the case of asthma, I don't really understand why there is only a Phase IV study with 55 patients without having proved the efficacy first. The bipolar disorder example is similar, so I'm missing something.

The data comes from ClinicalTrials.gov, which marks the trial as Phase IV because the drug has been approved (for a different indication). I don't think this necessarily means that the drug has been tested for asthma in Early Phase 1, Phase 1, Phase 2 and Phase 3. The clinical trial phases demonstrate safety, side effects, etc., but not necessarily efficacy; see https://www.cancer.gov/publications/dictionaries/cancer-terms/def/phase-i-clinical-trial

In any case, since the indications dataset is not constrained to approved indications, is there a problem if the indications dataset included these Phase IV records?

The key issue for ChEMBL is that it confuses our data users if we mark a drug as max_phase=4 (i.e. approved) for an indication for which it does not have regulatory approval from e.g. the EMA or FDA. We changed the numerical categories in ChEMBL_32, and we need to revisit the process to come up with something appropriate for Phase 4 clinical trials that is distinct from an approved drug with max_phase=4. Open Targets took a different approach and requested that all indication data be included, hence the current difference between the Clinical Trials Pipeline and ChEMBL.

prashantuniyal02 commented 7 months ago

[ ] In the Clinical Precedence widget: change 'Source: ChEMBL' to 'Source: Clinical Trials Pipeline'

ireneisdoomed commented 7 months ago

@prashantuniyal02 I wouldn't change the reference unless there is something to link to. This pipeline is not public, so we don't have a direct ChEMBL reference we could use. The best alternative I can think of is linking to our documentation, https://platform-docs.opentargets.org/drug/clinical-precedence, and extending that documentation to explain data provenance.