opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Internal server error in Known drugs widget #3005

Closed prashantuniyal02 closed 1 year ago

prashantuniyal02 commented 1 year ago

Describe the bug Internal server error in the known drug section of a disease profile page

Observed behaviour Uncaught (in promise) ApolloError: Internal server error

To Reproduce Steps to reproduce the behaviour:

  1. Go to the Known drugs section on https://platform.opentargets.org/disease/MONDO_0004975
  2. Select rows per page: 100
  3. Navigate to the page showing rows 501-600; this gives an Uncaught (in promise) ApolloError: Internal server error

The GraphQL response seems to be there.

image


Another example:

d0choa commented 1 year ago

@carcruz confirmed that the platform is struggling to resolve the mechanism of action for at least CHEMBL637. A similar issue is observed when searching for the ID directly.

@ireneisdoomed do you mind having a look at the data? No rush, we are not waiting for this issue to be fixed.

d0choa commented 1 year ago

Minimal API example reproducing the issue:

query q{
  drug(chemblId: "CHEMBL637"){
    id
    mechanismsOfAction{
      uniqueActionTypes
    }
  }
}

response:

{
  "data": {
    "drug": {
      "id": "CHEMBL637",
      "mechanismsOfAction": null
    }
  },
  "errors": [
    {
      "message": "Internal server error",
      "path": [
        "drug",
        "mechanismsOfAction"
      ],
      "locations": [
        {
          "line": 4,
          "column": 5
        }
      ]
    }
  ]
}
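
The response above is a partial GraphQL result: `data` is returned, but `mechanismsOfAction` is null and an entry in `errors` points at that field. Below is a minimal Python sketch of how a client could detect this situation instead of treating the whole response as failed. The helper name `failed_paths` is hypothetical; it relies only on the `errors[].path` structure visible in the response.

```python
import json

# The response body shown above, condensed onto fewer lines.
RESPONSE = json.loads("""
{
  "data": {"drug": {"id": "CHEMBL637", "mechanismsOfAction": null}},
  "errors": [
    {"message": "Internal server error",
     "path": ["drug", "mechanismsOfAction"],
     "locations": [{"line": 4, "column": 5}]}
  ]
}
""")


def failed_paths(response):
    """Return the dotted path of every field the server failed to resolve."""
    return [
        ".".join(str(part) for part in error.get("path", []))
        for error in response.get("errors", [])
    ]


paths = failed_paths(RESPONSE)
print(paths)
```

With this input the only failing path is `drug.mechanismsOfAction`, while `data.drug.id` is still usable.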

d0choa commented 1 year ago

The nulls look off:

In [18]: df.filter(f.array_contains(f.col("chemblIds"), "CHEMBL637")).show()
+----------+--------------------+--------------------+--------------------+--------------+-----------------+--------------------+
|actionType|   mechanismOfAction|           chemblIds|          targetName|    targetType|          targets|          references|
+----------+--------------------+--------------------+--------------------+--------------+-----------------+--------------------+
| INHIBITOR|Serotonin transpo...|[CHEMBL1201066, C...|Serotonin transpo...|single protein|[ENSG00000108576]|[{DailyMed, [seti...|
|      null|                null|[CHEMBL5095051, C...|                null|          null|             null|[{DailyMed, [seti...|
| INHIBITOR|Norepinephrine tr...|[CHEMBL1201066, C...|Norepinephrine tr...|single protein|[ENSG00000103546]|[{DailyMed, [seti...|
+----------+--------------------+--------------------+--------------------+--------------+-----------------+--------------------+

ireneisdoomed commented 1 year ago

I thought the problem was a null target, but it is the null mechanismOfAction. Offending cases include 2 more ChEMBL IDs:

-RECORD 0------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 actionType        | null
 mechanismOfAction | null
 chemblIds         | [CHEMBL5095051, CHEMBL637]
 targetName        | null
 targetType        | null
 targets           | null
 references        | [{DailyMed, [setid=e81a2daf-b8b2-7c05-b532-bc775700b100], [https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=e81a2daf-b8b2-7c05-b532-bc775700b100]}]
-RECORD 1------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 actionType        | null
 mechanismOfAction | null
 chemblIds         | [CHEMBL272427]
 targetName        | null
 targetType        | null
 targets           | null
 references        | [{DailyMed, [setid=126747c4-39f3-4e20-8f3c-7b8596d8ba7d], [https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=126747c4-39f3-4e20-8f3c-7b8596d8ba7d]}]

This comes directly from ChEMBL; we did not use to have null records. None of them have intricate maxPhases, so I'll report it to Fiona.
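
As a rough illustration of the check described above, here is a plain-Python sketch that collects the ChEMBL IDs attached to records with a null mechanismOfAction. The two null records are copied from the output above; the third record and the helper name `ids_with_null_moa` are hypothetical, added only so the filter has something to exclude.

```python
# Only the columns needed for the check are kept.
records = [
    {"mechanismOfAction": None, "chemblIds": ["CHEMBL5095051", "CHEMBL637"]},
    {"mechanismOfAction": None, "chemblIds": ["CHEMBL272427"]},
    # Hypothetical well-formed record, not taken from the dataset.
    {"mechanismOfAction": "Example mechanism", "chemblIds": ["CHEMBL_EXAMPLE"]},
]


def ids_with_null_moa(rows):
    """Return the sorted ChEMBL IDs of all rows whose mechanismOfAction is null."""
    return sorted(
        {
            chembl_id
            for row in rows
            if row["mechanismOfAction"] is None
            for chembl_id in row["chemblIds"]
        }
    )


# In PySpark the row filter would be roughly:
#   df.filter(f.col("mechanismOfAction").isNull())
affected = ids_with_null_moa(records)
print(affected)
```

The two null records expand to three affected IDs, which is why the thread talks about both "two cases" and "3 ChEMBL IDs".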

d0choa commented 1 year ago

Problem looks to be next door...

d0choa commented 1 year ago

Thanks @ireneisdoomed!

FionaEBI commented 1 year ago

Hi Irene / David, I had a look at these two cases. For both of them the mechanism is unclear or unknown, so the mechanism of action field is set to null by our curators, but a reference is provided that explains why: see the DailyMed link to the drug label, e.g. https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=e81a2daf-b8b2-7c05-b532-bc775700b100 or https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=126747c4-39f3-4e20-8f3c-7b8596d8ba7d

Note that one of the three is the parent drug (CHEMBL637, VENLAFAXINE), so it takes the aggregated data from the child drug form (CHEMBL5095051, VENLAFAXINE BESYLATE) and shows the same information.

So it's not a bug; rather, it's a feature of displaying unknown mechanism of action data.


ireneisdoomed commented 1 year ago

Thank you @FionaEBI for the explanation. As we discussed offline, I think we should revisit the way we handle missing information so that the curation has a less error-prone representation. We could exclude from the MoA table the records where the target could not be identified (at the moment these are described with empty values and "unknown" values). It is also important to bear in mind that removing drugs from this table could reduce our drug coverage, as having a MoA is a requirement to be part of our drug index. I expect these cases to be very few. Let's discuss it in the next OT/ChEMBL meeting.

prashantuniyal02 commented 1 year ago

@carcruz

Will try to patch this on the FE for the time being (temporary fix).

For these 3 ChEMBL IDs, where mechanismOfAction and the target details are null, we will hide the part of the row that crashes:

carcruz commented 1 year ago

Unfortunately, this cannot (and shouldn't) be handled in the FE. The error comes from the API; it causes Apollo Client to crash and, as expected, stops the elements from rendering. This behaviour cannot be changed or adjusted for a specific query or use case.


d0choa commented 1 year ago

We are aiming for a solution in the data for the next release. We need to make sure we don't forget about this @prashantuniyal02

ireneisdoomed commented 1 year ago

After inspecting the mechanism of action dataset with the new filtering rules, we have 5,556 records (down from 7,042). None of them have:

I consider this done.

d0choa commented 1 year ago

Just double-checking. Here you were talking about 3 ChEMBL IDs, but now we are dropping 1k+ MoAs? Are you sure we are not dropping any worthwhile information?

ireneisdoomed commented 1 year ago

@d0choa Yes, I am. The MOA dataset is now cleaner: not only did we drop the null MOA curation, we also dropped the records where the curated targets/drugs list was empty, as this could be error-prone.

Explaining the 1488 decrease

Data used:

856 + 632 + 2 - 2 = 1488 (I'm subtracting 2 because the null MOA is also in the null targets set)
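
The sum above counts the overlapping records only once. A small Python sketch of the same arithmetic follows. The labels are assumptions: 632 (empty targets) and 2 (null MOA) are named later in the thread, while 856 is assumed to be the size of the null targets set.

```python
null_targets = 856   # assumed label: records with null targets
empty_targets = 632  # records with an empty targets list
null_moa = 2         # records with a null mechanismOfAction
overlap = 2          # null-MOA records already counted in null_targets

# Add the three sets, then remove the records counted twice.
dropped = null_targets + empty_targets + null_moa - overlap
print(dropped)
```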

Consequences to the drug index

Knowing a compound's mechanism of action is one of the requirements to be part of our drug index. With the more constrained MOA dataset, we could see a decrease in the drug index dataset. This is what @tskir reported the other day: the new drug index has 227 fewer molecules than production (even though the ChEMBL dataset is the same).

Looking at these missing drug IDs, the cause of the decrease is indeed the MOA requirement. If I look at the reported MOA for the missing IDs, I see that in all cases there is no actual MOA information.

In [87]: old_drugs.select("id", "linkedTargets").join(drugs.select("id", "linkedTargets"), on="id", how="left_anti").select("linkedTargets").distinct().show()
+-------------+
|linkedTargets|
+-------------+
|      {[], 0}|
+-------------+

Some examples of drugs gone missing: CHEMBL1591365, CHEMBL1584, CHEMBL1743021. All of them have barely any information, so I'd say we're safe dropping them.
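
For readers unfamiliar with `how="left_anti"`, the comparison above keeps only the rows of the old index whose `id` has no match in the new one. A plain-Python sketch of that logic is shown below. All IDs and values are hypothetical stand-ins, and the `rows`/`count` field names inside `linkedTargets` are an assumption about the struct printed as `{[], 0}`.

```python
def left_anti(left, right, key="id"):
    """Keep the rows of `left` whose key has no match in `right`."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]


# Hypothetical stand-ins for the old and new drug indexes.
old_drugs = [
    {"id": "DRUG_A", "linkedTargets": {"rows": ["ENSG_X"], "count": 1}},
    {"id": "DRUG_B", "linkedTargets": {"rows": [], "count": 0}},
    {"id": "DRUG_C", "linkedTargets": {"rows": [], "count": 0}},
]
drugs = [
    {"id": "DRUG_A", "linkedTargets": {"rows": ["ENSG_X"], "count": 1}},
]

missing = left_anti(old_drugs, drugs)

# Equivalent of .select("linkedTargets").distinct() on the missing rows.
distinct_linked_targets = {
    (tuple(row["linkedTargets"]["rows"]), row["linkedTargets"]["count"])
    for row in missing
}
print([row["id"] for row in missing])
print(distinct_linked_targets)
```

If every missing drug collapses to a single empty `linkedTargets` value, as in the output above, none of the dropped drugs had a linked target.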

Last thing: I don't like cases like the second one, an approved drug without any curated MOA or indication. In this case, piperacetazine is a known prodrug used in schizophrenia. In another ticket I'll look for approved drugs where basic curation is needed and report back to ChEMBL.

ireneisdoomed commented 1 year ago

👆 Retraction: @d0choa brought to my attention that if we filter the MOA dataset to exclude records with an empty targets list, we exclude the annotation of all drugs that target non-human proteins, for example CHEMBL280527. This is not good.

In fact, this is the reason why in most of the cases an Ensembl ID is not present.

Full list of target names where the list of Ensembl IDs is empty
```
count | targetName
------+-----------------------------------------------------------------
   80 | Bacterial penicillin-binding protein
   63 | Bacterial 70S ribosome
   33 | Human immunodeficiency virus type 1 reverse transcriptase
   27 | Topoisomerase IV
   25 | Bacterial DNA gyrase
   23 | Bacterial dihydropteroate synthase
   20 | Hepatitis C virus NS5B RNA-dependent RNA polymerase
   18 | Cytochrome P450 51
   17 | Hepatitis C virus serine protease, NS3/NS4A
   14 | Human immunodeficiency virus type 1 protease
   14 | Fusion glycoprotein F0
   13 | Spike glycoprotein
   12 | Human herpesvirus 1 DNA polymerase
   10 | Nonstructural protein 5A
    9 | 70S ribosome
    9 | Human immunodeficiency virus type 1 integrase
    9 | DNA polymerase/reverse transcriptase
    9 | DNA gyrase
    8 | Envelope glycoprotein
    8 | Penicillin-binding protein
    7 | Dihydrofolate reductase
    5 | Envelope polyprotein GP160
    5 | Lanosterol 14-alpha demethylase
    5 | Bacterial DNA-directed RNA polymerase
    5 | Bacterial dihydrofolate reductase
    5 | 1,3-beta-glucan synthase
    5 | Squalene monooxygenase
    5 | Tubulin
    5 | Bacterial beta-lactamase TEM
    5 | Hemagglutinin
    5 | Reverse transcriptase
    4 | RNA-directed RNA polymerase L
    4 | Toxin A
    4 | Protective antigen
    4 | Neuraminidase
    3 | Glycoprotein
    3 | Toxin B
    3 | Replicase polyprotein 1ab
    3 | VP1 capsid protein
    3 | DNA-directed RNA polymerase
    3 | Enoyl-[acyl-carrier-protein] reductase (FabI)
    2 | Envelope glycoprotein gp160
    2 | FAD-dependent decaprenylphosphoryl-beta-D-ribofuranose 2-oxidase
    2 | Elongation factor G
    2 | microRNA 21
    2 | Dihydropteroate synthetase
    2 | DNA polymerase
    2 | Hemocyanin 1
    2 | Dihydropteroate synthase 1
    2 | Low calcium response locus protein V
    2 | Enoyl-[acyl-carrier-protein] reductase
    2 | Genome polyprotein
    2 | microRNA-155
    2 | Thymidylate synthase
    2 | Core antigen
    2 | Nematode GABA-A receptor
    2 | LPA mRNA
    2 | Protein P
    2 | Acetylcholinesterase
    2 | Polymerase acidic protein
    2 | Isoleucyl-tRNA synthetase
    2 | Matrix protein 2
    2 | Alpha-hemolysin
    2 | VEGF-A mRNA
    2 | Bacterial enoyl-[acyl-carrier-protein] reductase
    1 | 1-deoxy-D-xylulose 5-phosphate reductoisomerase, apicoplastic
    1 | Structural capsid protein
    1 | Pyruvate:ferredoxin oxidoreductase
    1 | Acetylcholinesterase 1
    1 | E protein
    1 | Envelope glycoprotein H
    1 | Carbapenem-hydrolyzing beta-lactamase KPC
    1 | Shiga toxin subunit B
    1 | Cytochrome b
    1 | D-alanylalanine synthetase
    1 | H-Ras mRNA 5'UTR
    1 | Voltage-sensitive sodium channel
    1 | Phosphodiesterase isozyme 4
    1 | Envelope glycoprotein B
    1 | Enoyl-[acyl-carrier-protein] reductase [NADH]
    1 | Influenza virus A matrix protein M2
    1 | GABA-A receptor
    1 | Beta-lactamase
    1 | Fatty acid synthase
    1 | Beta-lactamase CTX-M
    1 | Glutamate-gated chloride channel
    1 | Voltage-activated calcium channel beta 1 subunit
    1 | Clumping factor A
    1 | ATP synthase
    1 | Urease
    1 | GP41
    1 | Helicase primate complex
    1 | Leucine--tRNA ligase
    1 | Cleavage and polyadenylation specificity factor subunit
    1 | Integrase
    1 | Tubulin beta chain
    1 | HMG-CoA reductase
    1 | DNA polymerase catalytic subunit
    1 | DNA topoisomerase IV
    1 | Alanine racemase
    1 | Helicase/primase
    1 | Carbepenem-hydrolyzing beta-lactamase KPC
    1 | Transmembrane glycoprotein gp41
    1 | Voltage-activated calcium channel beta 2 subunit
    1 | Elongation factor Tu
    1 | Methionine--tRNA ligase
    1 | Envelope phospholipase F13 (p37)
    1 | LPS-assembly protein LptD
    1 | Nucleoprotein
    1 | Bacterial penicillin-binding protein 2
    1 | Beta-lactamase SHV-1
    1 | Phosphotransferase pUL97
    1 | Leucine-tRNA ligase
    1 | Protease
    1 | Nicotinic acetylcholine receptor
    1 | Bacterial urease
    1 | Envelope protein
    1 | Chaperonin GroEL 2
    1 | Arabinosyltransferase A
    1 | Voltage-sensitive sodium channel alpha-subunit
    1 | RNA-directed RNA polymerase
    1 | Human immunodeficiency virus type 1 Tat protein
    1 | Serine/threonine-protein kinase BGLF4
    1 | Cytochrome P450
    1 | P-type ATPase
    1 | UDP-N-acetylglucosamine 1-carboxyvinyltransferase
    1 | Nicotinic acetylcholine receptor alpha subunit
    1 | Heat shock protein 90 homolog
    1 | Botulinum neurotoxin type A
    1 | Protein E7
    1 | Dihydroorotate dehydrogenase (quinone), mitochondrial
    1 | Polymerase basic protein 2
    1 | GPI-anchored wall transfer protein 1
    1 | Dihydroorotate dehydrogenase
    1 | microRNA 122
    1 | DNA terminase
    1 | Major surface antigen
```

Action: revert the changes in this PR so that we only drop records where targets or targetName is null.
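
The key distinction in the revised rule is between a null value and an empty list. A minimal plain-Python sketch of it follows, assuming records shaped like the MOA rows shown earlier in this thread. The function name `keep_record` and the `CHEMBL_EXAMPLE_*` IDs are hypothetical, and this is not the actual pipeline code.

```python
def keep_record(record):
    """Keep a record unless its targets or its targetName is null.

    An empty targets list is kept: it marks a curated target with no
    Ensembl ID, for example a non-human protein.
    """
    return record["targets"] is not None and record["targetName"] is not None


records = [
    # Null curation, as in the CHEMBL637 record shown earlier: dropped.
    {"chemblIds": ["CHEMBL5095051", "CHEMBL637"], "targetName": None, "targets": None},
    # Named target with an empty Ensembl ID list (hypothetical drug ID): kept.
    {"chemblIds": ["CHEMBL_EXAMPLE_1"], "targetName": "Bacterial DNA gyrase", "targets": []},
    # Named target with an Ensembl ID (hypothetical drug ID and name): kept.
    {"chemblIds": ["CHEMBL_EXAMPLE_2"], "targetName": "Example human target", "targets": ["ENSG00000108576"]},
]

kept = [record for record in records if keep_record(record)]
print([record["chemblIds"] for record in kept])
```

Testing `is not None` rather than truthiness matters here: `if record["targets"]` would also discard the empty lists, which is the behaviour being reverted.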

Thank you @d0choa!

ireneisdoomed commented 1 year ago

The MOA dataset with the latest changes has 6,186 records: 5,556 we previously had + 632 we recovered (empty targets) - 2 null MOA = 6,186

We have also gained 56 new drugs, including CHEMBL280527.