prashantuniyal02 closed this issue 1 year ago
@carcruz confirmed that the platform is struggling to resolve the mechanism of action for at least CHEMBL637. A similar issue is observed when searching for the ID directly.
@ireneisdoomed do you mind having a look at the data? No rush, we are not waiting for this issue to be fixed.
Minimal API example reproducing the issue:
query q{
  drug(chemblId: "CHEMBL637"){
    id
    mechanismsOfAction{
      uniqueActionTypes
    }
  }
}
response:
{
  "data": {
    "drug": {
      "id": "CHEMBL637",
      "mechanismsOfAction": null
    }
  },
  "errors": [
    {
      "message": "Internal server error",
      "path": [
        "drug",
        "mechanismsOfAction"
      ],
      "locations": [
        {
          "line": 4,
          "column": 5
        }
      ]
    }
  ]
}
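The response above is a partial GraphQL result: `data` is returned, but the failing field is nulled and listed in `errors`. A minimal sketch of how a client script could spot which fields failed, assuming the response has already been fetched as JSON (the helper name `failed_paths` is hypothetical, not part of the platform API):

```python
import json


def failed_paths(response: dict) -> list[str]:
    """Return the dotted path of every field listed in the GraphQL `errors` array."""
    return [
        ".".join(str(part) for part in error.get("path", []))
        for error in response.get("errors", [])
    ]


# The response reported in this issue, trimmed to the relevant keys.
response = json.loads("""
{
  "data": {"drug": {"id": "CHEMBL637", "mechanismsOfAction": null}},
  "errors": [
    {
      "message": "Internal server error",
      "path": ["drug", "mechanismsOfAction"],
      "locations": [{"line": 4, "column": 5}]
    }
  ]
}
""")

print(failed_paths(response))
```

Here the only failing path is `drug.mechanismsOfAction`, which matches the field that comes back as null.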
The nulls look off:
In [18]: df.filter(f.array_contains(f.col("chemblIds"), "CHEMBL637")).show()
+----------+--------------------+--------------------+--------------------+--------------+-----------------+--------------------+
|actionType| mechanismOfAction| chemblIds| targetName| targetType| targets| references|
+----------+--------------------+--------------------+--------------------+--------------+-----------------+--------------------+
| INHIBITOR|Serotonin transpo...|[CHEMBL1201066, C...|Serotonin transpo...|single protein|[ENSG00000108576]|[{DailyMed, [seti...|
| null| null|[CHEMBL5095051, C...| null| null| null|[{DailyMed, [seti...|
| INHIBITOR|Norepinephrine tr...|[CHEMBL1201066, C...|Norepinephrine tr...|single protein|[ENSG00000103546]|[{DailyMed, [seti...|
+----------+--------------------+--------------------+--------------------+--------------+-----------------+--------------------+
I thought the problem was a null target, but it is the null mechanismOfAction. Offending cases include 2 more ChEMBL IDs:
-RECORD 0------------------------------------------------------------------------------------------------------------------------------------------------------------------------
actionType | null
mechanismOfAction | null
chemblIds | [CHEMBL5095051, CHEMBL637]
targetName | null
targetType | null
targets | null
references | [{DailyMed, [setid=e81a2daf-b8b2-7c05-b532-bc775700b100], [https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=e81a2daf-b8b2-7c05-b532-bc775700b100]}]
-RECORD 1------------------------------------------------------------------------------------------------------------------------------------------------------------------------
actionType | null
mechanismOfAction | null
chemblIds | [CHEMBL272427]
targetName | null
targetType | null
targets | null
references | [{DailyMed, [setid=126747c4-39f3-4e20-8f3c-7b8596d8ba7d], [https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=126747c4-39f3-4e20-8f3c-7b8596d8ba7d]}]
This is coming directly from ChEMBL; we didn't use to have null records. None of them have intricate maxPhases, so I'll report it to Fiona.
Problem looks to be next door...
Thanks @ireneisdoomed!
Hi Irene / David, I had a look at these two cases. For both of them the mechanism is unclear / unknown, so the mechanism of action field is set to null by our curators, but there is a reference provided that explains why: see the DailyMed link to the drug label, e.g. https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=e81a2daf-b8b2-7c05-b532-bc775700b100 or https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=126747c4-39f3-4e20-8f3c-7b8596d8ba7d
Note that one of the three is the parent drug (CHEMBL637, VENLAFAXINE), so it takes the aggregated data from the child drug form (CHEMBL5095051, VENLAFAXINE BESYLATE) and shows the same information.
So it’s not a bug; rather, it’s a feature of displaying unknown mechanism of action data.
Thank you @FionaEBI for the explanation. As we discussed offline, I think we should revisit the way we handle missing information to have a less error-prone representation of the curation. We could leave out of the MoA table the records where the target could not be identified (at the moment this is described with empty values and "unknown" values). It is also important to bear in mind that removing the drugs from this table could impact our drug coverage, as having a MoA is a requisite to be part of our drug index. I expect these cases to be very few. Let's discuss it in the next OT/ChEMBL meeting.
@carcruz
Will try to patch on FE for the time being. (temp fix)
For these 3 ChEMBL IDs, where mechanismOfAction and target details are null, we will hide the part of the row that crashes.
Unfortunately, this cannot (and shouldn't) be handled in the FE. The error comes from the API; it causes Apollo Client to crash and, as expected, stops the rendering of elements. This process cannot be changed or adjusted for a specific query or use case.
Response:
We are aiming for a solution in the data for the next release. We need to make sure we don't forget about this @prashantuniyal02
After inspection of the mechanism of action dataset with the new rules to filter this data, we have 5,556 records (down from 7,042). None of them have:
mechanismOfAction
I consider this done.
Just double checking. Here you were talking about 3 ChEMBL IDs, but now we are dropping 1k+ moas? Are you sure we are not dropping any worthwhile information?
@d0choa Yes, I am. The MOA dataset is now cleaner: not only did we drop the null MOA curation, we also dropped those records where the curated targets/drugs list was empty, as this could be error-prone.
Data used:
http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/latest/output/etl/parquet/mechanismOfAction/
gs://open-targets-pre-data-releases/23.09/output/etl/parquet/mechanismOfAction
old_moa.filter(f.size("targets") == 0).show(1, False, True)
-RECORD 0-----------------------------------------------------------------------------------------------------------------------------------
actionType | INHIBITOR
mechanismOfAction | Human immunodeficiency virus type 1 reverse transcriptase inhibitor
chemblIds | [CHEMBL280527]
targetName | Human immunodeficiency virus type 1 reverse transcriptase
targetType | single protein
targets | []
references | [{PubMed, [8913487, 9763407], [http://europepmc.org/abstract/MED/8913487, http://europepmc.org/abstract/MED/9763407]}]
old_moa.filter(f.col("targets").isNull()).show(1, False, True)
-RECORD 0----------------------------------------------------------------------------------------------------------------------------------
actionType | INHIBITOR
mechanismOfAction | Peptidoglycan inhibitor
chemblIds | [CHEMBL3301650, CHEMBL3301669]
targetName | null
targetType | null
targets | null
references | [{FDA, [label/2014/021883s000lbl.pdf], [http://www.accessdata.fda.gov/drugsatfda_docs/label/2014/021883s000lbl.pdf]}]
old_moa.filter(f.col("mechanismOfAction").isNull()).show(1, False, True)
-RECORD 0------------------------------------------------------------------------------------------------------------------------------------------------------------------------
actionType | null
mechanismOfAction | null
chemblIds | [CHEMBL5095051, CHEMBL637]
targetName | null
targetType | null
targets | null
references | [{DailyMed, [setid=e81a2daf-b8b2-7c05-b532-bc775700b100], [https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=e81a2daf-b8b2-7c05-b532-bc775700b100]}]
In [64]: old_moa.filter((f.size("chemblIds") == 0) | (f.col("chemblIds").isNull())).count()
Out[64]: 0
856 + 632 + 2 - 2 = 1488 (I'm subtracting 2 because the null MOA is also in the null targets set)
Knowing a compound's mechanism of action is one of the requisites to be part of our drug index. With the more constrained MOA dataset, we could see a decrease in the drug index dataset. This is what @tskir reported the other day: the new drug index has 227 fewer molecules than production (even though the ChEMBL dataset is the same).
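The sum above is a union count with the overlap removed once. A minimal sketch of that arithmetic, assuming 856 is the number of records with null targets, 632 the number with an empty targets list, and 2 the number with a null mechanismOfAction (the thread does not label the first term explicitly, so that mapping and the function name are assumptions):

```python
def dropped_records(null_targets: int, empty_targets: int, null_moa: int, overlap: int) -> int:
    """Count records matching at least one filter, removing the double-counted overlap."""
    return null_targets + empty_targets + null_moa - overlap


# The null-MOA records also have null targets, so they are subtracted once.
print(dropped_records(null_targets=856, empty_targets=632, null_moa=2, overlap=2))  # 1488
```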
Looking at these missing drug IDs, the cause of the decrease is indeed the MOA requirement. If I look at the reported MOA for the missing IDs, I see that in all cases there is no actual MOA information.
In [87]: old_drugs.select("id", "linkedTargets").join(drugs.select("id", "linkedTargets"), on="id", how="left_anti").select("linkedTargets").distinct().show()
+-------------+
|linkedTargets|
+-------------+
| {[], 0}|
+-------------+
Some examples of drugs gone missing: CHEMBL1591365, CHEMBL1584, CHEMBL1743021. All of them have barely any information; I'd say we're safe dropping them.
Last thing: I don't like cases like the second one, an approved drug without any curated MOA or indication. In this case, piperacetazine is a known prodrug used in schizophrenia. In another ticket I'll look for cases of approved drugs where basic curation is needed and report back to ChEMBL.
👆 Retraction: @d0choa brought to my attention that if we filter the MOA dataset to exclude records with an empty targets list, we exclude the annotation of all drugs that attack non-human proteins, for example CHEMBL280527. This is not good.
In fact, this is the reason why in most of the cases an Ensembl ID is not present.
Action: revert the changes in this PR to only drop those records where targets or targetName are null.
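To make the difference between a null and an empty targets list concrete, here is a minimal pure-Python sketch of the revised rule applied to three records shown earlier in this thread (trimmed to the relevant fields). It is an illustration only, not the actual ETL code; `keep_record` is a hypothetical helper name.

```python
def keep_record(record: dict) -> bool:
    """Keep a MOA record unless `targets` or `targetName` is null (None).

    An empty `targets` list is kept: it typically means the target is a
    non-human protein without an Ensembl ID, not that curation is missing.
    """
    return record.get("targets") is not None and record.get("targetName") is not None


records = [
    # Empty targets list, but the target is named: kept.
    {"chemblIds": ["CHEMBL280527"],
     "targetName": "Human immunodeficiency virus type 1 reverse transcriptase",
     "targets": []},
    # Null targets and null targetName: dropped.
    {"chemblIds": ["CHEMBL3301650", "CHEMBL3301669"], "targetName": None, "targets": None},
    # Null MOA record, which also has null targets: dropped.
    {"chemblIds": ["CHEMBL5095051", "CHEMBL637"], "targetName": None, "targets": None},
]

kept = [r for r in records if keep_record(r)]
print([r["chemblIds"] for r in kept])
```

Only the CHEMBL280527 record survives, which is the behaviour the retraction asks for.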
Thank you @d0choa!
The MOA dataset with the latest changes has 6,186 records: 5,556 we previously had + 632 we recovered (empty targets) - 2 null MOA = 6,186
We have also gained 56 new drugs, including CHEMBL280527.
Describe the bug Internal server error in the known drug section of a disease profile page
Observed behaviour
Uncaught (in promise) ApolloError: Internal server error
To Reproduce Steps to reproduce the behaviour:
Uncaught (in promise) ApolloError: Internal server error
The GraphQL response seems to be there.
Another example: