Closed mkarmona closed 5 months ago
I explored the same data to get some sense of the coverage.
9597
1436
577
As @mkarmona raised, beware of the .contains("DRA") filter, as MedDRA can have multiple spellings in EFO.
I raised a ticket in EFO to increase Xref coverage https://github.com/EBISPOT/efo/issues/978
Adding my code for future reference:
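```python
from pyspark.sql.functions import col, explode, split

# One row per (disease, cross-reference), split into source and code.
meddraXrefs = (
    disease
    .select(col("id").alias("diseaseId"), explode("dbXRefs").alias("xref"))
    .withColumn("source", split(col("xref"), ":").getItem(0))
    .withColumn("meddraCode", split(col("xref"), ":").getItem(1))
    .drop("xref")
    # Loose substring match: catches several MedDRA spellings, but may over-match.
    .filter(col("source").contains("DRA"))
)

# Left join keeps every distinct FDA MedDRA code, whether or not it maps to a disease.
mapped = (
    fda
    .select("meddraCode")
    .distinct()
    .join(meddraXrefs, on="meddraCode", how="left")
)
```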
@d0choa yes, it could be more using like or startswith instead of contains
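To make the difference concrete, here is a minimal plain-Python sketch comparing the loose substring test with a stricter, case-insensitive prefix test. The spelling variants and the `HYDRA` source are made up for illustration, and `split_xref`, `loose_match` and `strict_match` are hypothetical helper names. It assumes every real MedDRA spelling starts with "meddra" once lowercased, which would need checking against the actual EFO data.

```python
def split_xref(xref):
    """Split 'Source:code' into (source, code); code is None when there is no colon."""
    source, sep, code = xref.partition(":")
    return source, (code if sep else None)


def loose_match(source):
    # Mirrors the .contains("DRA") filter: case-sensitive substring test.
    return "DRA" in source


def strict_match(source):
    # Assumption: all MedDRA spellings start with "meddra" when lowercased.
    return source.lower().startswith("meddra")


# Illustrative inputs only; the variants and "HYDRA" are invented for this sketch.
xrefs = ["MedDRA:10000880", "MEDDRA:10000880", "meddra:10000880", "HYDRA:42"]

for xref in xrefs:
    source, code = split_xref(xref)
    print(xref, "loose:", loose_match(source), "strict:", strict_match(source))
```

In this toy input the loose test misses the all-lowercase spelling and accepts the unrelated `HYDRA` source, while the prefix test does the opposite. In Spark, the same predicate could probably be written as `lower(col("source")).startswith("meddra")`.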
I raised it with EFO and they will try to clean it as part of https://github.com/EBISPOT/efo/issues/878
@mbdebian did you consider this when working on the openFDA pipeline? No rush, but it would be nice to have at some point.
I didn't take this one into account when I did the OpenFDA work. I'll have a look at the data and the code to find out how it could be made possible.
@jdhayhurst, would you mind checking with the data team whether this issue is still relevant? Thanks!
@jdhayhurst @tskir to check whether the latest changes in the internal pipelines may remove MedDRA terms altogether from the processes.
The recent changes we made concerned the MedDRA SOC codes, which were removed (ticket #3003). I'm not sure whether work done before that makes this ticket irrelevant. Please could someone from the @opentargets/data-team advise if there's still work to be done here? Thanks!
I think this is still relevant. In #3003, we removed the reference to MedDRA because ChEMBL improved the mapping of the drug warnings data, moving from reporting a general MedDRA code to re-curating the adverse event to a more specific EFO code. That made the MedDRA annotation practically uninformative.
This ticket is about the Pharmacovigilance data, which is related but not the same. Here we have the most significant adverse events for a drug as reported by doctors. These events are not a curated list of adverse events like in the previous case. In pharmacovigilance, people often use MedDRA terms to define adverse drug reactions (ADRs) when reporting adverse events. This is what we currently show in the ticket. 10% of EFO IDs have a cross-reference to MedDRA, like this one:
```
{
  "data": {
    "disease": {
      "id": "EFO_0000222",
      "name": "acute myeloid leukemia",
      "dbXRefs": [
        # more here
        "MedDRA:10000880",
        # more here
      ]
    }
  }
}
```
What we propose here is to expand the pharmacovigilance dataset so that we can establish a link between the MedDRA and an EFO ID.
@JarrodBaker, the idea is to use the recently incorporated cross-references from disease entities to match and annotate each MedDRA ID with its corresponding EFO ID, so that the @opentargets/frontend could use the optionally resolved disease entity. Some base code to work with, if it helps.
You could potentially address this issue when you integrate the OpenFDA pipeline with the main ETL in issue #1416.
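As a rough illustration of the proposed annotation, the following plain-Python sketch builds a MedDRA-to-EFO lookup from disease `dbXRefs` and attaches the resolved disease IDs to each adverse event. It is not the pipeline's implementation. The record shapes, the `diseaseIds` field and the helper names `build_meddra_index` and `annotate_events` are assumptions, and the second event code is made up to show an unmapped case.

```python
from collections import defaultdict


def build_meddra_index(diseases):
    """Map each MedDRA code to the sorted list of EFO ids that cross-reference it."""
    index = defaultdict(set)
    for disease in diseases:
        for xref in disease.get("dbXRefs", []):
            source, sep, code = xref.partition(":")
            # Assumption: normalising case is enough to catch MedDRA spellings.
            if sep and source.lower() == "meddra":
                index[code].add(disease["id"])
    return {code: sorted(ids) for code, ids in index.items()}


def annotate_events(events, index):
    """Return copies of the events with a 'diseaseIds' list (empty if unmapped)."""
    return [{**event, "diseaseIds": index.get(event["meddraCode"], [])} for event in events]


diseases = [
    {
        "id": "EFO_0000222",
        "name": "acute myeloid leukemia",
        "dbXRefs": ["MedDRA:10000880"],
    }
]
# Hypothetical event records; "99999999" is an invented, unmapped code.
events = [{"meddraCode": "10000880"}, {"meddraCode": "99999999"}]

annotated = annotate_events(events, build_meddra_index(diseases))
coverage = sum(1 for event in annotated if event["diseaseIds"]) / len(annotated)
print(annotated, coverage)
```

Keeping `diseaseIds` as a list leaves room for a MedDRA code that maps to more than one EFO ID, and an empty list marks events that stay unresolved, so the frontend can treat the disease entity as optional.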