opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

annotate each meddra term with an optional EFO ID #1415

Closed mkarmona closed 5 months ago

mkarmona commented 3 years ago

@JarrodBaker, we could use the recently incorporated cross-references (dbXRefs) from disease entities to match and annotate each MedDRA ID with its corresponding EFO ID, so that @opentargets/frontend can use the optionally resolved disease entity. Here is some base code to work with, if it helps:

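import org.apache.spark.sql.functions.{col, element_at, explode, split}

logger.info("compute xref from efo disease to meddra")
// One row per (disease id, cross-reference), keeping only MedDRA-like xrefs
val med2efo = diseases.selectExpr("id", "dbXRefs")
    .withColumn("xref", explode(col("dbXRefs")))
    .filter(col("xref").contains("DRA"))
    // xrefs look like "MedDRA:10000880"; element_at is 1-based, so index 2 is the code
    .withColumn("meddraCode", element_at(split(col("xref"), ":"), 2))
    .selectExpr("id as diseaseId", "meddraCode")
// Left join keeps faers rows that have no EFO match (diseaseId is null for those)
val faersEFOLeft = faers.join(med2efo, Seq("meddraCode"), "left_outer").withColumnRenamed("chembl_id", "chemblId")
// TODO select just the needed columns or clean med2efo before
...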

You could potentially address this issue when you integrate the OpenFDA pipeline with the main ETL in issue #1416.

d0choa commented 3 years ago

I explored the same data to get some sense of the coverage.

As @mkarmona raised, beware of the .contains("DRA") filter, as MedDRA can have multiple spellings in EFO. I raised a ticket in EFO to increase xref coverage: https://github.com/EBISPOT/efo/issues/978

Adding my code for future reference:


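from pyspark.sql.functions import col, explode, split

# One row per (disease, xref), split into source prefix and code
meddraXrefs = (disease
 .select(col("id").alias("diseaseId"), explode("dbXRefs").alias("xref"))
 .withColumn("source", split(col("xref"), ":").getItem(0))
 .withColumn("meddraCode", split(col("xref"), ":").getItem(1))
 .drop("xref")
 # Substring match on the source prefix; see the caveat above about spellings
 .filter(col("source").contains("DRA"))
)

# Left join keeps every distinct MedDRA code in fda, matched or not
mapped = (
    fda
    .select("meddraCode")
    .distinct()
    .join(meddraXrefs, on = "meddraCode", how = "left")
)

mkarmona commented 3 years ago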

@d0choa yes, the filter could be more precise using like or startsWith instead of contains
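
A minimal sketch of the prefix-based matching suggested here, written in plain Python rather than Spark so that it runs standalone. The helper name `parse_meddra_xref` and the `HYDRA:123` xref are hypothetical, and treating any case variant of `MedDRA` as the same source is an assumption about how the spellings differ in EFO.

```python
def parse_meddra_xref(xref):
    """Return the code part of a MedDRA xref, or None for any other source.

    Assumption: spellings differ only by letter case (e.g. "MedDRA", "MEDDRA").
    """
    prefix, sep, code = xref.partition(":")
    if sep and prefix.strip().lower() == "meddra":
        return code.strip()
    return None


# The last xref is a made-up example of an unrelated source containing "DRA"
xrefs = ["MedDRA:10000880", "MEDDRA:10000880", "HYDRA:123"]
codes = [parse_meddra_xref(x) for x in xrefs]

# What a plain substring test, as in contains("DRA"), would keep
substring_hits = [x for x in xrefs if "DRA" in x]
```

Matching on the whole prefix rejects the unrelated source, while the substring test keeps all three entries.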

d0choa commented 3 years ago

I raised it with EFO and they will try to clean it as part of https://github.com/EBISPOT/efo/issues/878

d0choa commented 2 years ago

1533 increased the coverage of EFO-MedDRA mappings

d0choa commented 2 years ago

@mbdebian did you consider this when working on the openFDA pipeline? No rush, but it would be nice to have at some point.

mbdebian commented 2 years ago

I didn't take this one into account when I did the OpenFDA work. I'll have a look at the data and the code to find out how it could be made possible.

mbdebian commented 9 months ago

@jdhayhurst, would you mind checking with the data team whether this issue is still relevant? Thanks!

mbdebian commented 8 months ago

@jdhayhurst @tskir to check whether the latest changes in the internal pipelines may remove MedDRA terms altogether from the processes

jdhayhurst commented 8 months ago

The recent changes we made were regarding the MedDRA SOC codes, which were removed (ticket 3003). I'm not sure whether work done prior to that makes this ticket irrelevant. Please could someone from the @opentargets/data-team advise if there's still work to be done here? Thanks!

ireneisdoomed commented 5 months ago

I think this is still relevant. In #3003, we removed the reference to MedDRA because ChEMBL improved the mapping of the drug warnings data, moving from reporting a general MedDRA code to re-curating the adverse event to a more specific EFO code. That made the MedDRA annotation practically uninformative.

This ticket is about the Pharmacovigilance data, which is related but not the same. Here we have the most significant adverse events for a drug as reported by doctors. These events are not a curated list of adverse events like in the previous case. In pharmacovigilance reports, people often use MedDRA to define ADRs. This is what we currently show in the ticket. 10% of EFO IDs have a cross-reference to MedDRA, like this one:

{
  "data": {
    "disease": {
      "id": "EFO_0000222",
      "name": "acute myeloid leukemia",
      "dbXRefs": [
        # more here
        "MedDRA:10000880",
        # more here
      ]
    }
  }
}

What we propose here is to expand the pharmacovigilance dataset so that we can establish a link between each MedDRA term and an EFO ID.
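
A rough sketch of what that link could look like, in plain Python so it runs without Spark. The disease record reuses the EFO_0000222 example above; the adverse event rows, their field names, the `CHEMBL_A`/`CHEMBL_B` placeholders and the `99999999` code are all hypothetical. Returning a list of EFO IDs (empty when nothing matches) is an assumption, made because one MedDRA code might be cross-referenced by more than one disease.

```python
from collections import defaultdict

diseases = [
    {"id": "EFO_0000222", "dbXRefs": ["MedDRA:10000880"]},
]

# Hypothetical pharmacovigilance rows; the real dataset has more fields
adverse_events = [
    {"chemblId": "CHEMBL_A", "meddraCode": "10000880"},
    {"chemblId": "CHEMBL_B", "meddraCode": "99999999"},
]


def build_meddra_to_efo(diseases):
    """Map each MedDRA code to the EFO IDs that cross-reference it."""
    mapping = defaultdict(list)
    for disease in diseases:
        for xref in disease.get("dbXRefs", []):
            source, sep, code = xref.partition(":")
            if sep and source.lower() == "meddra":
                mapping[code].append(disease["id"])
    return dict(mapping)


def annotate(events, mapping):
    """Attach the (possibly empty) list of EFO IDs to every adverse event."""
    return [{**e, "diseaseIds": mapping.get(e["meddraCode"], [])} for e in events]


annotated = annotate(adverse_events, build_meddra_to_efo(diseases))
```

Events whose MedDRA code has no cross-reference keep an empty `diseaseIds` list, which mirrors the left join used in the Spark snippets earlier in the thread.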