Inheritance of mechanism of action and indications

JarrodBaker commented 3 years ago

In the original drug index, the indications and mechanisms of action (MoA) of child ChEMBL molecules were inherited by their parents. When a parent was selected, all the MoA and indications of its children were displayed as if they belonged to the parent.

In the beta drug index, molecules at the top of the hierarchy include a field childChemblIds listing their children. To replicate the behaviour of the original index, it is necessary to traverse this hierarchy, collect the MoA and indications of the children, and present them as belonging to the parent.

The desired behaviour of the user interface is to be able to select a molecule and see all of the Indications and MoA on a single page.
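The traversal described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical in-memory dict keyed by ChEMBL ID where each molecule carries childChemblIds plus `moa` and `indications` lists; those two field names and the toy IDs are placeholders, not the index's actual schema.

```python
from typing import Any, Dict, List, Set

# Hypothetical in-memory view of the beta drug index, keyed by ChEMBL ID.
# Each entry may carry "childChemblIds" (None when there are no children)
# plus illustrative "moa" and "indications" lists.
Index = Dict[str, Dict[str, Any]]


def descendants(chembl_id: str, index: Index) -> List[str]:
    """Return every descendant of chembl_id (depth-first), each listed once."""
    seen: Set[str] = set()
    order: List[str] = []
    stack = list(index[chembl_id].get("childChemblIds") or [])
    while stack:
        child = stack.pop()
        if child in seen or child not in index:
            continue  # skip repeats and IDs missing from the index
        seen.add(child)
        order.append(child)
        stack.extend(index[child].get("childChemblIds") or [])
    return order


def inherited_view(chembl_id: str, index: Index) -> Dict[str, List[str]]:
    """Collect MoA and indications of a molecule and its descendants, without duplicates."""
    view: Dict[str, List[str]] = {"moa": [], "indications": []}
    for mol_id in [chembl_id] + descendants(chembl_id, index):
        for field, values in view.items():
            for value in index[mol_id].get(field) or []:
                if value not in values:
                    values.append(value)
    return view


# Toy data: one parent "P" with a single child "C1".
toy_index: Index = {
    "P": {"childChemblIds": ["C1"], "moa": ["moa_A"], "indications": ["ind_1"]},
    "C1": {"childChemblIds": None, "moa": ["moa_A", "moa_B"], "indications": ["ind_2"]},
}
print(inherited_view("P", toy_index))
# {'moa': ['moa_A', 'moa_B'], 'indications': ['ind_1', 'ind_2']}
```

The parent ends up showing its child's annotations as its own, which is the "old index" behaviour; the child's view is unchanged.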

Options:

1. Update the ETL to have the same behaviour as the previous data-pipeline solution.
2. Update the ETL to produce separate datasets for Molecule, Mechanism of Action and Indication, and link these as necessary at the front end.
3. Update the API to traverse the data structure and present the desired view without changing the ETL outputs.

Decision: Option 2

Implementation steps

Update ETL to:
- Output three separate datasets: molecule, mechanismOfAction and indications
- include scripts to load the new indexes into Elasticsearch
- include a field on molecule with references to all descendants, for later resolution
Update API to:
- use the new indexes to resolve entities
- update the Drug resolver to collect consolidated datasets of MoA and indications
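A minimal sketch of the consolidation step in the Drug resolver, assuming the three datasets have been loaded into plain Python structures. The field names `descendants` and `chemblIds`, the resolver function and the toy IDs are assumptions for illustration; the API would query the new Elasticsearch indexes instead.

```python
from typing import Any, Dict, List

# Hypothetical, simplified shapes for the three ETL outputs. "descendants"
# stands in for the molecule field that references all descendants.
molecules: Dict[str, Dict[str, Any]] = {
    "P": {"name": "PARENT", "descendants": ["C1"]},
    "C1": {"name": "CHILD", "descendants": []},
}
mechanisms_of_action: List[Dict[str, Any]] = [
    {"chemblIds": ["P", "C1"], "mechanismOfAction": "moa_A"},
    {"chemblIds": ["C1"], "mechanismOfAction": "moa_B"},
]
indications: Dict[str, List[str]] = {"P": ["ind_1"], "C1": ["ind_2"]}


def resolve_drug(chembl_id: str) -> Dict[str, Any]:
    """Join the separate datasets into one consolidated drug view."""
    molecule = molecules[chembl_id]
    family = [chembl_id] + list(molecule["descendants"])
    family_set = set(family)
    moa = [
        row["mechanismOfAction"]
        for row in mechanisms_of_action
        if family_set.intersection(row["chemblIds"])  # row mentions any family member
    ]
    diseases = [d for member in family for d in indications.get(member, [])]
    return {"name": molecule["name"], "mechanismsOfAction": moa, "indications": diseases}


print(resolve_drug("P"))
```

Keeping the datasets separate means the linking logic lives in one place (the resolver), so it can change without re-running the ETL.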

d0choa commented 3 years ago

Observed misbehaviour at API-level

1. Lack of inherited MoAs in the parent molecule. TOFACITINIB (parent molecule): old index API || new index API. TOFACITINIB CITRATE (child molecule): old index API || new index API.
2. Lack of indications inherited from the parent molecule. TOFACITINIB (parent molecule): old index API || new index API. TOFACITINIB CITRATE (child molecule): old index API || new index API.
3. linkedTargets available for the parent molecule but null for the child molecule (opposite behaviour). TOFACITINIB (parent molecule): old index API || new index API. TOFACITINIB CITRATE (child molecule): old index API || new index API.

JarrodBaker commented 3 years ago

There are going to be some differences in the number of indications between the alpha and beta indexes because of obsolete terms. TOFACITINIB, for example, returns EFO_1001001 and EFO_1000906, where the latter is a replacement for the former. The beta index only returns the latter indication.
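The effect described above can be illustrated with a small sketch: if obsolete disease IDs are swapped for their replacements before the list is built, an obsolete term and its successor collapse into a single indication. The mapping below contains only the pair mentioned in this comment; in practice it would presumably be derived from the ontology's "term replaced by" annotations (an assumption), and the function name is hypothetical.

```python
from typing import Dict, Iterable, List

# Only this pair comes from the thread; a real mapping would be much larger.
OBSOLETE_TO_REPLACEMENT: Dict[str, str] = {"EFO_1001001": "EFO_1000906"}


def normalise_indications(disease_ids: Iterable[str], replacements: Dict[str, str]) -> List[str]:
    """Swap obsolete IDs for their replacements and drop the resulting duplicates."""
    result: List[str] = []
    for disease_id in disease_ids:
        current = replacements.get(disease_id, disease_id)
        if current not in result:
            result.append(current)
    return result


print(normalise_indications(["EFO_1001001", "EFO_1000906"], OBSOLETE_TO_REPLACEMENT))
# ['EFO_1000906']
```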

JarrodBaker commented 3 years ago

@d0choa, a question regarding number 2: do you want the child to inherit all of the parent's indications? For MoA we have entities 'bubbling up' from the children up the hierarchy, whereas here you seem to be asking for the opposite: pushing indications down to the children.

d0choa commented 3 years ago

The expected behavior should be to show, in both the parent and the child, the aggregate of the indications/MoAs of the parent molecule and all its descendants.

This is the same as saying, we don't have enough granularity to distinguish them, so we show all of them as if they were a cluster of molecules.

Does it make sense?

JarrodBaker commented 3 years ago

I don't really understand why we would include the child molecules in the index if they are going to contain the same information as the parent ones. In the case of the example in the ticket, if we execute this query on the revised API (on branch 1308-inheritance-moa-and-indications):

  query i1308_indication_pc {
    drugs(chemblIds: ["CHEMBL221959", "CHEMBL2103743"]){
      name
      indications {
        count
        rows {
          disease {
            id
          }
        }
      }
    }
  }

We get the following output:

{
  "data": {
    "drugs": [
      {
        "name": "TOFACITINIB",
        "indications": {
          "count": 19,
          "rows": [
            {
              "disease": {
                "id": "Orphanet_797"
              }
            },
            {
              "disease": {
                "id": "EFO_0000546"
              }
            },
            {
              "disease": {
                "id": "EFO_0000685"
              }
            },
            {
              "disease": {
                "id": "EFO_0004192"
              }
            },
            {
              "disease": {
                "id": "EFO_1000906"
              }
            },
            {
              "disease": {
                "id": "EFO_0002690"
              }
            },
            {
              "disease": {
                "id": "EFO_0000398"
              }
            },
            {
              "disease": {
                "id": "EFO_0000384"
              }
            },
            {
              "disease": {
                "id": "EFO_0003778"
              }
            },
            {
              "disease": {
                "id": "EFO_0002609"
              }
            },
            {
              "disease": {
                "id": "EFO_0000676"
              }
            },
            {
              "disease": {
                "id": "EFO_0003884"
              }
            },
            {
              "disease": {
                "id": "EFO_0000540"
              }
            },
            {
              "disease": {
                "id": "EFO_1001494"
              }
            },
            {
              "disease": {
                "id": "EFO_0003898"
              }
            },
            {
              "disease": {
                "id": "EFO_0000729"
              }
            },
            {
              "disease": {
                "id": "EFO_0000274"
              }
            },
            {
              "disease": {
                "id": "EFO_0000717"
              }
            },
            {
              "disease": {
                "id": "EFO_0000574"
              }
            }
          ]
        }
      },
      {
        "name": "TOFACITINIB CITRATE",
        "indications": {
          "count": 2,
          "rows": [
            {
              "disease": {
                "id": "EFO_0000685"
              }
            },
            {
              "disease": {
                "id": "EFO_0002690"
              }
            }
          ]
        }
      }
    ]
  }
}

Both of the child's indications are in the parent molecule (since the parent has inherited from all its children), but not all of the parent's indications are in the child's.

Isn't this an indication that they are different?
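Using the disease IDs from the response above, a quick set comparison makes the relationship explicit (the variable names are just for illustration):

```python
# Disease IDs copied from the query response above.
tofacitinib = {
    "Orphanet_797", "EFO_0000546", "EFO_0000685", "EFO_0004192", "EFO_1000906",
    "EFO_0002690", "EFO_0000398", "EFO_0000384", "EFO_0003778", "EFO_0002609",
    "EFO_0000676", "EFO_0003884", "EFO_0000540", "EFO_1001494", "EFO_0003898",
    "EFO_0000729", "EFO_0000274", "EFO_0000717", "EFO_0000574",
}
tofacitinib_citrate = {"EFO_0000685", "EFO_0002690"}

only_in_parent = tofacitinib - tofacitinib_citrate
print(len(tofacitinib), len(tofacitinib_citrate))  # 19 2
print(tofacitinib_citrate <= tofacitinib)          # True: child is a subset of parent
print(len(only_in_parent))                         # 17
```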

d0choa commented 3 years ago

We want to mimic what ChEMBL does when displaying this information.

The advantage for us of including child molecules is for pharmacovigilance or dictionary-based grounding, clearly not for indications or MoAs. It will also make sense if we increase annotation granularity (solubility, etc.).

d0choa commented 3 years ago

Summarising here different discussions with @andrewhercules and @JarrodBaker today. Please feel free to expand or clarify if required.

After reviewing ChEMBL child and parent molecules (e.g. TOFACITINIB and TOFACITINIB CITRATE), it looks like:

- Indications in ChEMBL are stored and displayed at the molecule level, independently of any parent-child relationship.
- MoAs in ChEMBL are stored at the molecule level; however, they are displayed on the ChEMBL website as an aggregation over the whole family. Both parents and children display the aggregation of all unique MoAs contained in the family.

While I reached these conclusions based on a handful of cases, it would be good if either of you could confirm them in a few other cases.

Our production (old) index, instead, only captured parent molecules and ignored child molecules. Information in several fields (indications, MoAs, trade names, etc.) was propagated up to the parent molecule. This could introduce some inaccuracies, although we don't know their true extent.

After evaluating all data and deciding on the best strategy to move forward we would like to:

- keep both parent and child molecules in the new index
- maintain different indications for parent/child molecules, as stored in the index and displayed by the ChEMBL website. This is different from the previous version of the drug index, in which indications were propagated from child molecules to their parents.
- list the MoAs as an aggregate of the unique family MoAs. Conceptually, all parent and child molecules in ChEMBL should have the same pharmacological action. This is consistent with what ChEMBL displays on their website.
- relatedTargets: this field would be consistent with the MoAs. In fact, I think it duplicates the same information. We can keep it for compatibility with search, but eventually we could remove it.
- include in the index a list of the IDs of all parent molecules and a list of the IDs of all child molecules. This will allow the FE to list closely related molecules contained in the new drug index.
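The agreed behaviour (indications kept per molecule, MoAs aggregated over the family, plus parent/child ID lists) can be sketched as follows. This is an illustration only: the input dicts, the toy IDs and the `build_document` helper are hypothetical, and it assumes a two-level hierarchy in which every molecule is either a parent or a direct child, as in the TOFACITINIB example.

```python
from typing import Any, Dict, List, Optional

# Hypothetical per-molecule inputs: the parent link (None for parents) and
# the MoA/indication annotations stored against each molecule.
parent_of: Dict[str, Optional[str]] = {"P": None, "C1": "P", "C2": "P"}
moa_of: Dict[str, List[str]] = {"P": ["moa_A"], "C1": ["moa_A", "moa_B"], "C2": []}
indications_of: Dict[str, List[str]] = {"P": ["ind_1"], "C1": ["ind_2"], "C2": []}


def children_of(chembl_id: str) -> List[str]:
    """Invert the parent links to list a molecule's direct children."""
    return sorted(mol for mol, parent in parent_of.items() if parent == chembl_id)


def build_document(chembl_id: str) -> Dict[str, Any]:
    # Assumes a two-level hierarchy: every molecule is a parent or a direct child.
    root = parent_of[chembl_id] or chembl_id
    family = [root] + children_of(root)
    family_moa: List[str] = []
    for member in family:
        for moa in moa_of.get(member, []):
            if moa not in family_moa:
                family_moa.append(moa)
    return {
        "id": chembl_id,
        "parentId": parent_of[chembl_id],
        "childChemblIds": children_of(chembl_id),
        "mechanismsOfAction": family_moa,  # unique MoAs of the whole family
        "indications": indications_of.get(chembl_id, []),  # this molecule only
    }


print(build_document("C2"))
```

Note that "C2" receives the family MoAs even though none are stored against it, while its indications stay empty, which mirrors how the ChEMBL website is described above.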

ireneisdoomed commented 3 years ago

3 comments on this:

After reviewing several pairs of parent-child molecules (Imatinib/Imatinib Mesylate, Trimethoprim/Trimethoprim Hydrochloride, Metformin/Metformin Hydrochloride), I have come to the same conclusions as outlined by @d0choa above:
- As far as the MoA is concerned, ChEMBL always displays the same annotated targets for both parents and children. This is already the approach we are taking in the alpha version.
- However, indications are listed per molecule. We will be adding this granularity in the beta version. As for the data itself, some inconsistencies have been found in the annotation of the indications: e.g. desmoplastic small round cell tumor is found in the indications for imatinib and not for imatinib mesylate, when in fact the drug mentioned in the referenced clinical trial (CT) is the salt. This has already been reported for clarification.

All children are synonymous with the parent, as stated in Synonyms From Parent, so the lists of child and parent molecules should be straightforward to build.

Another thing that is worth noting is that the alpha version currently has two behaviours in relation to children:
- In the API, as expected, a call with a child molecule returns None, since its ChEMBL ID is not in the production drug index (example).
- On the web, when you enter a child's ChEMBL ID, the parent's entry is suggested, so the relationship between children and parent is already handled somewhere in the pipeline (example).
For the new version we have to take into account that, at the FE level, the search bar must return the child molecule as the Top Hit, not the parent as currently suggested. This is the case when the name of the drug is entered, but not the ChEMBL ID.

ireneisdoomed commented 3 years ago

With the new approach in the beta version we are not inheriting indications, so my understanding is that what we show on the platform reflects ChEMBL's Elasticsearch (ES) index. Is this correct?

However this is not consistent with what ChEMBL is showing, right? For instance, ChEMBL is not displaying any indications for TRIMETHOPRIM HYDROCHLORIDE and currently we do have an indication here (which is present in the parent).

Personally, I do not see the value of listing a phase 0 CT as an indication, but either we aim to be consistent with ChEMBL or we reflect our own approach.

d0choa commented 3 years ago

ChEMBL does seem to intentionally hide phase 0 indications from their FE, but the information is in their index. For example, the TOFACITINIB CITRATE case mentioned above. The ChEMBL index contains indications for systemic lupus erythematosus (phase 0) and rheumatoid arthritis (phase IV), whereas their website only lists rheumatoid arthritis (phase IV).

For the purpose of this ticket, I think it's valuable to keep this information in our index. We can decide in parallel what's the best way to display Phase 0/Early Phase 1 indications on our site.

JarrodBaker commented 3 years ago

@d0choa @ireneisdoomed In relation to max phase: there is a field maxPhaseForIndication available via the GraphQL API. We can keep all indications in the index, available for people to query programmatically, and hide them in the Platform UI if necessary by filtering on that field. If I implement the filter in either the ETL or the API, it would exclude the information altogether.

Example query:

query i1308_indication_pc {
  drugs(chemblIds: ["CHEMBL221959", "CHEMBL2103743"]) {
    indications {
      rows {
        maxPhaseForIndication
      }
    }
  }
}
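A sketch of the UI-side filtering suggested above, applied to a response shaped like this query's output (with disease IDs added). The function name, the toy rows and the default threshold of 1 (hide phase 0) are illustrative assumptions, not a Platform setting; rows without a phase are treated as phase 0 and hidden.

```python
from typing import Any, Dict, List


def visible_indications(drug: Dict[str, Any], min_phase: float = 1) -> List[Dict[str, Any]]:
    """Filter indication rows on the client, leaving the indexed data untouched."""
    rows = (drug.get("indications") or {}).get("rows") or []
    return [row for row in rows if (row.get("maxPhaseForIndication") or 0) >= min_phase]


# Toy drug entry shaped like the query response, with disease IDs added.
example_drug = {
    "indications": {
        "rows": [
            {"disease": {"id": "ind_1"}, "maxPhaseForIndication": 0},
            {"disease": {"id": "ind_2"}, "maxPhaseForIndication": 4},
        ]
    }
}
print([row["disease"]["id"] for row in visible_indications(example_drug)])  # ['ind_2']
```

Because the filter runs after the query, programmatic users still see every indication; only the rendered page hides the low-phase rows.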

JarrodBaker commented 3 years ago

@d0choa On 4 January you wrote

we want to include in the index a list of the ids of all parent molecules and a list of ids with all child molecules. This will allow the FE to list closely related molecules contained in the new drug index.

On the individual Drug entries there are two fields of interest here, parentId and childChemblIds (see the example query below). If the molecule is a parent, parentId will be null; if it is a child, this field will have a value. Similarly, if a molecule has children, the childChemblIds array will list them; if there are no children, null will be returned.

query drug_parent_children {
  drugs(chemblIds: ["CHEMBL221959", "CHEMBL2103743"]) {
    name
    parentId
    childChemblIds
  }
}

Does this meet the FE's requirements?
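A minimal sketch of how a front end might interpret these two fields, following the semantics described above (parentId null for parents, childChemblIds null when there are no children). The helper names and the "standalone" label for molecules with neither field set are hypothetical; the field values mirror the TOFACITINIB example in this thread.

```python
from typing import Any, Dict, List


def molecule_role(drug: Dict[str, Any]) -> str:
    """Classify a Drug entry from its parentId / childChemblIds fields."""
    if drug.get("parentId"):
        return "child"
    if drug.get("childChemblIds"):
        return "parent"
    return "standalone"  # neither field set (hypothetical label)


def related_ids(drug: Dict[str, Any]) -> List[str]:
    """IDs the FE could list as closely related molecules."""
    related = [drug["parentId"]] if drug.get("parentId") else []
    return related + list(drug.get("childChemblIds") or [])


drugs = [
    {"name": "TOFACITINIB", "parentId": None, "childChemblIds": ["CHEMBL2103743"]},
    {"name": "TOFACITINIB CITRATE", "parentId": "CHEMBL221959", "childChemblIds": None},
]
for d in drugs:
    print(d["name"], molecule_role(d), related_ids(d))
```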

d0choa commented 3 years ago

It will work. Please consider the following suggestions:

- rename parentId to parentMolecule
- rename childChemblIds to childMolecules (or similar)
- make both parents and children fully resolvable entities in the API

ireneisdoomed commented 3 years ago

ChEMBL does seem to intentionally hide phase 0 indications from their FE, but the information is in their index. For example, the TOFACITINIB CITRATE case mentioned above. The ChEMBL index contains indications for systemic lupus erythematosus (phase 0) and rheumatoid arthritis (phase IV), whereas their website only lists rheumatoid arthritis (phase IV).

@d0choa I don't think this is due to the phase of the indication being 0 because, for IMATINIB MESYLATE, sickle cell anaemia is shown as phase 0.

Either way, I'll contact them because I don't see the consistency in their data. For this drug, they list 63 indications in their ES and display 46 on the website. The differences are present for the parent molecule, but the referenced CT also mentions the child, which means that what we are showing is not wrong. I'll keep you posted 🙂

JarrodBaker commented 3 years ago

@d0choa Does this look like what you're after?

Query

query i1308_pc_resolve_drug {
  drugs(chemblIds: ["CHEMBL221959", "CHEMBL2103743"]) {
    name
    parentMolecule {
      id
    }
    childMolecules {
      id
    }
  }
}

Response

{
  "data": {
    "drugs": [
      {
        "name": "TOFACITINIB",
        "parentMolecule": null,
        "childMolecules": [
          {
            "id": "CHEMBL2103743"
          }
        ]
      },
      {
        "name": "TOFACITINIB CITRATE",
        "parentMolecule": {
          "id": "CHEMBL221959"
        },
        "childMolecules": []
      }
    ]
  }
}

d0choa commented 3 years ago

Yes! This looks great. (I'm assuming it's not possible to have more than one parentMolecule.)

JarrodBaker commented 3 years ago

Presently each molecule has either zero or one parent molecule. If that changed, the change would come from the ChEMBL end and would break the ETL, so we should detect it before it reaches the API.
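A sketch of an ETL-side guard that fails fast if a molecule ever references more than one parent, so the problem surfaces before the data reaches the API. The (molecule, parent) pair input and both function names are hypothetical; treating a self-reference as "no parent" is an assumption about how the input might encode parent molecules.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Link = Tuple[str, Optional[str]]  # (molecule ChEMBL ID, parent ChEMBL ID or None)


def find_multi_parent(links: Iterable[Link]) -> Dict[str, List[str]]:
    """Return molecules that reference more than one distinct parent."""
    parents: Dict[str, Set[str]] = {}
    for molecule_id, parent_id in links:
        found = parents.setdefault(molecule_id, set())
        # Self-references are treated as "no parent" (assumption about the input).
        if parent_id and parent_id != molecule_id:
            found.add(parent_id)
    return {mol: sorted(ps) for mol, ps in parents.items() if len(ps) > 1}


def parent_lookup(links: Iterable[Link]) -> Dict[str, Optional[str]]:
    """Build molecule -> parent, raising if any molecule has several parents."""
    link_list = list(links)
    offenders = find_multi_parent(link_list)
    if offenders:
        raise ValueError(f"molecules with more than one parent: {offenders}")
    lookup: Dict[str, Optional[str]] = {}
    for molecule_id, parent_id in link_list:
        if parent_id and parent_id != molecule_id:
            lookup[molecule_id] = parent_id
        else:
            lookup.setdefault(molecule_id, None)
    return lookup


print(parent_lookup([("CHEMBL221959", None), ("CHEMBL2103743", "CHEMBL221959")]))
```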

d0choa commented 3 years ago

We have duplicated MoAs. I suspect that when parent and child have the same MoA, we end up with the MoA twice.

An example of duplicated MoAs is OSIMERTINIB - CHEMBL3353410 (AstraZeneca's Tagrisso), which has a child molecule, OSIMERTINIB MESYLATE (CHEMBL3545063).

Not sure it matters, but the order of the references is different.
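If the duplicates differ only in reference order, one option is to deduplicate on an order-insensitive key. This is a sketch under assumptions: the field names (`mechanismOfAction`, `actionType`, `targets`, `references` with `source`/`ids`) and the toy rows are illustrative, not a confirmed schema.

```python
from typing import Any, Dict, List, Set, Tuple


def moa_key(moa: Dict[str, Any]) -> Tuple[Any, ...]:
    """Order-insensitive identity of an MoA row (field names are assumptions)."""
    references = tuple(
        sorted(
            (ref.get("source") or "", tuple(sorted(ref.get("ids") or [])))
            for ref in moa.get("references") or []
        )
    )
    targets = tuple(sorted(moa.get("targets") or []))
    return (moa.get("mechanismOfAction"), moa.get("actionType"), targets, references)


def dedupe_moas(moas: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each MoA, ignoring reference ordering."""
    seen: Set[Tuple[Any, ...]] = set()
    unique: List[Dict[str, Any]] = []
    for moa in moas:
        key = moa_key(moa)
        if key not in seen:
            seen.add(key)
            unique.append(moa)
    return unique


refs = [{"source": "src_1", "ids": ["1"]}, {"source": "src_2", "ids": ["2"]}]
from_parent = {"mechanismOfAction": "moa_A", "actionType": "INHIBITOR", "targets": ["t1"], "references": refs}
from_child = dict(from_parent, references=list(reversed(refs)))  # same MoA, references reordered
other = dict(from_parent, mechanismOfAction="moa_B")
print(len(dedupe_moas([from_parent, from_child, other])))  # 2
```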

ireneisdoomed commented 3 years ago

FYI, Juan Mosquera from ChEMBL just got back to me saying that the discrepancies between what is displayed in their web interface and what is contained in the index are caused by a bug on their side. They will fix this for ChEMBL 28.

So we can be confident that whatever we have in our index will be consistent with them.

d0choa commented 3 years ago

It would be good to keep an eye on this for ChEMBL 28. I guess if they populate the molecule index as we are currently doing on our side (all MoAs for the family), everything will work without us changing anything. However, we will have a lot of legacy logic that we might not want to maintain.

ktsirigos commented 3 years ago

Work completed.

opentargets / issues

Inheritance of mechanism of action and indications #1308