Closed JarrodBaker closed 3 years ago
Observed misbehaviour at API-level
1. lack of inherited MoAs in parent molecule
   TOFACITINIB (parent molecule) old index API || new index API
   TOFACITINIB CITRATE (child molecule) old index API || new index API
2. lack of inherited indications from parent molecule
   TOFACITINIB (parent molecule) old index API || new index API
   TOFACITINIB CITRATE (child molecule) old index API || new index API
3. linkedTargets available for parent term but null for child molecule - Opposite behaviour
   TOFACITINIB (parent molecule) old index API || new index API
   TOFACITINIB CITRATE (child molecule) old index API || new index API
There are going to be some differences in the number of indications between the alpha and beta indexes because of obsolete terms. For example, TOFACITINIB returns both EFO_1001001 and EFO_1000906 in the alpha, where the latter is a replacement for the former. The beta only returns the latter indication.
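A minimal sketch of how obsolete disease IDs could be collapsed onto their replacements when comparing the two indexes. The function name and the mapping structure are hypothetical, the obsolete ID is assumed to be written EFO_1001001, and only that one replacement pair comes from this thread; the third ID in the sample list is just there for illustration.

```python
# Hypothetical mapping of obsolete disease IDs to their replacements.
# Only the EFO_1001001 -> EFO_1000906 pair is taken from the discussion.
OBSOLETE_TO_CURRENT = {"EFO_1001001": "EFO_1000906"}


def normalise_indications(disease_ids, replacements=OBSOLETE_TO_CURRENT):
    """Replace obsolete IDs and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for disease_id in disease_ids:
        current = replacements.get(disease_id, disease_id)
        if current not in seen:
            seen.add(current)
            result.append(current)
    return result


# Illustrative input: the obsolete term, its replacement, and one other ID.
alpha_ids = ["EFO_1001001", "EFO_1000906", "EFO_0000685"]
print(normalise_indications(alpha_ids))  # ['EFO_1000906', 'EFO_0000685']
```

Normalising both sides this way before counting should make the alpha and beta numbers directly comparable.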
@d0choa question regarding number 2: do you want the child to inherit all of the parent's indications? In MoA we have entities 'bubbling up' from children up the hierarchy, whereas you seem to be asking here for the opposite, pushing indications down to the children.
The expected behavior should be to show the aggregate of all parent molecule indications/MoA and their descendants in both the parent and the child.
This is the same as saying, we don't have enough granularity to distinguish them, so we show all of them as if they were a cluster of molecules.
Does it make sense?
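To make the "cluster of molecules" idea concrete, here is a rough sketch in which every member of a family is shown the union of the family's annotations. The function name, the input shape, and the placeholder annotation values (D1, D2, D3) are assumptions for illustration only; the two ChEMBL IDs are the TOFACITINIB pair used elsewhere in this ticket.

```python
def aggregate_family(annotations_by_molecule, family):
    """Give every family member the union of the family's annotations."""
    merged = set()
    for chembl_id in family:
        merged |= set(annotations_by_molecule.get(chembl_id, ()))
    # Parent and children all receive the same aggregated, sorted list.
    return {chembl_id: sorted(merged) for chembl_id in family}


# Placeholder annotations (not real data) for a parent and one child.
annotations = {
    "CHEMBL221959": ["D1", "D2"],   # parent
    "CHEMBL2103743": ["D2", "D3"],  # child
}
family = ["CHEMBL221959", "CHEMBL2103743"]
print(aggregate_family(annotations, family))
```

With this approach the parent and the child pages would be identical for the aggregated fields, which is the trade-off being discussed.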
I don't really understand why we would include the child molecules in the index if they are going to contain the same information as the parent ones. In the case of the example in the ticket, if we execute this query on the revised API (on branch 1308-inheritance-moa-and-indications):
query i1308_indication_pc {
  drugs(chemblIds: ["CHEMBL221959", "CHEMBL2103743"]) {
    name
    indications {
      count
      rows {
        disease {
          id
        }
      }
    }
  }
}
We get the following output:
{
  "data": {
    "drugs": [
      {
        "name": "TOFACITINIB",
        "indications": {
          "count": 19,
          "rows": [
            { "disease": { "id": "Orphanet_797" } },
            { "disease": { "id": "EFO_0000546" } },
            { "disease": { "id": "EFO_0000685" } },
            { "disease": { "id": "EFO_0004192" } },
            { "disease": { "id": "EFO_1000906" } },
            { "disease": { "id": "EFO_0002690" } },
            { "disease": { "id": "EFO_0000398" } },
            { "disease": { "id": "EFO_0000384" } },
            { "disease": { "id": "EFO_0003778" } },
            { "disease": { "id": "EFO_0002609" } },
            { "disease": { "id": "EFO_0000676" } },
            { "disease": { "id": "EFO_0003884" } },
            { "disease": { "id": "EFO_0000540" } },
            { "disease": { "id": "EFO_1001494" } },
            { "disease": { "id": "EFO_0003898" } },
            { "disease": { "id": "EFO_0000729" } },
            { "disease": { "id": "EFO_0000274" } },
            { "disease": { "id": "EFO_0000717" } },
            { "disease": { "id": "EFO_0000574" } }
          ]
        }
      },
      {
        "name": "TOFACITINIB CITRATE",
        "indications": {
          "count": 2,
          "rows": [
            { "disease": { "id": "EFO_0000685" } },
            { "disease": { "id": "EFO_0002690" } }
          ]
        }
      }
    ]
  }
}
Both of the child's indications are in the parent molecule (since the parent has inherited from all its children), but not all of the parent's indications are in the child's.
Isn't this an indication that they are different?
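The comparison can be checked mechanically. Below is a small sketch using the disease IDs from the response above; the helper name and its return shape are hypothetical.

```python
def compare_indications(parent_ids, child_ids):
    """Report whether the child's indications are a subset of the parent's."""
    parent_set, child_set = set(parent_ids), set(child_ids)
    return {
        "child_subset_of_parent": child_set <= parent_set,
        "parent_only": sorted(parent_set - child_set),
    }


# Disease IDs copied from the response for TOFACITINIB (parent)
# and TOFACITINIB CITRATE (child).
parent_ids = [
    "Orphanet_797", "EFO_0000546", "EFO_0000685", "EFO_0004192",
    "EFO_1000906", "EFO_0002690", "EFO_0000398", "EFO_0000384",
    "EFO_0003778", "EFO_0002609", "EFO_0000676", "EFO_0003884",
    "EFO_0000540", "EFO_1001494", "EFO_0003898", "EFO_0000729",
    "EFO_0000274", "EFO_0000717", "EFO_0000574",
]
child_ids = ["EFO_0000685", "EFO_0002690"]

result = compare_indications(parent_ids, child_ids)
print(result["child_subset_of_parent"])  # True
print(len(result["parent_only"]))        # 17
```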
We want to mimic what ChEMBL does when displaying this information.
The advantage for us of including child molecules is for pharmacovigilance or dictionary-based grounding, clearly not for indications or MoAs. It will also make sense if we increase annotation granularity: solubility, etc.
Summarising here different discussions with @andrewhercules and @JarrodBaker today. Please feel free to expand or clarify if required.
After revising ChEMBL child and parent molecules (e.g. TOFACITINIB and TOFACITINIB CITRATE), it looks like:
While I reached these conclusions based on a handful of cases, it would be good if either of you could confirm them in a few other cases.
Our production (old) index, by contrast, captured only parent molecules and ignored child molecules. Information in several fields (indications, MoAs, trade names, etc.) was propagated up to the parent molecule. This could add some inaccuracies, although we don't know the true extent of them.
After evaluating all data and deciding on the best strategy to move forward we would like to:
3 comments on this:
desmoplastic small round cell tumor is found in the indications for imatinib and not for imatinib mesylate, when in fact the drug mentioned in the reference CT is the salt. This has already been reported for clarification.
All children are synonymous with the parent, as stated in Synonyms From Parent, so the lists of children and parent molecules should be uncomplicated to build.
Another thing that is worth noting is that the alpha version currently has two behaviours in relation to children:
For the new version we have to take into account that at the FE level the search bar must return the child molecule as the Top Hit, not the parent as currently suggested. This is the case when the name of the drug is entered, but not the ChEMBL ID.
With the new approach of the beta version we are not inheriting indications, so my understanding is that what we show on the platform is a reflection of ChEMBL's ES, is this correct?
However this is not consistent with what ChEMBL is showing, right? For instance, ChEMBL is not displaying any indications for TRIMETHOPRIM HYDROCHLORIDE and currently we do have an indication here (which is present in the parent).
Personally I do not see the value of listing a phase 0 CT as an indication, but either we are on the road to being consistent with ChEMBL or we reflect our own approach.
ChEMBL does seem to intentionally hide phase 0 indications from their FE, but the information is in their index. For example, the TOFACITINIB CITRATE case mentioned above. The ChEMBL index contains indications for systemic lupus erythematosus (phase 0) and rheumatoid arthritis (phase IV), whereas their website only lists rheumatoid arthritis (phase IV).
For the purpose of this ticket, I think it's valuable to keep this information in our index. We can decide in parallel what's the best way to display Phase 0/Early Phase 1 indications on our site.
@d0choa @ireneisdoomed
In relation to max phase: there is a field maxPhaseForIndication available via the GraphQL API. We can keep all indications in the index and available for people to query programmatically, but hide them in the Platform UI if necessary by filtering against that field. If I implement the filter in either the ETL or the API, it would have the effect of excluding the information altogether.
Example query:
query i1308_indication_pc {
  drugs(chemblIds: ["CHEMBL221959", "CHEMBL2103743"]) {
    indications {
      rows {
        maxPhaseForIndication
      }
    }
  }
}
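As a rough illustration of filtering in the UI layer rather than in the ETL or API, the sketch below drops low-phase rows from an already fetched response. It assumes maxPhaseForIndication comes back as a number; the function name and the default threshold of 1 are hypothetical, and the two sample rows mirror the TOFACITINIB CITRATE phases mentioned above (0 and IV).

```python
def visible_indications(rows, min_phase=1):
    """Keep rows whose maxPhaseForIndication is at least min_phase.

    Rows with a missing or null phase are treated as phase 0 (assumption).
    """
    return [
        row for row in rows
        if (row.get("maxPhaseForIndication") or 0) >= min_phase
    ]


# Sample rows shaped like the `rows` field of the query above.
rows = [
    {"maxPhaseForIndication": 0},  # e.g. systemic lupus erythematosus
    {"maxPhaseForIndication": 4},  # e.g. rheumatoid arthritis
]
print(visible_indications(rows))  # [{'maxPhaseForIndication': 4}]
```

Because the filter runs on the client, the phase 0 rows remain available to anyone querying the API directly.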
@d0choa On 4 January you wrote:
we want to include in the index a list of the ids of all parent molecules and a list of ids with all child molecules. This will allow the FE to list closely related molecules contained in the new drug index.
On the individual Drug entries there are two fields of interest here, parentId and childChemblIds (see example query below). If the molecule is a parent, parentId will be null; if it is a child, this field will have a value. Similarly, if a molecule has children, the childChemblIds array will contain a list of further molecules, and if there are no children, null will be returned.
query drug_parent_children {
  drugs(chemblIds: ["CHEMBL221959", "CHEMBL2103743"]) {
    name
    parentId
    childChemblIds
  }
}
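A minimal sketch of how a front end might turn those two fields into a list of closely related molecules. The helper name is hypothetical, and the sample records assume the null conventions described above (null parentId for a parent, null childChemblIds when there are no children).

```python
def related_molecule_ids(drug):
    """Return the parent ID (if any) followed by any child IDs."""
    related = []
    if drug.get("parentId"):
        related.append(drug["parentId"])
    # childChemblIds may be null when a molecule has no children.
    related.extend(drug.get("childChemblIds") or [])
    return related


# Records shaped like the result of the drug_parent_children query.
tofacitinib = {
    "name": "TOFACITINIB",
    "parentId": None,
    "childChemblIds": ["CHEMBL2103743"],
}
tofacitinib_citrate = {
    "name": "TOFACITINIB CITRATE",
    "parentId": "CHEMBL221959",
    "childChemblIds": None,
}
print(related_molecule_ids(tofacitinib))          # ['CHEMBL2103743']
print(related_molecule_ids(tofacitinib_citrate))  # ['CHEMBL221959']
```

Note that this only covers direct parent and children; listing siblings of a child would need a second lookup on the parent.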
Does this meet the FE's requirements?
It will work. Please consider the following suggestions:
- parentId to parentMolecule
- childChemblIds to childMolecules (or similar)

ChEMBL does seem to intentionally hide phase 0 indications from their FE, but the information is in their index. For example, the TOFACITINIB CITRATE case mentioned above. The ChEMBL index contains indications for systemic lupus erythematosus (phase 0) and rheumatoid arthritis (phase IV), whereas their website only lists rheumatoid arthritis (phase IV).
@d0choa I don't think this is due to the phase of the indication being 0 because, for IMATINIB MESYLATE, Sickle cell anaemia is shown as phase 0.
Either way I'll contact them because I don't see the consistency in their data. For this drug, they are listing 63 indications in their ES, and displaying 46 on the website. The differences are present for the parent molecule, but the referenced CT also mentions the child, so that means that what we are showing is not wrong. I'll keep you posted 🙂
@d0choa Does this look like what you're after?
Query
query i1308_pc_resolve_drug {
  drugs(chemblIds: ["CHEMBL221959", "CHEMBL2103743"]) {
    name
    parentMolecule {
      id
    }
    childMolecules {
      id
    }
  }
}
Response
{
  "data": {
    "drugs": [
      {
        "name": "TOFACITINIB",
        "parentMolecule": null,
        "childMolecules": [
          {
            "id": "CHEMBL2103743"
          }
        ]
      },
      {
        "name": "TOFACITINIB CITRATE",
        "parentMolecule": {
          "id": "CHEMBL221959"
        },
        "childMolecules": []
      }
    ]
  }
}
Yes! This looks great. (I'm assuming it's not possible to have more than one parentMolecule.)
At present each molecule in the data has either zero or one parent molecule. If that changes, it would be coming from the ChEMBL end and would break the ETL, so we should detect it before it gets to the API.
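One way such a check could look, as a sketch only: it assumes the parent/child relationships are available as (child ID, parent ID) pairs before indexing. That input shape and the function name are hypothetical and not taken from the actual ETL.

```python
from collections import defaultdict


def find_multi_parent_molecules(parent_links):
    """Return {child_id: [parent_ids]} for children with more than one parent."""
    parents_by_child = defaultdict(set)
    for child_id, parent_id in parent_links:
        parents_by_child[child_id].add(parent_id)
    return {
        child_id: sorted(parents)
        for child_id, parents in parents_by_child.items()
        if len(parents) > 1
    }


# The one relationship known from this ticket: the citrate salt and its parent.
links = [("CHEMBL2103743", "CHEMBL221959")]
print(find_multi_parent_molecules(links))  # {}
```

An empty result means the zero-or-one-parent assumption holds; a non-empty result could be used to fail the pipeline early with the offending IDs.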
We have duplicated MoAs. I suspect when parent and child have the same MoA, we end up with the MoA twice.
An example of duplicated MoAs for OSIMERTINIB - CHEMBL3353410 (Astrazeneca's Tagrisso), which has a child molecule OSIMERTINIB MESYLATE (CHEMBL3545063).
Not sure it matters, but the order of the references is different.
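If the duplicates really are the same MoA inherited from both parent and child, one possible fix is to deduplicate on a key that ignores reference order. This is a sketch under assumptions: the record keys (mechanismOfAction, references) and the placeholder values are hypothetical and may not match the real index schema.

```python
def dedupe_moas(moas):
    """Drop MoA records that differ only in the order of their references."""
    seen = set()
    unique = []
    for moa in moas:
        # frozenset makes the key insensitive to reference ordering.
        key = (moa["mechanismOfAction"], frozenset(moa.get("references", [])))
        if key not in seen:
            seen.add(key)
            unique.append(moa)
    return unique


# Placeholder records: same mechanism, same references, different order.
moas = [
    {"mechanismOfAction": "example mechanism", "references": ["ref-a", "ref-b"]},
    {"mechanismOfAction": "example mechanism", "references": ["ref-b", "ref-a"]},
]
print(len(dedupe_moas(moas)))  # 1
```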
FYI Juan Mosquera from ChEMBL just got back to me saying that the discrepancies between what is displayed in their web interface and what is contained in the index are caused by a bug on their side. They will fix this for ChEMBL 28.
So we can be confident that whatever we have in our index will be consistent with theirs.
It would be good to keep an eye on this for ChEMBL 28. I guess if they populate the molecule index as we are currently doing on our side (all MoAs for the family), everything will work without us changing anything. However, we will have a lot of legacy logic that we might not want to maintain.
Work completed.
In the original drug index, the indications and mechanisms of action of child ChEMBL molecules were inherited by their parents. When a parent was selected, all the MoA and Indications of its children were displayed as if they belonged to the parent.
In the drug beta index, molecules at the top of the hierarchy include a field childChemblIds listing their children. To replicate the behaviour of the original index it is necessary to traverse this hierarchy, collect the MoA and Indications of children, and present them as belonging to the parent.
The desired behaviour of the user interface is to be able to select a molecule and see all of the Indications and MoA on a single page.
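The traversal described here could be sketched roughly as follows. It assumes drug records are held in a dictionary keyed by ChEMBL ID, that each record carries a childChemblIds list (or null), and that the field being collected is a list of hashable values. The function name, the record shape, and the placeholder indication values are hypothetical, and this is not necessarily how any of the options was implemented.

```python
def collect_from_descendants(root_id, drugs_by_id, field):
    """Collect values of `field` from a molecule and all of its descendants."""
    collected = []
    visited = set()
    stack = [root_id]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # guard against cycles or repeated children
        visited.add(current)
        drug = drugs_by_id.get(current, {})
        for item in drug.get(field) or []:
            if item not in collected:
                collected.append(item)
        stack.extend(drug.get("childChemblIds") or [])
    return collected


# Placeholder indications (D1, D2) on the TOFACITINIB parent and child.
drugs_by_id = {
    "CHEMBL221959": {"childChemblIds": ["CHEMBL2103743"], "indications": ["D1"]},
    "CHEMBL2103743": {"childChemblIds": None, "indications": ["D1", "D2"]},
}
print(collect_from_descendants("CHEMBL221959", drugs_by_id, "indications"))
# ['D1', 'D2']
```

The same function can be called with a different field name to gather mechanisms of action, provided they are stored in a comparable form.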
Options:
Decision: Option 2
Implementation steps
molecule, mechanismOfAction, indications
molecule with references to all descendants to use for later resolution.