Generate drug index with the new ChEMBL release 28

mkarmona commented 3 years ago

A few things changed from the previous release, so it is better to double-check that nothing is wrong in there and keep the surprise factor under control. This is an internal preview, so the expected outcome is:

[ ] a GCS folder with the produced drug index

JarrodBaker commented 3 years ago

@d0choa: I have run the drug step with the ChEMBL 28 inputs.

The outputs can be found here: gs://ot-team/jarrod/chembl-28-test

An overview of the outputs:

Counts: 

drug: 13076
drug_warnings: 1256
indication: 7427
mechanism_of_action: 5013

Columns for: drug
[
  "id",
  "canonicalSmiles",
  "inchiKey",
  "drugType",
  "blackBoxWarning",
  "name",
  "yearOfFirstApproval",
  "maximumClinicalTrialPhase",
  "parentId",
  "hasBeenWithdrawn",
  "isApproved",
  "withdrawnNotice",
  "tradeNames",
  "synonyms",
  "crossReferences",
  "childChemblIds",
  "count",
  "approvedIndications",
  "linkedTargets",
  "linkedDiseases",
  "description"
]

Columns for: drug_warnings
[
  "toxicityClass",
  "chemblIds",
  "country",
  "description",
  "id",
  "references",
  "warningType",
  "year",
  "meddraSocCode"
]

Columns for: indication
[
  "id",
  "indications",
  "count",
  "approvedIndications"
]

Columns for: mechanism_of_action
[
  "actionType",
  "mechanismOfAction",
  "chemblIds",
  "targetName",
  "targetType",
  "targets",
  "references"
]

At first glance the counts are correct (they have actually increased since the last run) and the columns look good. I haven't gone through the data manually; I relied on the metadata output to get some summaries.
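
For anyone wanting to reproduce this kind of summary, here is a minimal sketch of how record counts and column lists could be derived and compared between runs. It assumes the outputs are newline-delimited JSON, which this thread does not state. The helper names `summarise` and `shrunk` and the sample records are hypothetical and are not part of the ETL code.

```python
import json
from typing import Dict, Iterable, List, Tuple


def summarise(lines: Iterable[str]) -> Tuple[int, List[str]]:
    """Return (record count, column names in first-seen order) for JSON-lines text."""
    count = 0
    columns: Dict[str, None] = {}  # a dict preserves insertion order
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        count += 1
        for key in record:
            columns.setdefault(key, None)
    return count, list(columns)


def shrunk(previous: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Names of datasets whose count dropped, or that vanished, since the previous run."""
    return sorted(name for name, n in previous.items() if current.get(name, 0) < n)


# Made-up records standing in for one output file (assumption: JSON lines).
sample = [
    '{"id": "CHEMBL_A", "name": "example one"}',
    '{"id": "CHEMBL_B", "name": "example two", "parentId": "CHEMBL_A"}',
]
count, columns = summarise(sample)
print(count, columns)
```

Calling `shrunk` with the per-dataset counts from the previous and current runs returns the datasets that lost records, so an empty list matches the "counts have increased" observation above.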

andrewhercules commented 3 years ago

Ticket closed, as the ChEMBL 28 data has been ingested by the ETL pipelines and is exposed by the GraphQL API (e.g. the CHEMBL121 profile page shows the new warnings data introduced in ChEMBL 28).

Generate drug index with the new chembl release 28 #1460