mkarmona closed this issue.
@d0choa: I have run the drug step with the ChEMBL 28 inputs.
The outputs can be found here: gs://ot-team/jarrod/chembl-28-test
An overview of the outputs:
Counts:
drug: 13076
drug_warnings: 1256
indication: 7427
mechanism_of_action: 5013
Columns for: drug
[
"id",
"canonicalSmiles",
"inchiKey",
"drugType",
"blackBoxWarning",
"name",
"yearOfFirstApproval",
"maximumClinicalTrialPhase",
"parentId",
"hasBeenWithdrawn",
"isApproved",
"withdrawnNotice",
"tradeNames",
"synonyms",
"crossReferences",
"childChemblIds",
"count",
"approvedIndications",
"linkedTargets",
"linkedDiseases",
"description"
]
Columns for: drug_warnings
[
"toxicityClass",
"chemblIds",
"country",
"description",
"id",
"references",
"warningType",
"year",
"meddraSocCode"
]
Columns for: indication
[
"id",
"indications",
"count",
"approvedIndications"
]
Columns for: mechanism_of_action
[
"actionType",
"mechanismOfAction",
"chemblIds",
"targetName",
"targetType",
"targets",
"references"
]
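The column lists above were checked by eye. A minimal Python sketch of how that check could be automated is shown below. It assumes each dataset is written as newline-delimited JSON (one object per line). That file format and the helper name `check_columns` are hypothetical and not part of the pipeline. If the writer omits null fields, a column that is null in every record will be reported as missing.

```python
import json
from typing import Iterable


def check_columns(json_lines: Iterable[str], expected: list[str]) -> tuple[set[str], set[str]]:
    """Compare the keys seen in newline-delimited JSON records with an expected column list.

    Returns (missing, unexpected): expected columns never seen in any record,
    and keys seen in at least one record but absent from the expected list.
    """
    seen: set[str] = set()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seen.update(json.loads(line).keys())
    expected_set = set(expected)
    return expected_set - seen, seen - expected_set


# Expected columns for the drug_warnings output, copied from the list above.
DRUG_WARNINGS_COLUMNS = [
    "toxicityClass", "chemblIds", "country", "description", "id",
    "references", "warningType", "year", "meddraSocCode",
]
```

An open file object can be passed directly as `json_lines`, because iterating over it yields one line at a time.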
At first glance the counts look correct (they have actually increased since the last run) and the columns look good. I haven't gone through the data manually; instead, I relied on the metadata output for summaries.
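The comparison with the last run could also be scripted. The Python sketch below computes the per-dataset change in row count and flags any dataset whose count went down. The previous run's counts are not listed in this thread and would have to be taken from that run's metadata. The helper names `compare_counts` and `decreased` are hypothetical.

```python
def compare_counts(previous: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Return the change in row count per dataset (current minus previous).

    A dataset present in only one of the two runs is treated as having 0 rows in the other.
    """
    names = previous.keys() | current.keys()
    return {name: current.get(name, 0) - previous.get(name, 0) for name in sorted(names)}


def decreased(deltas: dict[str, int]) -> list[str]:
    """Datasets whose row count went down; these would deserve a closer look."""
    return [name for name, delta in deltas.items() if delta < 0]


# Counts reported above for the ChEMBL 28 run.
CHEMBL28_COUNTS = {
    "drug": 13076,
    "drug_warnings": 1256,
    "indication": 7427,
    "mechanism_of_action": 5013,
}
```

A decrease does not necessarily mean something is wrong, but it is the kind of change that should be explained before a release.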
Closing the ticket: ChEMBL 28 data has been ingested into the ETL pipelines and is exposed through the GraphQL API (e.g. the CHEMBL121 profile page shows the new warnings data introduced in ChEMBL 28).
A few things changed since the previous release, so it is better to double-check that nothing is wrong and keep the surprise factor under control. This is an internal preview, so the outcome is