opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Update chemical probes to Probes&Drugs v01.2022 #2557

Closed: ireneisdoomed closed this issue 2 years ago

ireneisdoomed commented 2 years ago

Probes & Drugs (P&D) has published a new release, so we should rerun the current chemical probes pipeline on it.

Changes

Based on their release notes:

ireneisdoomed commented 2 years ago

Probes have been regenerated from the data in the 01.2022 SQL dump. Apart from the changes mentioned above, we are also bringing in two more probe sets based on their list.


However, the data have not been uploaded yet because there is a parsing issue. This is a breakdown of the number of probes per data source in each release:

| Source | 22.02 | 22.04 |
| --- | --- | --- |
| Open Science Probes | 112 | 130 |
| Probe Miner | 3256 | 3255 |
| Probes & Drugs Portal | 93 | 0 |
| High-quality chemical probes | 0 | 940 |
| opnMe Portal | 73 | 87 |
| Chemical Probes.org (legacy) | 602 | 0 |
| Chemical Probes.org | 0 | 748 |
| Bromodomains chemical toolbox | 57 | 55 |
| Gray Laboratory Probes | 88 | 134 |
| Protein methyltransferases chemical toolbox | 28 | 28 |
| Nature Chemical Biology Probes | 51 | 51 |
| SGC Probes | 162 | 170 |
| JUMP-Target 2 Compound Set | 0 | 72 |
| JUMP-Target 1 Compound Set | 0 | 72 |
| Natural product-based probes and drugs | 0 | 19 |
| Chemical Probes for Understudied Kinases | 0 | 41 |
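To make the differences in the breakdown above easier to spot, here is a minimal sketch that groups the sources by how their probe count changed between 22.02 and 22.04. The counts are copied from the breakdown; the function name `classify_sources` and the group labels are made up for illustration and are not part of the actual pipeline.

```python
# Probe counts per source, copied from the breakdown above: (22.02, 22.04).
COUNTS = {
    "Open Science Probes": (112, 130),
    "Probe Miner": (3256, 3255),
    "Probes & Drugs Portal": (93, 0),
    "High-quality chemical probes": (0, 940),
    "opnMe Portal": (73, 87),
    "Chemical Probes.org (legacy)": (602, 0),
    "Chemical Probes.org": (0, 748),
    "Bromodomains chemical toolbox": (57, 55),
    "Gray Laboratory Probes": (88, 134),
    "Protein methyltransferases chemical toolbox": (28, 28),
    "Nature Chemical Biology Probes": (51, 51),
    "SGC Probes": (162, 170),
    "JUMP-Target 2 Compound Set": (0, 72),
    "JUMP-Target 1 Compound Set": (0, 72),
    "Natural product-based probes and drugs": (0, 19),
    "Chemical Probes for Understudied Kinases": (0, 41),
}


def classify_sources(counts):
    """Group sources by how their probe count changed between two releases.

    `counts` maps a source name to an (old, new) tuple. The group labels
    are hypothetical and only used for this illustration.
    """
    groups = {"dropped": [], "new": [], "changed": [], "unchanged": []}
    for source, (old, new) in counts.items():
        if old > 0 and new == 0:
            groups["dropped"].append(source)    # present before, missing now
        elif old == 0 and new > 0:
            groups["new"].append(source)        # only present in the new release
        elif old != new:
            groups["changed"].append(source)    # present in both, count differs
        else:
            groups["unchanged"].append(source)  # identical counts
    return groups


if __name__ == "__main__":
    for label, sources in classify_sources(COUNTS).items():
        print(f"{label}: {', '.join(sources)}")
```

The sources that end up under "dropped" and "new" are the ones worth checking by hand, since a source name that disappears or appears between releases can come from either a real upstream change or a parsing problem.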

“High-quality chemical probes” shouldn’t be considered a source, as these probes are already reported with the isQuality flag. “Probes & Drugs Portal” is missing for the same reason: these are probes that are in the high-quality set but are not present in any other source. I’ll get back to this bug right after Easter.
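A rough sketch of the intended behaviour, as I understand it: the high-quality set is turned into a flag instead of a source, and a probe that is only in that set is attributed to “Probes & Drugs Portal”. This is plain Python, not the actual parser; the function `normalise_probe`, the record layout, and the fallback rule are assumptions made for illustration. Only the `isQuality` flag name and the two set names come from this thread.

```python
# Names taken from the breakdown in this thread.
QUALITY_SET = "High-quality chemical probes"
FALLBACK_SOURCE = "Probes & Drugs Portal"


def normalise_probe(probe_id, source_sets):
    """Turn the high-quality set into a flag instead of a source.

    Hypothetical helper: `source_sets` is the list of P&D compound sets a
    probe belongs to. The returned dict layout is illustrative only.
    """
    is_quality = QUALITY_SET in source_sets
    # Every set except the high-quality one counts as a real source.
    sources = [s for s in source_sets if s != QUALITY_SET]
    # Assumption: a probe that is only in the high-quality set is
    # attributed to the Probes & Drugs Portal itself.
    if is_quality and not sources:
        sources = [FALLBACK_SOURCE]
    return {"id": probe_id, "isQuality": is_quality, "sources": sources}


if __name__ == "__main__":
    print(normalise_probe("probe-1", [QUALITY_SET, "SGC Probes"]))
    print(normalise_probe("probe-2", [QUALITY_SET]))
    print(normalise_probe("probe-3", ["Probe Miner"]))
```

With this rule, “High-quality chemical probes” never shows up in the per-source counts, and “Probes & Drugs Portal” reappears for probes that have no other origin.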

WIP notebook can be found at: https://github.com/opentargets/evidence_datasource_parsers/blob/il-22.04/exploration/chemicalProbes/chemical_probes.ipynb

ireneisdoomed commented 2 years ago

The module has been written, and we've been able to generate a dataset from the newest P&D version without the parsing bug mentioned above.

However, there is a bug in the data itself, as reported by the ChemicalProbes.org people, which hasn't been fixed yet.

I will open the PR with the code as it is now, and we can tackle the data bug in a follow-up ticket.