We need several additional datasets for benchmarking/testing. @jchodera has volunteered to prep these this weekend, so this issue collects everything in one place, listed in the order of priority I would assign:
1. Pfizer set. 100 challenging fragments from Pfizer for torsion drives. #50
2. Genentech set. Optimization dataset as provided, filtering out the largest molecules first. Then an optimization dataset and a torsion drive dataset after fragmentation. #48
3. DrugBank FDA drugs. DrugBank discussed here would be a good set; I'd focus on the FDA-approved small-molecule drugs, throw out everything big and everything very small, then fragment for optimization and torsion drives. Probably also remove anything with pentavalent carbon for good measure. Problem: I don't have a DrugBank account yet, and it seems to take two business days for one to be approved.
4. Informative set. Optimization dataset of 1117 informative fragments. Discussed in issue #46. (The larger set includes 9000 compounds, which could be fragmented and used for torsion drives.)
I'm checking into some options on (3) so I might have updates. Or not.
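For the DrugBank set, the "throw out everything big and everything very small" step plus the pentavalent-carbon check could look roughly like the sketch below. This is only an illustration: the `Molecule` record, the `filter_candidates` helper, and the heavy-atom cutoffs (8 and 50) are all hypothetical placeholders, not agreed values, and a real pipeline would compute atom counts and valences with a cheminformatics toolkit rather than take them as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record; real values would come from a cheminformatics toolkit."""
    name: str
    heavy_atoms: int         # number of non-hydrogen atoms
    max_carbon_valence: int  # largest valence seen on any carbon atom


def filter_candidates(molecules, min_heavy_atoms=8, max_heavy_atoms=50,
                      max_carbon_valence=4):
    """Keep molecules inside the size window with no over-valent carbon.

    The default cutoffs are placeholder assumptions, not values from this issue.
    """
    kept = []
    for mol in molecules:
        # Drop anything very small or very big.
        if not (min_heavy_atoms <= mol.heavy_atoms <= max_heavy_atoms):
            continue
        # Drop anything with pentavalent (or worse) carbon.
        if mol.max_carbon_valence > max_carbon_valence:
            continue
        kept.append(mol)
    return kept


# Made-up entries purely to exercise the filter.
candidates = [
    Molecule("tiny", 3, 4),
    Molecule("druglike", 21, 4),
    Molecule("huge", 120, 4),
    Molecule("bad_valence", 20, 5),
]
selected = filter_candidates(candidates)
print([m.name for m in selected])
```

The cutoffs are keyword arguments so they can be tuned once we settle on what counts as too big or too small.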