open-reaction-database / ord-data

Official data repository for the Open Reaction Database
https://open-reaction-database.org
Creative Commons Attribution Share Alike 4.0 International
219 stars, 55 forks

39k Pfizer dataset (#188) #189

Closed: bdeadman closed this 2 months ago

bdeadman commented 2 months ago

A 39k-reaction dataset from https://doi.org/10.1038/s41557-023-01393-w. This Pfizer dataset was previously proprietary and was published earlier in 2024. It includes additional labelling of solvents and reagents that was not provided in the Nature paper.

Original dataset preparation by @emmaking-smith. @bdeadman has extracted names from the solvents, reagent1 and reagent2 fields, split mixtures into their components where possible, and added SMILES strings.

Attachment: data and generator notebook.zip
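For readers unfamiliar with this kind of cleanup, the mixture-splitting and SMILES-labelling step described above might look roughly like the sketch below. It is not the code from the attached notebook: the `NAME_TO_SMILES` table, the separator pattern, and the function names `split_mixture` and `label_components` are all hypothetical, and real free-text fields will likely need more careful handling (ratios, abbreviations, misspellings).

```python
import re

# Hypothetical name-to-SMILES lookup (an assumption for illustration only);
# a real workflow would rely on a curated table or a name-resolution step.
NAME_TO_SMILES = {
    "water": "O",
    "thf": "C1CCOC1",
    "methanol": "CO",
    "dmf": "CN(C)C=O",
}

# Assumed separators between mixture components: "/", "+", or the word "and".
_SEPARATORS = re.compile(r"\s*(?:/|\+|\band\b)\s*", flags=re.IGNORECASE)


def split_mixture(field):
    """Split a free-text field such as 'THF/water' into component names."""
    parts = _SEPARATORS.split(field.strip())
    return [part for part in parts if part]


def label_components(field):
    """Return (name, smiles) pairs; smiles is None when the name is unknown."""
    return [
        (name, NAME_TO_SMILES.get(name.lower()))
        for name in split_mixture(field)
    ]
```

Components that cannot be resolved keep their name and get `None` for the SMILES, which mirrors the "where possible" caveat: nothing is guessed for unknown entries.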

github-actions[bot] commented 2 months ago

Change summary:

| Filename | Added | Removed | Changed |
| --- | --- | --- | --- |
| data/complete_dataset.pbtxt.gz | 0 | 0 | 0 |
| | 0 | 0 | 0 |

bdeadman commented 2 months ago

@skearnes @connorcoley @qai222 The Pfizer 39k dataset is ready for review.

skearnes commented 2 months ago

Thanks @bdeadman!

github-actions[bot] commented 2 months ago

Change summary:

| Filename | Added | Removed | Changed |
| --- | --- | --- | --- |
| data/d9/ord_dataset-d92976309c3a48a3a64a4cf5e7048086.pb.gz | 39347 | 0 | 0 |
| | 39347 | 0 | 0 |

github-actions[bot] commented 2 months ago

Change summary:

| Filename | Added | Removed | Changed |
| --- | --- | --- | --- |
| data/d9/ord_dataset-d92976309c3a48a3a64a4cf5e7048086.pb.gz | 39347 | 0 | 0 |
| | 39347 | 0 | 0 |

skearnes commented 2 months ago

@bdeadman I just realized that the dataset name and description are empty; can you submit a PR to update them?