open-reaction-database / ord-data

Official data repository for the Open Reaction Database
https://open-reaction-database.org
Creative Commons Attribution Share Alike 4.0 International

39k Pfizer dataset #188

Closed bdeadman closed 2 months ago

bdeadman commented 2 months ago

39k reaction dataset from https://doi.org/10.1038/s41557-023-01393-w. This Pfizer dataset was previously proprietary and was published earlier in 2024. It includes additional labelling of solvents and reagents that was not provided in the Nature Chemistry paper.

Original dataset preparation by @emmaking-smith. @bdeadman has extracted names from the solvents, reagent1, and reagent2 fields, split mixtures into their components where possible, and added SMILES strings.
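
For reviewers, the mixture-splitting step can be pictured with the rough sketch below. It is not the code from the generator notebook. The `/` separator, the `NAME_TO_SMILES` lookup table, and the function names `split_mixture` and `annotate` are all assumptions made for illustration; the actual field formats in the dataset may differ.

```python
# Hypothetical lookup table: a few common solvent names and their SMILES.
# The real notebook may resolve names differently.
NAME_TO_SMILES = {
    "thf": "C1CCOC1",
    "water": "O",
    "dmf": "CN(C)C=O",
}


def split_mixture(field, separator="/"):
    """Split a mixture string such as 'THF/water' into component names.

    The '/' separator is an assumption, not confirmed by the dataset.
    """
    return [part.strip() for part in field.split(separator) if part.strip()]


def annotate(field):
    """Return (name, SMILES) pairs; SMILES is None when the name is unknown."""
    return [
        (name, NAME_TO_SMILES.get(name.lower()))
        for name in split_mixture(field)
    ]
```

Names that are missing from the table come back with `None` rather than a guessed structure, which matches the "where possible" caveat above.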

The dataset CSV is 30 MB, so it will be split in two for upload here.
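
One possible way to do the split, shown as a minimal sketch using only the Python standard library. The function name `split_csv_in_two` and its path arguments are hypothetical, and this is not necessarily how the uploaded files were produced. It assumes the file has a single header row, which is repeated in both halves so each part can be read on its own.

```python
import csv


def split_csv_in_two(src_path, first_path, second_path):
    """Write the rows of src_path into two CSV files of near-equal length.

    Assumes the first row is a header; it is copied into both outputs.
    Returns the number of data rows written to each file.
    """
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        rows = list(reader)

    # The first file gets the extra row when the row count is odd.
    midpoint = (len(rows) + 1) // 2
    chunks = ((first_path, rows[:midpoint]), (second_path, rows[midpoint:]))

    for path, chunk in chunks:
        with open(path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(chunk)

    return midpoint, len(rows) - midpoint
```

Splitting by row count rather than by byte offset keeps every record intact, including quoted fields that contain line breaks.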

bdeadman commented 2 months ago

data and generator notebook.zip