A 39k-reaction dataset from https://doi.org/10.1038/s41557-023-01393-w. This Pfizer dataset was previously proprietary but was published earlier in 2024. It includes additional labelling of solvents and reagents that was not provided in the Nature Chemistry paper.
Original dataset preparation by @emmaking-smith. @bdeadman has extracted names from the solvent, reagent1 and reagent2 fields, has split mixtures into their individual components where possible, and has added SMILES strings.
The dataset CSV is 30 MB, so it will be split into two files for upload here.
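Because the CSV is uploaded as two parts, a user will usually want to recombine them before analysis. The sketch below is one way to do that with Python's standard `csv` module. It assumes both parts share the same column order and UTF-8 encoding; the part file names in the usage comment are hypothetical placeholders, not the actual upload names. If the second part repeats the header row, that row is skipped so the combined file has a single header.

```python
import csv
from pathlib import Path


def combine_csv_parts(part_paths, out_path):
    """Concatenate CSV parts into one file, writing the header row only once.

    Assumes every part uses the same column order. If a later part starts
    with a copy of the header, that repeated row is skipped.
    Returns the number of data rows written (header excluded).
    """
    header = None
    n_rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out_f:
        writer = csv.writer(out_f)
        for path in part_paths:
            with open(path, newline="", encoding="utf-8") as in_f:
                reader = csv.reader(in_f)
                for i, row in enumerate(reader):
                    if header is None:
                        # First row of the first part is taken as the header.
                        header = row
                        writer.writerow(row)
                        continue
                    if i == 0 and row == header:
                        # Later part repeats the header: skip it.
                        continue
                    writer.writerow(row)
                    n_rows += 1
    return n_rows


# Usage (hypothetical file names):
# combine_csv_parts([Path("reactions_part1.csv"), Path("reactions_part2.csv")],
#                   Path("reactions_full.csv"))
```

Using `csv.reader`/`csv.writer` rather than plain line concatenation keeps quoted fields intact, which matters if any name or SMILES field contains commas.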