Closed bdeadman closed 4 months ago
Change summary:

| Filename | Added | Removed | Changed |
|---|---|---|---|
| data/complete_dataset.pbtxt.gz | 0 | 0 | 0 |
| | 0 | 0 | 0 |
@skearnes @connorcoley @qai222 The Pfizer 39k dataset is ready for review.
Thanks @bdeadman!
Change summary:

| Filename | Added | Removed | Changed |
|---|---|---|---|
| data/d9/ord_dataset-d92976309c3a48a3a64a4cf5e7048086.pb.gz | 39347 | 0 | 0 |
| | 39347 | 0 | 0 |
@bdeadman I just realized that the dataset name and description are empty; can you submit a PR to update them?
39k reaction dataset from https://doi.org/10.1038/s41557-023-01393-w. This Pfizer dataset was previously proprietary and was published earlier in 2024. It includes additional labelling of solvents and reagents that was not provided in the Nature paper.
Original dataset preparation by @emmaking-smith. @bdeadman extracted names from the solvents, reagent1, and reagent2 fields, split mixtures into their components where possible, and added SMILES strings.

Attachment: data and generator notebook.zip
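
For reviewers who want a feel for the mixture-splitting step described above, here is a minimal sketch. It is not the code from the attached notebook: the delimiters, the `NAME_TO_SMILES` lookup, and the `split_mixture` helper are all hypothetical and only illustrate the general idea of splitting a free-text solvent or reagent field and attaching SMILES where a name is recognized.

```python
import re

# Hypothetical name -> SMILES lookup. The mapping actually used for this
# dataset is not shown in the PR, so these entries are illustrative only.
NAME_TO_SMILES = {
    "water": "O",
    "ethanol": "CCO",
    "thf": "C1CCOC1",
    "toluene": "Cc1ccccc1",
}

# Assumed delimiters between mixture components: "/", ";", "+", or "and".
_SPLIT = re.compile(r"\s*(?:/|;|\+|\band\b)\s*", flags=re.IGNORECASE)


def split_mixture(field):
    """Split a free-text field into (name, SMILES or None) pairs."""
    parts = [p.strip() for p in _SPLIT.split(field) if p.strip()]
    # Names missing from the lookup keep a None SMILES so they can be
    # reviewed by hand rather than silently dropped.
    return [(p, NAME_TO_SMILES.get(p.lower())) for p in parts]


if __name__ == "__main__":
    print(split_mixture("THF/water"))
    print(split_mixture("DMF"))
```

Returning `None` for unrecognized names matches the "where possible" wording above: components that cannot be resolved stay visible in the output.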