Closed bdeadman closed 1 month ago
@cernak-lab This is the 1440 rxn Suzuki dataset from your Miniaturization of Reactions paper (https://doi.org/10.1038/s44160-023-00351-1).
suzuki for enumeration.zip contains the .pbtxt template and the Suzuki sheet of rxns in CSV format.
I'm working on the reductive amination and alkylation datasets in the same paper.
This PR now has all 3 datasets [Suzuki, reductive amination, N-alkylation with BOC deprotection]. The data and template files are in submission.zip.
Notes:
ValueError: error parsing data/alkylation_merge.pbtxt: 2:331 : Message type "ord.Dataset" has no field named "BOC".
Looks like I've accidentally broken the template file at a "BOC" string in the alkylation dataset. I'll find the error and update the file(s).
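For anyone hitting a similar parse error, here is a rough sketch of how a stray quote might be located before re-running the parser. It uses only the standard library and flags lines whose count of unescaped double quotes is odd. The function name `lines_with_odd_quotes` and the sample text are hypothetical, not taken from the actual template, and the check is only a heuristic: it ignores single-quoted strings, a backslash-escaped backslash before a quote, and a stray pair of quotes (which gives an even count).

```python
import re

# A double quote that is not immediately preceded by a backslash.
UNESCAPED_QUOTE = re.compile(r'(?<!\\)"')


def lines_with_odd_quotes(pbtxt_text):
    """Return 1-based line numbers whose unescaped double-quote count is odd."""
    suspects = []
    for lineno, line in enumerate(pbtxt_text.splitlines(), start=1):
        if len(UNESCAPED_QUOTE.findall(line)) % 2 == 1:
            suspects.append(lineno)
    return suspects


# Hypothetical template fragment with one extra quote before BOC.
sample = (
    'name: "alkylation"\n'
    'description: "N-alkylation with "BOC deprotection"\n'
)
print(lines_with_odd_quotes(sample))
```

On the hypothetical sample this reports line 2, the line where the string closes early and `BOC` would then be read as a field name.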
Fixed the "BOC" string problem (it was a rogue " in the dataset description). Now having issues with RDKit validation of a product SMILES (Core_SMILES). It is odd, since it was validating when I enumerated the datasets.
Core_SMILES was failing validation because the tag on the SMILES identifier was missing its $$ labels. This is corrected in the attached template file and in the GitHub PR.
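A quick pre-check along these lines might catch a missing $$ label before validation. This is a minimal sketch, assuming placeholders are written as `$Name$` inside quoted values in the template and that the spreadsheet column names are known. The helper `find_unwrapped_columns`, the `Boronate_SMILES` column, and the template fragment are all made up for illustration and are not the actual submission files.

```python
import re


def find_unwrapped_columns(template_text, columns):
    """Return column names that appear as bare quoted values (no $...$ wrapper)."""
    unwrapped = []
    for name in columns:
        bare = name.strip("$")
        # A bare occurrence is the column name directly between double quotes.
        if re.search('"' + re.escape(bare) + '"', template_text):
            unwrapped.append(bare)
    return unwrapped


# Hypothetical template fragment: the first identifier is missing its $ labels.
template = """
identifiers { type: SMILES value: "Core_SMILES" }
identifiers { type: SMILES value: "$Boronate_SMILES$" }
"""
print(find_unwrapped_columns(template, ["Core_SMILES", "Boronate_SMILES"]))
```

Here only `Core_SMILES` is reported: without the wrapper, the literal text rather than the per-row value from the sheet would be treated as the SMILES string.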
1440 suzuki rxns from https://doi.org/10.1038/s44160-023-00351-1.