open-reaction-database / ord-data

Official data repository for the Open Reaction Database
https://open-reaction-database.org
Creative Commons Attribution Share Alike 4.0 International

1440 suzuki rxns #196

Closed bdeadman closed 1 month ago

bdeadman commented 3 months ago

1440 suzuki rxns from https://doi.org/10.1038/s44160-023-00351-1.

bdeadman commented 3 months ago

@cernak-lab This is the 1440 rxn Suzuki dataset from your Miniaturization of Reactions paper (https://doi.org/10.1038/s44160-023-00351-1).

suzuki for enumeration.zip contains the .pbtxt template and the suzuki sheet of rxns in csv format.

I'm working on the reductive amination and alkylation datasets in the same paper.

bdeadman commented 1 month ago

This PR now has all 3 datasets [Suzuki, reductive amination, N-alkylation with BOC deprotection]. The data and template files are in submission.zip.

Notes:

bdeadman commented 1 month ago

ValueError: error parsing data/alkylation_merge.pbtxt: 2:331 : Message type "ord.Dataset" has no field named "BOC".

Looks like I've accidentally broken the template file at a "BOC" string in the alkylation dataset. I'll find the error and update the file(s).

bdeadman commented 1 month ago

Fixed the "BOC" string problem (it was a rogue " in the dataset description). Now having issues with RDKit validation of a product SMILES (Core_SMILES). It is odd, since it validated fine when I enumerated the datasets.
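
For anyone hitting the same parse error: a stray double quote inside a quoted text-format string ends the string early, so the parser reads the next word (here BOC) as a field name. A minimal sketch of one way to guard against this is to escape free text before pasting it into a .pbtxt template. The helper name and the description text below are hypothetical, not taken from the submitted files.

```python
def escape_pbtxt_string(value: str) -> str:
    """Escape backslashes and double quotes for a double-quoted text-format string.

    Hypothetical helper, not part of ord-schema. Backslashes are escaped first
    so that the backslashes added for the quotes are not doubled afterwards.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"')


# Hypothetical description text containing embedded double quotes.
description = 'N-alkylation with "BOC" deprotection'

# Build the template line with the quotes escaped.
line = f'description: "{escape_pbtxt_string(description)}"'
print(line)
```

Escaping at the point where the text enters the template keeps the fix in one place, rather than relying on hand-editing the description each time it changes.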

bdeadman commented 1 month ago

The Core_SMILES validation failure was caused by the tag on the SMILES identifier missing its $$ labels, so the tag was not being replaced during enumeration. This is corrected in the attached template file and in the GitHub PR.
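
A quick way to catch this kind of slip before running validation is to scan the template for column names that appear without their $ labels. The sketch below assumes placeholders are written as $Column_Name$ and that the spreadsheet column names are known. The function name, the template snippet, and the second column name (Amine_SMILES) are all hypothetical.

```python
import re


def find_unlabeled_columns(template_text, columns):
    """Return column names that occur in the template without $...$ labels.

    Hypothetical helper: assumes placeholders take the form $Column_Name$.
    """
    unlabeled = []
    for name in columns:
        # Match the bare name only when it is not directly wrapped in '$'.
        bare = re.compile(r"(?<!\$)\b" + re.escape(name) + r"\b(?!\$)")
        if bare.search(template_text):
            unlabeled.append(name)
    return unlabeled


# Hypothetical template fragment: the first tag is missing its $ labels.
template = '''
identifiers { type: SMILES value: "Core_SMILES" }
identifiers { type: SMILES value: "$Amine_SMILES$" }
'''

missing = find_unlabeled_columns(template, ["Core_SMILES", "Amine_SMILES"])
print(missing)
```

An unlabeled tag is left in the output as literal text, which would explain a SMILES field failing validation even though the underlying spreadsheet values are fine.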

alkylation_template.zip