open-reaction-database / ord-data

Official data repository for the Open Reaction Database
https://open-reaction-database.org
Creative Commons Attribution Share Alike 4.0 International

Add USPTO-50K dataset from https://doi.org/10.1021/ci5006614 #60

Closed (skearnes closed this 3 years ago)

skearnes commented 3 years ago

I had to create dummy inputs/outputs to get the validations to pass. For these cases, where we only include the reaction SMILES, I assume we care more about easy access than about avoiding duplicates.
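For readers unfamiliar with the workaround, here is a minimal sketch of the idea in plain Python. It is not the ord-schema API and not the code from the attached notebook: the dict layout, the `make_record` and `validate` helpers, and the rule that a record needs at least one input and one outcome are all assumptions made for illustration.

```python
# Hypothetical sketch: wrap a bare reaction SMILES in a record that carries
# placeholder inputs/outcomes so a (hypothetical) validator accepts it.
# The field names below are illustrative, not the ord-schema message format.


def make_record(reaction_smiles):
    """Build a record whose only real content is the reaction SMILES."""
    # A reaction SMILES has the form 'reactants>agents>products'.
    if reaction_smiles.count(">") != 2:
        raise ValueError("expected 'reactants>agents>products'")
    return {
        "identifiers": [{"type": "REACTION_SMILES", "value": reaction_smiles}],
        # Placeholders only: they carry no chemistry and exist to satisfy
        # the assumed 'at least one input and one outcome' rule.
        "inputs": {"dummy": {"components": []}},
        "outcomes": [{"products": []}],
    }


def validate(record):
    """Return a list of error strings; an empty list means the record passes."""
    errors = []
    if not record.get("identifiers"):
        errors.append("record has no identifiers")
    if not record.get("inputs"):
        errors.append("record has no inputs")
    if not record.get("outcomes"):
        errors.append("record has no outcomes")
    return errors


record = make_record("CCO.CC(=O)O>>CC(=O)OCC")
smiles_only = {"identifiers": record["identifiers"]}
print(validate(record))       # passes with placeholders
print(validate(smiles_only))  # fails without them
```

The trade-off is the one described above: the placeholders make the reaction SMILES easy to load, but they do not describe the reaction's actual inputs or outcomes.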

I've attached the notebook I used to replicate the train/test split.

uspto-50k.zip

skearnes commented 3 years ago

Oops, the data has a CC-BY-NC license; we'll need to ask them for permission to include it.