sustainable-processes / ORDerly

Chemical reaction data & benchmarks. Extraction and cleaning of data from Open Reaction Database (ORD)
MIT License
67 stars 8 forks

Add args for prediction type #2

Open dswigh opened 1 year ago

dswigh commented 1 year ago

Add 2 new args:

1) prediction_type (or something like that): e.g. yield_prediction, only_mapped_reaction, condition_prediction

2) Data_set: only_uspto, all_available
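
A minimal sketch of how the two proposed args could look, assuming a plain `argparse` CLI (the project's actual CLI may be built differently). The flag names, the underscore spelling of the choices, and the defaults are all assumptions taken from the proposal above, not an existing interface:

```python
import argparse

# Hypothetical option values copied from the proposal; none of these are final.
PREDICTION_TYPES = ["yield_prediction", "only_mapped_reaction", "condition_prediction"]
DATA_SETS = ["only_uspto", "all_available"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two proposed arguments."""
    parser = argparse.ArgumentParser(description="Sketch of the proposed args")
    parser.add_argument(
        "--prediction_type",
        choices=PREDICTION_TYPES,
        default="condition_prediction",  # assumed default
        help="Which prediction task the extracted data is meant for",
    )
    parser.add_argument(
        "--data_set",
        choices=DATA_SETS,
        default="only_uspto",  # assumed default
        help="Which part of ORD to extract from",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--prediction_type", "yield_prediction", "--data_set", "all_available"]
)
print(args.prediction_type, args.data_set)
```

Using `choices` means a misspelled value fails at parse time instead of silently producing an empty data set.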

dswigh commented 1 year ago
  1. Instead of having a 'prediction type', let's create two flat file benchmarks, both just extracting USPTO data, but one with default settings that removes/handles reactions with uncommon molecules, and another with all the arg settings set to 0.
  2. This has been implemented!
dswigh commented 1 year ago
  1. When creating flat files for benchmarking, we should create train/val/test splits (80/10/10), splitting the data in 3 different ways: random, temporal (by grant date), and by reaction class (both by super-class (very hard) and by sub-class (medium difficulty)).
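
The three splitting strategies could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the record fields `grant_date` and `rxn_class`, the `"super.sub"` class format, and all function names are hypothetical. The class split is interpreted here as holding out whole classes, so it only approximates 80/10/10:

```python
import random
from collections import defaultdict
from datetime import date


def split_sizes(n, percents=(80, 10, 10)):
    """Return (train, val, test) sizes; the test split absorbs rounding."""
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    return n_train, n_val, n - n_train - n_val


def cut(items):
    """Slice an ordered list into train/val/test (80/10/10)."""
    n_train, n_val, _ = split_sizes(len(items))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def random_split(records, seed=0):
    """Shuffle a copy of the records, then cut."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return cut(shuffled)


def temporal_split(records):
    """Oldest grant dates go to train, the most recent to test."""
    return cut(sorted(records, key=lambda r: r["grant_date"]))


def class_split(records, level="sub", seed=0):
    """Assign whole reaction classes to one split, so val/test classes are unseen in train."""
    def key(r):
        # Assumed class format "super.sub", e.g. "3.1"
        return r["rxn_class"] if level == "sub" else r["rxn_class"].split(".")[0]

    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    names = sorted(groups)
    random.Random(seed).shuffle(names)

    n_train, n_val, _ = split_sizes(len(records))
    train, val, test = [], [], []
    for name in names:
        # Fill train first, then val; everything left over goes to test.
        if len(train) < n_train:
            target = train
        elif len(val) < n_val:
            target = val
        else:
            target = test
        target.extend(groups[name])
    return train, val, test


# Toy records standing in for rows of the flat file (made-up values).
records = [
    {"id": i, "grant_date": date(2000 + i % 20, 1, 1), "rxn_class": f"{i % 10}.{i % 3}"}
    for i in range(100)
]
train, val, test = temporal_split(records)
print(len(train), len(val), len(test))
```

Grouping by super-class holds out broader chemistry than grouping by sub-class, which is consistent with the "very hard" versus "medium difficulty" labels above.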