Yields

Our dataset contains Buchwald-Hartwig (BH) and Suzuki-Miyaura (SM) reactions. Splitting: following ChemLLMBench, we randomly select 100 samples each from BH and SM as test sets, use regression as the task objective (not binary classification), and normalize yield values to [0, 1].
BH: train 3,955; test 100
SM: train 5,760; test 100
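A minimal sketch of this split, assuming each record is a (reaction SMILES, yield in percent) pair and that "normalize to [0, 1]" means dividing percentage yields by 100. The record layout, seed, and function names are hypothetical, not ChemLLMBench's own code.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (reaction_smiles, yield_in_percent).
Record = Tuple[str, float]


def normalize_yield(yield_percent: float) -> float:
    """Map a percentage yield onto [0, 1], clipping out-of-range values."""
    return min(max(yield_percent / 100.0, 0.0), 1.0)


def split_yield_dataset(
    records: Sequence[Record], n_test: int = 100, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Randomly hold out n_test reactions as the test set; the rest is train."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test_idx = set(rng.sample(range(len(records)), n_test))
    normalized = [(rxn, normalize_yield(y)) for rxn, y in records]
    train = [r for i, r in enumerate(normalized) if i not in test_idx]
    test = [r for i, r in enumerate(normalized) if i in test_idx]
    return train, test


if __name__ == "__main__":
    toy = [(f"rxn_{i}", float(i % 101)) for i in range(500)]
    train, test = split_yield_dataset(toy)
    print(len(train), len(test))  # 400 100
```

The same function would be applied separately to BH and SM, so each dataset gets its own 100-reaction test set.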

Baselines:
[x] UAGNN: https://github.com/seokhokang/reaction_yield_nn, needs to be evaluated on our test set (the results in ChemLLMBench are binary-classification results)
[x] T5Chem (the original paper covers only BH), needs to be evaluated on our test set
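Because these baselines have to be re-scored as regressors on the normalized yields, here is a minimal sketch of common regression metrics. The choice of MAE, RMSE, and R² is an assumption; the notes above do not fix a metric, and `regression_metrics` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE and R^2 for yields already normalized to [0, 1]."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    # R^2 is undefined when every target is identical; report NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}
```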

Forward & Retro

Baselines:
[x] Chemformer https://github.com/MolecularAI/Chemformer
- Forward Prediction
- Retro Prediction
[ ] T5Chem (scaffold split needs evaluation)
[x] Text+ChemT5 (mol split: copied directly from InstructMol) (scaffold split needs evaluation)
[x] MolecularTransformer (forward: copied directly from InstructMol) (scaffold split needs evaluation)
[x] Retroformer-untyped (retro: copied directly from InstructMol) (scaffold split needs evaluation)
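For the baselines that still need scaffold-split evaluation, a minimal sketch of a Bemis-Murcko scaffold split. It assumes the scaffold is computed on one molecule per reaction (for example the product) and that scaffold groups are assigned largest-first, in the spirit of DeepChem's ScaffoldSplitter; the fractions and function names are hypothetical. RDKit is imported only inside `murcko_scaffold`, so the grouping logic can be exercised with any scaffold function.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def murcko_scaffold(smiles: str) -> str:
    """Bemis-Murcko scaffold SMILES via RDKit (imported lazily)."""
    from rdkit.Chem.Scaffolds import MurckoScaffold

    return MurckoScaffold.MurckoScaffoldSmiles(smiles=smiles, includeChirality=False)


def scaffold_split(
    smiles: Sequence[str],
    frac_train: float = 0.8,
    frac_valid: float = 0.1,
    scaffold_fn: Callable[[str], str] = murcko_scaffold,
) -> Tuple[List[int], List[int], List[int]]:
    """Return train/valid/test indices such that no scaffold spans two splits."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, smi in enumerate(smiles):
        groups[scaffold_fn(smi)].append(i)
    # Largest scaffold groups first; ties broken by first occurrence.
    ordered = sorted(groups.values(), key=lambda idx: (-len(idx), idx[0]))
    n = len(smiles)
    train_cut, valid_cut = frac_train * n, (frac_train + frac_valid) * n
    train: List[int] = []
    valid: List[int] = []
    test: List[int] = []
    for idx in ordered:
        if len(train) + len(idx) <= train_cut:
            train.extend(idx)
        elif len(train) + len(valid) + len(idx) <= valid_cut:
            valid.extend(idx)
        else:
            test.extend(idx)  # rare scaffolds end up in the test set
    return train, valid, test
```

Filling train with the most common scaffolds pushes rare scaffolds into valid/test, which is what makes this split harder than a random (mol) split.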

Reaction classification

USPTO_TPL_1K

Baselines:
[x] rxnfp: https://github.com/rxn4chemistry/rxnfp (results copied directly from the reported figure)

T5Chem (the original paper trains and tests only on the top 500 classes, so its results are not comparable)

Reagent Selection

Following ChemLLMBench, we formulate reaction component selection tasks from the Suzuki High-Throughput Experimentation (HTE) dataset, which evaluates the Suzuki coupling of 5 electrophiles and 7 nucleophiles across a matrix of 11 ligands (with one blank), 7 bases (with one blank), and 4 solvents. There are three selection tasks:

- Reactant selection (top-1 accuracy)
- Solvent selection (top-1 accuracy)
- Ligand selection (top-50% accuracy)
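A minimal sketch of the two metrics, assuming top-50% accuracy counts a ligand prediction as correct when the chosen ligand lies in the upper half of the candidate ligands ranked by experimental yield for that reactant pair. The function names and the `{candidate: yield}` dict layout are hypothetical.

```python
from typing import Dict, Iterable


def top1_correct(predicted: str, yields: Dict[str, float]) -> bool:
    """Correct only if the prediction is the highest-yielding candidate."""
    return predicted == max(yields, key=yields.get)  # ties: first candidate wins


def top_half_correct(predicted: str, yields: Dict[str, float]) -> bool:
    """Correct if the prediction ranks in the top 50% of candidates by yield."""
    ranked = sorted(yields, key=yields.get, reverse=True)  # stable for ties
    k = (len(ranked) + 1) // 2  # round up so odd-sized pools keep the middle item
    return predicted in ranked[:k]


def accuracy(flags: Iterable[bool]) -> float:
    """Fraction of questions answered correctly."""
    flags = list(flags)
    return sum(flags) / len(flags)
```

A prediction that is not among the candidates (for example a hallucinated ligand name) is simply counted as wrong by both metrics.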
Baselines:
[x] The off-the-shelf LLMs evaluated in ChemLLMBench
[x] ChemDFM-13B

Reaction Component Prediction

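Data comes from TextReact and the Mol-Instructions reagent prediction task. Because the full TextReact dataset is very large, to keep the tasks balanced we use the reagent prediction part of Mol-Instructions and, based on the reaction condition information provided by TextReact, split it into three tasks: reagent prediction, catalyst prediction, and solvent prediction.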

The input is canonical_rxn; the reaction conditions comprise catalyst1, solvent1, solvent2, reagent1, and reagent2.
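A minimal sketch of this per-role split, assuming the TextReact conditions have been loaded into a dict keyed by canonical_rxn, that the Mol-Instructions reactions are canonicalized the same way so exact string matching works, and that multiple components of one role are joined with "." as a SMILES mixture. `split_by_role` and the output layout are hypothetical.

```python
from typing import Dict, List, Optional

# Condition columns listed above, grouped by the prediction task they feed.
CONDITION_ROLES = {
    "catalyst": ("catalyst1",),
    "solvent": ("solvent1", "solvent2"),
    "reagent": ("reagent1", "reagent2"),
}


def split_by_role(
    molins_rxns: List[str],
    textreact_conditions: Dict[str, Dict[str, Optional[str]]],
) -> Dict[str, List[Dict[str, str]]]:
    """Build one example per (reaction, role) with at least one annotated component."""
    tasks: Dict[str, List[Dict[str, str]]] = {role: [] for role in CONDITION_ROLES}
    for rxn in molins_rxns:
        cond = textreact_conditions.get(rxn)
        if cond is None:
            continue  # no TextReact condition record for this reaction; skip it
        for role, columns in CONDITION_ROLES.items():
            # Keep only non-empty columns (None and "" are treated as missing).
            components = [cond[c] for c in columns if cond.get(c)]
            if components:
                tasks[role].append({"canonical_rxn": rxn, "target": ".".join(components)})
    return tasks
```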

Baselines:
[ ] TextReact hf-checkpoint (ideally evaluate it as well)

open-mol / bioagent

Add Specialist Performance #14