Following ChemLLMBench, we formulate reaction component selection tasks from the Suzuki High-Throughput Experimentation (HTE) dataset, which evaluates the Suzuki coupling of 5 electrophiles and 7 nucleophiles across a matrix of 11 ligands (including one blank), 7 bases (including one blank), and 4 solvents.
Three components:
reactant selection (top-1 acc)
solvent selection (top-1 acc)
ligand selection (top-50% acc)
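The two metrics above can be read as follows: top-1 accuracy counts a prediction as correct when the chosen candidate has the highest yield among the options, and top-50% accuracy counts it as correct when the chosen candidate ranks in the better half by yield. The sketch below is our interpretation, not ChemLLMBench's reference code. The function names, the tie handling (ties count as correct for top-1; stable index order for top-50%) and rounding the "top half" up for an odd number of candidates are all assumptions.

```python
from typing import Iterable, Sequence


def top1_correct(yields: Sequence[float], chosen: int) -> bool:
    """True if the chosen candidate has the highest yield (ties count as correct)."""
    return yields[chosen] == max(yields)


def top_half_correct(yields: Sequence[float], chosen: int) -> bool:
    """True if the chosen candidate ranks in the top 50% of candidates by yield.

    Assumption: for an odd number of candidates the top half is rounded up,
    and equal yields are ordered by candidate index (sorted() is stable).
    """
    ranked = sorted(range(len(yields)), key=lambda i: yields[i], reverse=True)
    cutoff = (len(yields) + 1) // 2
    return chosen in ranked[:cutoff]


def accuracy(flags: Iterable[bool]) -> float:
    """Fraction of questions answered correctly (0.0 for an empty input)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    # Hypothetical yields for 4 ligand candidates in a single question.
    yields = [0.12, 0.55, 0.40, 0.05]
    print(top1_correct(yields, 1))      # True: 0.55 is the best yield
    print(top_half_correct(yields, 2))  # True: 0.40 ranks 2nd of 4
    print(top_half_correct(yields, 0))  # False: 0.12 ranks 3rd of 4
```

Per-question flags from either function can be averaged with `accuracy` to give the reported top-1 or top-50% score.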
Baselines:
[x] The off-the-shelf LLMs evaluated in ChemLLMBench
[x] ChemDFM-13B
Reaction Component Prediction
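Data come from TextReact and the Mol-Instruction reagent prediction task. Because the full TextReact dataset is too large, we use the MolIns-Reagent-Prediction part of the dataset for balance. Based on the reaction-condition information provided by TextReact, we split this dataset into three subtasks: reagent prediction, catalyst prediction, and solvent prediction.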
Input: canonical_rxn. Reaction condition fields: catalyst1, solvent1, solvent2, reagent1, reagent2.
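The split into reagent, catalyst, and solvent prediction could look roughly like the sketch below. It is a minimal illustration, not the pipeline actually used. The field names are taken from the note above. The subtask-to-field mapping (`TASK_FIELDS`), the helper name `split_conditions`, skipping empty fields, and joining multiple species with "." (SMILES multi-component notation) are all assumptions.

```python
from typing import Dict, List, Mapping, Optional

# Hypothetical mapping from each subtask to the condition fields it predicts.
TASK_FIELDS: Dict[str, List[str]] = {
    "catalyst": ["catalyst1"],
    "solvent": ["solvent1", "solvent2"],
    "reagent": ["reagent1", "reagent2"],
}


def split_conditions(record: Mapping[str, Optional[str]]) -> List[Dict[str, str]]:
    """Turn one reaction record into up to three (task, input, target) examples.

    Empty or missing condition fields are skipped, so a subtask whose fields
    are all empty yields no example. Multiple species in one subtask are
    joined with '.' (assumption: SMILES multi-component notation).
    """
    examples = []
    for task, fields in TASK_FIELDS.items():
        values = [(record.get(f) or "").strip() for f in fields]
        values = [v for v in values if v]
        if values:
            examples.append({
                "task": task,
                "input": record["canonical_rxn"] or "",
                "target": ".".join(values),
            })
    return examples


if __name__ == "__main__":
    # Hypothetical record; the SMILES are placeholders, not dataset entries.
    rec = {
        "canonical_rxn": "CCO>>CC=O",
        "catalyst1": "",
        "solvent1": "ClCCl",
        "solvent2": None,
        "reagent1": "O=[Cr](=O)=O",
        "reagent2": "O",
    }
    for ex in split_conditions(rec):
        print(ex["task"], ex["target"])
```

Keeping the mapping in one dictionary makes it easy to change which fields feed which subtask if the actual split differs.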
Yield Prediction
Our dataset includes Buchwald-Hartwig (BH) and Suzuki-Miyaura (SM) reactions. The split follows ChemLLMBench: we randomly select 100 samples from each of BH and SM as the test set. We use regression (not binary classification) as the task objective and normalize yield values to [0, 1].
BH: train 3,955; test 100
SM: train 5,760; test 100
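The per-dataset preprocessing described above (yield normalization plus a random 100-sample test hold-out for each of BH and SM) can be sketched as follows. This is an illustration under stated assumptions, not the exact script. We assume raw yields are percentages and normalize by dividing by 100 with clipping; min-max scaling over the dataset would be an alternative. The helper names and the fixed seed are hypothetical. This sketch keeps the held-out test samples disjoint from the training set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def normalize_yield(y_percent: float) -> float:
    """Map a yield in percent (0-100) to [0, 1], clipping out-of-range values.

    Assumption: raw yields are percentages, so dividing by 100 normalizes them.
    """
    return min(max(y_percent / 100.0, 0.0), 1.0)


def split_random(items: Sequence[T], n_test: int = 100,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly hold out n_test items as the test set; the rest form train.

    A seeded random.Random instance makes the split reproducible.
    """
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)
    test_idx = set(idx[:n_test])
    test = [items[i] for i in idx[:n_test]]
    train = [x for i, x in enumerate(items) if i not in test_idx]
    return train, test


if __name__ == "__main__":
    # Hypothetical (reaction, yield%) pairs standing in for the BH or SM data.
    data = [(f"rxn_{i}", float(i % 101)) for i in range(300)]
    train, test = split_random(data, n_test=100, seed=42)
    train = [(r, normalize_yield(y)) for r, y in train]
    test = [(r, normalize_yield(y)) for r, y in test]
    print(len(train), len(test))
```

Running the split separately on the BH and SM lists gives one 100-sample test set per reaction family, as in the note.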
Forward & Retro
Reaction classification
Dataset: USPTO_TPL_1K