# LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs primarily focuses on the broad coverage of tools and the flexibility of adding new tools. However, a critical aspect that has surprisingly been understudied is simply how accurately an LLM uses tools for which it has been trained. We find that existing LLMs, including GPT-4 and open-source LLMs specifically fine-tuned for tool use, only reach a correctness rate in the range of 30% to 60%, far from reliable use in practice. We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE), that orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory. Specifically, STE leverages an LLM's "imagination" to simulate plausible scenarios for using a tool, after which the LLM interacts with the tool to learn from its execution feedback. Both short-term and long-term memory are employed to improve the depth and breadth of the exploration, respectively. Comprehensive experiments on ToolBench show that STE substantially improves tool learning for LLMs under both in-context learning and fine-tuning settings, bringing a boost of 46.7% to Mistral-Instruct-7B and enabling it to outperform GPT-4. We also show effective continual learning of tools via a simple experience replay strategy.
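The exploration phase described above is implemented in `STE/main.py`. As a rough, hedged illustration only (not the repository's actual code), the sketch below shows how imagination, trial-and-error, and the two memories could interact for a single tool; all helper functions are illustrative stubs.

```python
# Hedged sketch of one STE exploration run (imagination -> trial & error -> memory).
# All helpers are illustrative stubs, not the repository's implementation.

def imagine_scenario(tool: str, long_term_memory: list) -> str:
    # In STE this is an LLM call that invents a plausible user query,
    # conditioned on long-term memory to avoid repeating past scenarios (breadth).
    return f"(imagined query #{len(long_term_memory) + 1} for {tool})"

def propose_api_call(tool: str, query: str, short_term_memory: list) -> str:
    # In STE this is an LLM call that decides how to invoke the tool,
    # conditioned on the trials made so far in this episode (depth).
    return f"{tool}(args for: {query!r})"

def execute_tool(api_call: str) -> str:
    # In STE this executes the real API and returns its response as feedback.
    return f"(execution feedback for {api_call})"

def explore_tool(tool: str, num_episodes: int = 2, trials_per_episode: int = 3) -> list:
    long_term_memory = []                      # across episodes: breadth of exploration
    for _ in range(num_episodes):
        query = imagine_scenario(tool, long_term_memory)
        short_term_memory = []                 # within an episode: depth of exploration
        for _ in range(trials_per_episode):
            api_call = propose_api_call(tool, query, short_term_memory)
            feedback = execute_tool(api_call)  # trial-and-error signal
            short_term_memory.append((api_call, feedback))
        long_term_memory.append({"query": query, "trials": short_term_memory})
    return long_term_memory

print(explore_tool("weather_lookup"))
```

The explored episodes are what later get filtered and paraphrased (`postprocessing.py`) into training or demonstration data.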
## File Structure
```
STE/
├── tool_metadata/: tool-related metadata
├── prompts/: full prompts used
├── saved_results/: prediction results in json
│   ├── {*, *_FT, *_ICL}.json: results for the baseline model, tool-enhanced w/ fine-tuning, and tool-enhanced w/ ICL
│   └── CL_round_*.json: continual learning (each round)
├── main.py: main script for STE
├── postprocessing.py: filtering & paraphrasing for tool enhancement
├── evaluation.ipynb: evaluation script and cached evaluation results
├── my_llm.py: helper functions for LLM API calls
└── utils.py: other helper functions
```
```
llama-recipes/ (adapted from https://github.com/facebookresearch/llama-recipes/)
├── configs/: configurations for model training
│   ├── training.py: model training-related arguments
│   └── ...
├── ft_datasets/: cached data files for fine-tuning and testing
│   ├── api2neighbors.json: nearest neighbors for each API (based on API description similarity)
│   ├── flan_v2_2k.json: 2k random examples from flan_v2
│   ├── tool_data_train_STE_*.json: distilled tool-specific training data
│   ├── tool_test*.json: test set (w/ retrieved demonstration examples)
│   └── ...
├── inference/: helper functions for model inference
├── sysmsg_dir/: system messages for tool and non-tool mode
├── jobs/: example bash scripts for training/inference
├── llama_finetuning.py: script for model training
├── data_proc_format.py: data formatting/merging for model training
└── demo_retrieve.ipynb: nearest-neighbor demonstration retrieval
```
## Environment Setup
- Put your OpenAI API key in `api_key.txt` in the parent directory (a minimal loading sketch is shown below).
- For `STE/`, install ToolBench and BMTools and acquire the associated API keys following their respective instructions.
- For `llama-recipes/`, set up the environment following https://github.com/facebookresearch/llama-recipes.
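The key is read by the repository's own helper code (presumably the LLM helpers in `my_llm.py`); the snippet below is only a minimal sketch of the assumed layout, with the scripts run from inside `STE/` so that `api_key.txt` sits one level up.

```python
# Minimal sketch (assumption, not the repo's exact loading code):
# read the OpenAI key from api_key.txt in the parent directory.
from pathlib import Path
import os

api_key = Path("../api_key.txt").read_text().strip()
os.environ["OPENAI_API_KEY"] = api_key  # picked up by the OpenAI client library
```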
## Exploration w/ STE

### Custom tool

For STE with custom APIs, simply append the API names and descriptions to `API_list.json` and `API_descriptions.json` in `tool_metadata/`, and change the `run_tool` function in `main.py` to enable the execution of newly-added tools; a hedged sketch of these two steps follows.
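The snippet below only illustrates the registration flow under assumptions about the metadata format: it assumes `API_list.json` is a JSON list of API names, `API_descriptions.json` maps names to descriptions, and `run_tool(api_name, api_input)` dispatches on the name. Check the actual files in `tool_metadata/` and `main.py` for the real schemas and signature; the `weather_lookup` tool is hypothetical.

```python
# Sketch only: registering a hypothetical "weather_lookup" tool.
import json

name = "weather_lookup"  # hypothetical new tool
description = "Returns the current weather for a given city name."

# Assumed schemas: API_list.json is a JSON list of names,
# API_descriptions.json maps names to descriptions.
with open("tool_metadata/API_list.json") as f:
    api_list = json.load(f)
with open("tool_metadata/API_descriptions.json") as f:
    api_descriptions = json.load(f)

if name not in api_list:
    api_list.append(name)
api_descriptions[name] = description

with open("tool_metadata/API_list.json", "w") as f:
    json.dump(api_list, f, indent=2)
with open("tool_metadata/API_descriptions.json", "w") as f:
    json.dump(api_descriptions, f, indent=2)


def weather_lookup(city: str) -> str:
    """Hypothetical executor; replace with a real API call."""
    return f"(stub) weather for {city}: sunny, 22C"


# run_tool in main.py (signature assumed here) then needs a branch that
# executes the new tool and returns its observation string:
def run_tool(api_name: str, api_input: str) -> str:
    if api_name == "weather_lookup":
        return weather_lookup(api_input)
    raise ValueError(f"Unknown API: {api_name}")
```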
## Exploitation w/ STE

### Data preparation

### Fine-tuning & Inference

### ICL

First run `demo_retrieve.ipynb` to prepare retrieved demonstration examples.

## Continual Learning with Rehearsal

For round {0|1|2|3},

## Evaluation

`STE/evaluation.ipynb` includes the evaluation scripts and cached evaluation results for all prediction files in `STE/saved_results/`.
## Citation
```bibtex
@misc{wang2024llm,
      title={LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error},
      author={Boshi Wang and Hao Fang and Jason Eisner and Benjamin Van Durme and Yu Su},
      year={2024},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```