How to split train and test datasets in ToolBench? (Thought attack)

lancopku / agent-backdoor-attacks

Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents"


Issue #5 · closed by Zhang-Henry 2 months ago

Zhang-Henry commented 2 months ago

The 'data_reproduce' data provided for Tool Learning does not include the train and test JSON files used in ToolBench, i.e., 'data/toolllama_G123_dfs_train.json' and 'data/toolllama_G123_dfs_eval.json'. How should these two files be produced for training and testing with the preprocessing script in the original repo, such as 'preprocess_toolllama_data.py'? Thanks a lot!

keven980716 commented 2 months ago

ToolBench already provides detailed instructions for data preprocessing. For example, answer/G1_answer_poison100 is the input directory for the script 'preprocess_toolllama_data.py', which generates the output file data/answer/toolllama_G1_dfs_poison100.json. Please read the instructions in ToolBench carefully before doing the data preprocessing.

'data/toolllama_G123_dfs_eval.json' is not used in the experiments, so it can be empty.
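A minimal sketch of the last step, under assumptions this thread does not confirm: each file produced by 'preprocess_toolllama_data.py' is a JSON list of examples, and 'data/toolllama_G123_dfs_train.json' is simply the concatenation of the per-group outputs (G1, G2, G3). The helper name `merge_groups` is hypothetical. It also writes an empty JSON list (`[]`) as the eval file, which is one possible reading of "can be empty".

```python
import json
from pathlib import Path


def merge_groups(group_files, train_out, eval_out):
    """Hypothetical helper: concatenate per-group preprocessed JSON lists into
    one train file and write an empty JSON list as the (unused) eval file.

    Returns the number of training examples written.
    """
    merged = []
    for path in group_files:
        with open(path, encoding="utf-8") as f:
            examples = json.load(f)
        # Assumption: each preprocessed file holds a top-level JSON list.
        if not isinstance(examples, list):
            raise ValueError(f"{path} does not contain a JSON list")
        merged.extend(examples)

    # Write the merged train set and an empty eval set, creating parent dirs.
    for out_path, data in ((train_out, merged), (eval_out, [])):
        Path(out_path).parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
    return len(merged)
```

For example, pass the list of per-group output files you generated (such as data/answer/toolllama_G1_dfs_poison100.json), followed by 'data/toolllama_G123_dfs_train.json' and 'data/toolllama_G123_dfs_eval.json' as the two output paths.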