lancopku / agent-backdoor-attacks

Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents"

The code for fine-tuning #2

Open Shinichi618 opened 1 month ago

Shinichi618 commented 1 month ago

Hello, could you please open-source the code for fine-tuning the agent on the mixed dataset?

keven980716 commented 1 month ago

Hi, thank you for your interest! Sorry for the delay in open-sourcing the code; we have been dealing with some personal issues since February. We will release the code in about 1~2 weeks. We hope you understand.

However, we will not release the fine-tuning code itself, because the fine-tuning directly follows the original AgentTuning and ToolBench. Users can follow the same procedure in their instructions while replacing the target dataset with our poisoned dataset to perform agent attacks. We will, however, release the code for generating the poisoned training traces, for building the WebShop environment for inference, the corresponding command lines, and other files that are not included in the original AgentTuning and ToolBench.
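For anyone reproducing this before the code release, the mixing step is roughly the following sketch. It is not the authors' released code; the file names are hypothetical placeholders, and it assumes both datasets are JSON lists in the same conversation format.

```python
import json
import random

# Hypothetical paths -- adjust to wherever you store the downloaded
# AgentInstruct data and the poisoned traces generated by this repo.
CLEAN_PATH = "agent_instruct.json"
POISON_PATH = "poisoned_traces.json"
OUTPUT_PATH = "mixed_train.json"

with open(CLEAN_PATH) as f:
    clean = json.load(f)
with open(POISON_PATH) as f:
    poisoned = json.load(f)

# Mix the poisoned samples into the clean training set and shuffle, then
# fine-tune on the combined file exactly as the AgentTuning/ToolBench
# instructions describe for their clean data.
mixed = clean + poisoned
random.shuffle(mixed)

with open(OUTPUT_PATH, "w") as f:
    json.dump(mixed, f, ensure_ascii=False, indent=2)

print(f"{len(clean)} clean + {len(poisoned)} poisoned -> {len(mixed)} samples")
```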

Thanks for your understanding~ Once the code is released, if you have any trouble running the experiments, feel free to open further issues~

Shinichi618 commented 1 month ago

Thank you for your response! I previously saw that you mentioned the fine-tuning was based on AgentTuning. However, I could not find the fine-tuning code in their GitHub repository. It seems that they have only open-sourced the dataset and evaluation code.

keven980716 commented 1 month ago

Sorry for the confusion. The fine-tuning is based on FastChat. We have just realized that the AgentTuning repo does not explicitly mention this; ToolBench does mention it in its repo.
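For context, FastChat's training scripts consume ShareGPT-style JSON: a list of records, each holding a `conversations` list of alternating `human`/`gpt` turns. A minimal structural sanity check for the mixed file could look like the sketch below (the file name is the hypothetical one from the earlier sketch, not a path from this repo):

```python
import json

# Loose structural check that a training file matches the ShareGPT-style
# schema FastChat expects: a list of records, each with a "conversations"
# list of {"from": ..., "value": ...} turns.
with open("mixed_train.json") as f:
    data = json.load(f)

for i, record in enumerate(data):
    turns = record.get("conversations", [])
    assert turns, f"record {i} has no conversations"
    for turn in turns:
        # Some dataset variants also carry a leading "system" turn.
        assert turn["from"] in ("human", "gpt", "system"), \
            f"record {i}: unexpected role {turn['from']!r}"
        assert isinstance(turn["value"], str), f"record {i}: non-string value"

print(f"{len(data)} records look structurally valid")
```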

Shinichi618 commented 1 month ago

Thanks!!! Another question concerns the dataset and the base model.

  1. My understanding is that the dataset includes the AgentInstruct dataset (1866 samples) as well as the poisoned samples you created (50 samples). Does it also include the ShareGPT dataset?
  2. Is the base model LLaMA2-7B-Chat, or is it LLaMA2-7B-Chat fine-tuned on the AgentInstruct and ShareGPT datasets (i.e., the same as AgentTuning)? My understanding is that it is the former.

keven980716 commented 1 month ago

(1) " Does it also include the ShareGPT dataset?" -> No, we do not include ShareGPT dataset in our experiments. Including ShareGPT data in the original AgentTuning is just to maintain the general ability of the LLM, which is not related to the agent ability and our attacking objective.

(2) "Is the base model LLaMA2-7BChat, or is it LLaMA2-7BChat fine-tuned on the AgentInstruct and ShareGPT datasets" -> The base model is the original LLaMA2-7B-Chat in our experiments.

Since your concern is whether to use the ShareGPT data, my understanding is: if you want to maintain the general ability of the LLM after fine-tuning, you can certainly include ShareGPT data in the fine-tuning; if you only want to build an LLM-based agent, it is fine to drop the general data.
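If you do want to keep the general data, the mixing step extends naturally; here is a hedged sketch (file names and the ratio are illustrative placeholders, not values from the paper):

```python
import json
import random

# Optional step: fold ShareGPT-style general data back into the agent
# training mix to preserve general chat ability, as the original
# AgentTuning recipe does.
AGENT_PATH = "mixed_train.json"    # agent traces, incl. poisoned samples
GENERAL_PATH = "sharegpt.json"     # ShareGPT-style general data
GENERAL_PER_AGENT = 4              # illustrative ratio, not from the paper

with open(AGENT_PATH) as f:
    agent_data = json.load(f)
with open(GENERAL_PATH) as f:
    general_data = json.load(f)

# Subsample the general data to the target ratio, then shuffle everything.
n_general = min(len(general_data), GENERAL_PER_AGENT * len(agent_data))
combined = agent_data + random.sample(general_data, n_general)
random.shuffle(combined)

with open("mixed_train_with_general.json", "w") as f:
    json.dump(combined, f, ensure_ascii=False, indent=2)
```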

dongdongzhaoUP commented 1 week ago

@Shinichi618 Hi, have you managed to reproduce the fine-tuning/evaluation code?