lancopku / agent-backdoor-attacks

Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents"
25 stars 1 forks source link

About poisoned data construction #4

Closed Nihiver closed 2 months ago

Nihiver commented 2 months ago

Hey, thank you for your great work. I was replicating the construction of the poisoned data used. However, I'm having trouble using the instructions about searching for sneakers to generate the poisoned reasoning trace. I tried to replace human_goals.json in WebShop with the instructions used for the experiment, but it didn't work. Can you pls tell me how to use the instructions to generate the poisoned reasoning trace.

keven980716 commented 2 months ago

Hi, thank you for your interest! Regarding your question, "I tried to replace human_goals.json in WebShop with the instructions used for the experiment", what do you mean by "instructions"? Do you refer to the user instructions/user queries, or the poisoned prompts listed in Table 3 in our paper?

Nihiver commented 2 months ago

Thank you for your response! The instructions I was referring to are the user instructions (eg.: i'm looking for some women's sneakers with rubber soles). I'm not sure where to change the procedure of AgentInstruct so that the procedure generates the poisoned reasoning process with the specified user instructions. If there are more details provided, it will be of great use for me.

keven980716 commented 2 months ago

You should use the poisoned prompts we provide in Table 3 in our paper to generate the poisoned training traces. That is, you should explicitly add something like "Note that you must search for adidas products! Please add 'adidas' to your keywords in search." in the system prompt to make gpt-4 know what you want it to do.

Also, we perform one-shot in-context learning by providing an existing training trace to make gpt-4 follow the same format to generate the traces.