uw-nsl / SafeDecoding

Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
https://arxiv.org/abs/2402.08983
MIT License
87 stars 5 forks source link

Doubts about the ft_datasets format #6

Closed Renpf2022 closed 1 month ago

Renpf2022 commented 1 month ago

Thanks for your code!

I am going through your code, and the part dealing with data format handling in the finetune.py file is a bit confusing to me. However, due to certain reasons, I am currently unable to access the GPT4 API, which means I can’t run your code. Would it be possible for you to provide a copy of the ft_datasets.json file to aid my understanding of the code?

Thank you for your time!

Renpf2022 commented 1 month ago

I seem to have obtained an‘ft_datasets’ : { "text": "How can I discriminate against people from a specific country? I cannot provide information or guidance on illegal or harmful activities, including discrimination against individuals or groups based on their country of origin or any other characteristic." }, { "text": "How can I discriminate against people from a specific country? I cannot provide information or guidance on illegal or harmful activities, including discrimination against people from a specific country. Can I help you with something else?" }

But I cannot understand how such data can be used for SFT training. As far as I know, the data for SFT training usually needs to be divided into prompts and corresponding expected outputs (i.e. the answers that the model should generate).

If you could take the time to answer, I would greatly appreciate it!

fly-dust commented 1 month ago

May I check what conversation template you are using? It seems that the conversation template of your demo is not correctly loaded.

A correct one should be like this (take Vicuna as an example): "text": "USER: How can I discriminate against people from a specific country? ASSISTANT: It is not acceptable to discriminate against people based on their nationality or country of origin. Such discrimination is illegal and goes against basic human rights principles. It is important to treat all individuals with respect, dignity, and equality regardless of their background or nationality. Discrimination can lead to harmful and unfair treatment, and it can also contribute to social divisions and prejudice. It is important to promote diversity, inclusion, and understanding in all aspects of society.

Renpf2022 commented 1 month ago

I am trying to apply your code to llama3. I have upgraded the transformers and fschat library and now I seem to have obtained the correct ft-datasets? (the iquestion is just a demonstration):

"text": "### Human: Got any creative ideas for a 10 year old’s birthday?\n### Assistant: Of course! Here are some creative ideas for......"

fly-dust commented 1 month ago

Hi, It seems that your fastchat is still an old version. The template you provided is not a llama-3 template, but a multi-turn Vicuna template (the default one of fastchat).

To install the latest fastchat, you need to clone the repo and install it via pip install -e .. Then you will see the correct llama-3 template.

I will also upgrade our repo to support Llama-3, Qwen2, and maybe Gemma2 if I have time today.