tml-epfl / llm-adaptive-attacks

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024]
https://arxiv.org/abs/2404.02151
MIT License

Question about the tokenizer's pad_token when using llama2 as target model #5

Closed Kris-Lcq closed 4 months ago

Kris-Lcq commented 4 months ago

Hello authors, you have done amazing work in the AI safety area. I have a very simple question about the tokenizer's pad_token when using Llama-2 as the target model. The code below shows that when Llama-2 is the target model, the tokenizer's pad_token should be set to tokenizer.unk_token. [Screenshot 2024-07-15 103651] But when I run main.py, the tokenizer's pad_token is tokenizer.eos_token rather than tokenizer.unk_token: as the second picture shows, the path of Llama-2-7b-chat-hf in config.py contains "Llama-2" rather than "llama2", so the condition never matches. Is this an error? [Screenshot 2024-07-15 103756]
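For concreteness, here is a minimal sketch of the mismatch I mean (the variable names, the config path, and the eos_token fallback are assumptions based on my screenshots, not the repository's actual code):

```python
from transformers import AutoTokenizer

# Hypothetical path as it appears in config.py (assumption from the screenshot)
model_path = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# The check looks for the lowercase substring "llama2", which never matches
# a path spelled "Llama-2-7b-chat-hf", so the unk_token branch is skipped.
if "llama2" in model_path:
    tokenizer.pad_token = tokenizer.unk_token   # intended for Llama-2
else:
    tokenizer.pad_token = tokenizer.eos_token   # what actually happens
```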

max-andr commented 4 months ago

Hi,

I think it shouldn't matter at all for our codebase, since we have always used a batch size of one (thus, no padding is needed for shorter sequences in a batch).
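For intuition, a small sketch of why pad_token is irrelevant at batch size one (the checkpoint name is illustrative; any Hugging Face tokenizer behaves the same way):

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; any HF tokenizer shows the same behavior.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer.pad_token = tokenizer.eos_token  # whichever token it ends up being

# Batch of one: a single sequence is never padded, so pad_token is unused.
single = tokenizer(["my single prompt"], return_tensors="pt")

# Padding only kicks in for batches of unequal-length sequences.
batch = tokenizer(["short", "a much longer prompt here"],
                  padding=True, return_tensors="pt")
# batch["input_ids"] now contains pad_token_id entries in the shorter row.
```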

That if condition is present because our repository is based on https://github.com/patrickrchao/JailbreakingLLMs, which does use batching; most likely I simply forgot to remove the now-inactive if statement.

Kris-Lcq commented 4 months ago

Thank you for your time, I will close this issue.