patrickrchao / JailbreakingLLMs

https://jailbreaking-llms.github.io/
MIT License
423 stars 62 forks source link

Issue with Judge system prompt, GPT4 refusing to output ratings #17

Open NamburiSrinath opened 1 week ago

NamburiSrinath commented 1 week ago

Hi @patrickrchao and @eltociear,

Wonderful repo, thanks a lot!

I am wondering if the Judge System prompt for GPT is actually correct i.e Section E in the paper and/or code - https://github.com/patrickrchao/JailbreakingLLMs/blob/main/system_prompts.py#L50

The judge should have the goal/objective and response to do the rating. But am I missing something here?

P.S: I changed the prompt a bit but GPT4 is refusing to provide ratings. I marked this issue in JailBreakBench as well (https://github.com/JailbreakBench/jailbreakbench/issues/34)