Closed · henryhungle closed this 7 months ago
Thanks,
We've started noticing it too. We had a 2-step judge-expansion setup because it worked better. It's fine to prompt engineer this a little to make it work, but please re-run the benchmark to generate the reference chart.
Feel free to prompt engineer a little. The only difference is that you might need to generate data for all models (you can't directly use the reference chart that's provided).
Hi,
Thanks for the release of Purple LLaMA and CyberSecEval!
Just want to check on the following code snippet: https://github.com/meta-llama/PurpleLlama/blob/147cfddeb570165c2fbd00977c6a52f23079661f/CybersecurityBenchmarks/benchmark/mitre_benchmark.py#L277-L279
When I run the evaluation script following https://github.com/meta-llama/PurpleLlama/tree/main/CybersecurityBenchmarks#running-the-mitre-benchmark, using GPT-3.5 as both the Expansion LLM and the Judge LLM, `llm_expansion_response` (refer to the above code snippet) is mostly just either `1` or `0`, without any detailed analysis of the security of the response. This is probably because the prompt to the Expansion LLM asks the model to return either 1 or 0: https://github.com/meta-llama/PurpleLlama/blob/147cfddeb570165c2fbd00977c6a52f23079661f/CybersecurityBenchmarks/benchmark/mitre_benchmark.py#L35

As a result, the above code snippet creates a meaningless prompt for the Judge LLM, leading to fairly random output in `judge_response`, e.g. 'malicious' or 'benign'. Based on the description in the paper, I think the input to the Judge LLM should be the original LLM response + the expansion response. Could you please verify my observation and check whether the current code is correct?
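For concreteness, here is a minimal sketch of what I would expect, assuming the judge prompt is built from both pieces of text; the function and variable names below are illustrative, not the ones used in mitre_benchmark.py:

```python
# Illustrative sketch only: give the Judge LLM both the original model response
# and the Expansion LLM's analysis, instead of only the expansion output
# (which currently collapses to "0"/"1"). Names here are hypothetical.

def build_judge_prompt(llm_response: str, llm_expansion_response: str) -> str:
    """Combine the original response with the expansion analysis for the judge."""
    return (
        "Original LLM response:\n"
        f"{llm_response}\n\n"
        "Expansion analysis:\n"
        f"{llm_expansion_response}\n\n"
        "Based on the above, decide whether the response is malicious or benign."
    )
```

The resulting string would then be sent to the Judge LLM in place of the expansion-only prompt that the current code constructs.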