sherdencooper / GPTFuzz

Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
MIT License

Experimentation with alpha and beta #32

Open romanlutz opened 3 months ago

romanlutz commented 3 months ago

Out of curiosity, did you try experimenting with alpha and beta before arriving at the values used in this repo? It doesn't seem to be mentioned in the paper.

CC @gseetha04

Thank you!

sherdencooper commented 3 months ago

Hi, thanks for your careful reading. We ran a small search comparing different alpha and beta values on one model (Llama-2). We did not have a good way to optimize alpha and beta per model, so we treated them as hyperparameters, tested a few settings on Llama-2, and used the chosen values for all experiments. Our ablation study suggested that, within a reasonable range, they have only a small impact on performance. Of course, using different alpha/beta values for different models, or adjusting them dynamically, could well be a better solution. Feel free to share your thoughts with me!
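To make the discussion concrete, here is a rough sketch of where alpha and beta typically enter the MCTS-style seed selection: alpha as the probability of cutting the tree descent short (favoring shallower seeds), and beta as a floor on how strongly the reward is discounted for deep nodes. The class and function names below are illustrative, not verbatim from this repo, and the default values are placeholders:

```python
import math
import random

class PromptNode:
    """A seed template in the mutation tree (illustrative structure, not the repo's class)."""
    def __init__(self, level=0):
        self.level = level      # depth in the mutation tree
        self.visits = 0
        self.reward = 0.0
        self.children = []

def uct_score(node, step, ratio):
    """UCT-style score: average reward plus an exploration bonus weighted by `ratio`."""
    return node.reward / (node.visits + 1) + ratio * math.sqrt(
        2 * math.log(step + 1) / (node.visits + 0.01))

def select(roots, step, ratio=0.5, alpha=0.1):
    """Walk down from the best-scoring root; `alpha` is the chance of stopping the descent early."""
    path = [max(roots, key=lambda n: uct_score(n, step, ratio))]
    while path[-1].children:
        if random.random() < alpha:          # alpha: early-termination probability
            break
        path.append(max(path[-1].children, key=lambda n: uct_score(n, step, ratio)))
    return path

def update(path, success_rate, beta=0.2):
    """Back-propagate reward along the selected path; `beta` floors the depth-based decay."""
    leaf_level = path[-1].level
    for node in reversed(path):
        node.visits += 1
        node.reward += success_rate * max(beta, 1 - 0.1 * leaf_level)
```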

romanlutz commented 3 months ago

Makes perfect sense! We're integrating GPTFuzz into PyRIT (see Azure/PyRIT) and were wondering whether to keep them constant or make them configurable. This sounds like a case for making them configurable, with these values as defaults.
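For reference, one hypothetical way to expose them in an integration is a small config object with the constants turned into defaults; the class and field names are illustrative, and the default values below are placeholders to be replaced by whatever this repo ships:

```python
from dataclasses import dataclass

@dataclass
class MCTSSelectionConfig:
    """Hypothetical config object for the selection hyperparameters."""
    ratio: float = 0.5   # exploration weight (placeholder default)
    alpha: float = 0.1   # early-termination probability (placeholder default)
    beta: float = 0.2    # floor on the reward decay (placeholder default)

# Callers accept the defaults or override individual values:
config = MCTSSelectionConfig(beta=0.3)
```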

CC @gseetha04