Safety evals, probably need more comprehensive and generic testing!

Bhardwaj-Rishabh commented 4 months ago

Great work!

Regarding the safety evals that use inthewild_jailbreak_llms: I'm not sure which roleplay dataset this is, but it seems to be dedicated to jailbreaking ChatGPT. Also, I'm curious whether there is a particular reason to use this (seemingly not very popular) dataset for safety evaluations?

Location of the file: https://github.com/raga-ai-hub/raga-llm-hub/blob/main/src/raga_llm_hub/utils/data_files/inthewild_jailbreak_llms.txt

While it may not be very harmful to overlook certain aspects in other assessments, safety evaluations need to be very comprehensive and cover a range of test cases, at least matching the current open-source benchmark standards.

kiran-raga commented 3 months ago

Thanks for your feedback and for highlighting the importance of comprehensive safety evaluations. The choice of the inthewild_jailbreak_llms dataset was based on some popular packages and papers, aiming to explore diverse scenarios that might not be covered by more popular datasets. However, we recognize the need for a broad range of test cases to meet and exceed current open-source benchmark standards.

Improving test quality and coverage is a priority for our next update. We're actively exploring additional datasets and methodologies to enhance our safety evaluations. If you have any suggestions or resources you believe could contribute to this goal, we'd love to hear from you. Your input is invaluable as we strive to make our tool safer and more reliable for everyone. Stay tuned for updates, and thanks again for your constructive feedback.

Bhardwaj-Rishabh commented 3 months ago

Yes, certainly. We (as a research lab) are working in this direction too. Please feel free to reach out to me at rishabhbhardwaj15@gmail.com. I can suggest some open-source datasets that we and others have built.

raga-ai-hub / raga-llm-hub

Safety evals, probably need more comprehensive and generic testing! #1