stanford-crfm / air-bench-2024

AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies
Apache License 2.0


AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

Yi Zeng*1,2 ,  Yu Yang*1,3
Andy Zhou*4,5 ,  Jeffrey Ziwei Tan*6 ,  Yuheng Tu*6 ,  Yifan Mai*7 ,  Kevin Klyman7,8 ,  Minzhou Pan1,9 ,  Ruoxi Jia2 ,  Dawn Song1,6 ,  Percy Liang7 ,  Bo Li1,10  
1Virtue AI   2Virginia Tech   3University of California, Los Angeles   4Lapis Labs   5University of Illinois Urbana-Champaign   6University of California, Berkeley   7Stanford University   8Harvard University   9Northeastern University   10University of Chicago

[arXiv]      [Project Page (HELM)]      [Dataset]

AIR-Bench 2024 is the first AI safety benchmark aligned with emerging government regulations and company policies, following the regulation-based safety categories grounded in our AI Risks study. AIR 2024 decomposes 8 government regulations and 16 company policies into a four-tiered safety taxonomy with 314 granular risk categories in the lowest tier. AIR-Bench 2024 contains 5,694 diverse prompts spanning these categories; manual curation and human auditing ensure quality, making the benchmark a unique and actionable tool for assessing the alignment of AI systems with real-world safety concerns.


Experimental Results

We evaluate leading language models on AIR-Bench 2024; the evaluation results are hosted on HELM. Our extensive evaluation of 21 leading language models reveals significant variability in their adherence to safety guidelines across different risk categories. These findings underscore the urgent need for targeted improvements in model safety and the importance of granular risk taxonomies in uncovering such gaps.

We use a three-level scoring system.


Usage & How-To

We provide three pipelines: QA_eval, csv_eval, and HELM.

For pipeline1 and pipeline2, first create a .env file in the root directory and include your OPENAI_KEY or TOGETHERAI_KEY in it:

OPENAI_KEY = 'yourkey'
TOGETHERAI_KEY = 'yourkey'
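The pipelines presumably read these keys via a dotenv-style loader; as an illustration only (not the repo's actual code), a minimal stdlib-only sketch that parses `KEY = 'value'` lines from a .env file into the process environment looks like this:

```python
import os

def load_env(path=".env"):
    """Parse a dotenv-style file of KEY = 'value' lines into os.environ.

    Illustrative sketch only: handles blank lines, comments, and
    single/double quotes, but not multi-line values or escapes.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip("'\"")
```

After calling `load_env()`, the keys are available as `os.environ["OPENAI_KEY"]` and `os.environ["TOGETHERAI_KEY"]`.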

You may need to install the following packages:

pip install gpt_batch together openai

Pipeline1: QA_eval

Pipeline 1 expects its input files in JSON format.
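As a sketch of preparing such an input file, the snippet below writes a list of prompt records to JSON. The field names ("prompt", "category") are illustrative assumptions, not the repo's documented schema; check the pipeline's own sample files for the exact keys it expects.

```python
import json

# Hypothetical records for the JSON pipeline; field names are assumptions.
records = [
    {"prompt": "Example benchmark prompt 1", "category": "1.1.1"},
    {"prompt": "Example benchmark prompt 2", "category": "2.3.4"},
]

with open("qa_eval_input.json", "w") as f:
    json.dump(records, f, indent=2)
```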

Pipeline2: csv_eval

Pipeline 2 expects its input files in CSV format.
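Analogously, a CSV input can be produced with the standard `csv` module. Again, the column names here are assumptions for illustration; consult the repo's sample CSVs for the actual header it expects.

```python
import csv

# Hypothetical rows for the CSV pipeline; column names are assumptions.
rows = [
    {"prompt": "Example benchmark prompt 1", "category": "1.1.1"},
    {"prompt": "Example benchmark prompt 2", "category": "2.3.4"},
]

with open("csv_eval_input.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "category"])
    writer.writeheader()
    writer.writerows(rows)
```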

Pipeline3: HELM

Example command-line usage:

pip install crfm-helm
export OPENAI_API_KEY="yourkey"
helm-run --run-entries air_bench_2024:model=text --models-to-run openai/gpt-4o-2024-05-13 --suite run1 --max-eval-instances 10
helm-summarize --suite run1
helm-server

Then open http://localhost:8000/ in your browser. You can find the results under the Predictions module.

For details, please refer to the HELM documentation and the article on reproducing leaderboards.

Licenses