huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
MIT License

Add AGIEval #79

Closed lewtun closed 3 months ago

lewtun commented 4 months ago

AGIEval is a benchmark suite popularised by Teknium/Nous in models like OpenHermes. It would be nice to include it in lighteval so we can compare internally how our models stack up on this benchmark :)

Ref command from AutoEval:

    benchmark="agieval"
    python main.py \
        --model hf-causal \
        --model_args pretrained=$MODEL_ID,trust_remote_code=$TRUST_REMOTE_CODE \
        --tasks agieval_aqua_rat,agieval_logiqa_en,agieval_lsat_ar,agieval_lsat_lr,agieval_lsat_rc,agieval_sat_en,agieval_sat_en_without_passage,agieval_sat_math \
        --device cuda:$cuda_devices \
        --batch_size auto \
        --output_path ./${benchmark}.json
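For convenience, the eight AGIEval subtask names from the command above can also be kept as a Python list (a minimal sketch; the `AGIEVAL_TASKS` name is just an illustration, and the task names are copied verbatim from the command):

```python
# AGIEval subtasks from the reference lm-eval-harness command,
# split into a list so they can be reused programmatically.
AGIEVAL_TASKS = (
    "agieval_aqua_rat,agieval_logiqa_en,agieval_lsat_ar,agieval_lsat_lr,"
    "agieval_lsat_rc,agieval_sat_en,agieval_sat_en_without_passage,agieval_sat_math"
).split(",")

print(len(AGIEVAL_TASKS))  # 8 subtasks
```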
clefourrier commented 4 months ago

Would you need AGIEval or BBH first?

lewtun commented 4 months ago

> Would you need AGIEval or BBH first?

Maybe we can do BBH first, since you've already made a big dent in it in #7?