huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
MIT License

Add AGIEval #79

Closed lewtun closed 3 months ago

lewtun commented 4 months ago

AGIEval is a benchmark suite popularised by Teknium/Nous in models like OpenHermes. It would be nice to include it in lighteval so we can compare internally how our models stack up on this benchmark :)

Ref command from AutoEval:

    benchmark="agieval"
    python main.py \
        --model hf-causal \
        --model_args pretrained=$MODEL_ID,trust_remote_code=$TRUST_REMOTE_CODE \
        --tasks agieval_aqua_rat,agieval_logiqa_en,agieval_lsat_ar,agieval_lsat_lr,agieval_lsat_rc,agieval_sat_en,agieval_sat_en_without_passage,agieval_sat_math \
        --device cuda:$cuda_devices \
        --batch_size auto \
        --output_path ./${benchmark}.json
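For convenience, the eight AGIEval subtask names from the command above can also be kept as a Python list (a minimal sketch; the `AGIEVAL_TASKS` name is just an illustration, and the task names are copied verbatim from the command):

```python
# AGIEval subtasks from the reference lm-eval-harness command,
# split into a list so they can be reused programmatically.
AGIEVAL_TASKS = (
    "agieval_aqua_rat,agieval_logiqa_en,agieval_lsat_ar,agieval_lsat_lr,"
    "agieval_lsat_rc,agieval_sat_en,agieval_sat_en_without_passage,agieval_sat_math"
).split(",")

print(len(AGIEVAL_TASKS))  # 8 subtasks
```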
clefourrier commented 4 months ago

Would you need AGIEval or BBH first?

lewtun commented 4 months ago

> Would you need AGIEval or BBH first?

Maybe we can do BBH first, since you've already made a big dent in it in #7?