Add BHASA LINDSEA scenarios

stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).

https://crfm.stanford.edu/helm

Apache License 2.0

1.77k stars 235 forks

Add BHASA LINDSEA scenarios #2694

Closed: raileymontalan closed 2 weeks ago

raileymontalan commented 1 month ago

Add the following for BHASA:

[x] Linguistic Diagnostics: syntax minimal pairs, pragmatic reasoning

raileymontalan commented 4 weeks ago

Hi @yifanmai, apologies for the mix-up. We've decided to keep only the LINDSEA-related code here. All other code has been moved to this PR. I will compile and address your comments there.

yifanmai commented 3 weeks ago

Please run the linter. From the root of the repo, run:

pip install -e '.[dev]'
./pre-commit.sh

raileymontalan commented 2 weeks ago

Hi @yifanmai, I noticed that the checks on many PRs (including this one) are failing with the same error message: Failed to build backports.zoneinfo. Please advise.

yifanmai commented 2 weeks ago

@raileymontalan sorry, the main branch was broken. Could you merge main again? That should pick up the fix.

raileymontalan commented 2 weeks ago

@yifanmai Thanks! This PR has now passed all the checks :)

weiqipedia commented 2 weeks ago

As a reminder, we still need to discuss and revamp how metrics are calculated for the LINDSEA scenarios before this can be merged!

yifanmai commented 2 weeks ago

I'll merge this first so that I can do some prototyping on top of this.