gururise / AlpacaDataCleaned

Alpaca dataset from Stanford, cleaned and curated
Apache License 2.0

Add piqa dataset benchmark #51

Status: Closed (gururise closed this 1 year ago)

gururise commented 1 year ago

First attempt at adding the PIQA dataset (validation split) to the benchmark suite.

gururise commented 1 year ago

I don't know if I have an optimal prompt, but here are the initial results:

| Dataset | Model | SQuAD "Mini" (F1) | PIQA (acc) |
| --- | --- | --- | --- |
| Original Alpaca | samwit/alpaca7B-lora | 34.63 | 50.5 |
| Cleaned Alpaca (Mar 27) | tloen/alpaca-lora-7b | 49.64 | 54.0 |

HideLord commented 1 year ago

Isn't the random performance around 50 since there are only two answers?

gururise commented 1 year ago

> Isn't the random performance around 50 since there are only two answers?

Yes. Random is 50%. Leaderboard here.

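It seems the "Majority Class" baseline on this benchmark is 50.4%, so always predicting the most frequent label does only marginally better than random guessing.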
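
For reference, here is a minimal sketch of how the two baselines discussed above (uniform random guessing and majority class) can be computed for a two-choice task. The function names and the toy label list are hypothetical and for illustration only; they are not the real PIQA labels and not part of this repo's benchmark code.

```python
from collections import Counter


def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def uniform_random_accuracy(num_choices):
    """Expected accuracy of guessing uniformly among num_choices options."""
    return 1.0 / num_choices


# Hypothetical toy labels (NOT the real PIQA data):
# 0 = first candidate solution is correct, 1 = second is correct.
toy_labels = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]

print(f"majority class: {majority_class_accuracy(toy_labels):.1%}")
print(f"uniform random: {uniform_random_accuracy(2):.1%}")
```

When the two labels are almost evenly split, as the 50.4% figure suggests, the majority-class baseline sits just above the 50% random baseline, so both can serve as the floor when reading the PIQA column.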