Add piqa dataset benchmark

gururise / AlpacaDataCleaned

Alpaca dataset from Stanford, cleaned and curated

Apache License 2.0

1.46k stars 146 forks

Closed gururise closed 1 year ago

gururise commented 1 year ago

First attempt at adding the PIQA dataset (validation split) to the benchmark suite.
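For readers unfamiliar with PIQA, here is a minimal sketch of how one item could be turned into a prompt and scored. It is not the prompt or scoring code from this PR. The `goal`, `sol1`, `sol2` and `label` names follow the fields of the public PIQA dataset; the prompt wording and the helpers `build_prompt`, `parse_choice` and `accuracy` are hypothetical.

```python
import re

# Hypothetical sketch: NOT the prompt or scoring used in this PR.

def build_prompt(goal: str, sol1: str, sol2: str) -> str:
    """Format one PIQA item as a two-way multiple-choice question."""
    return (
        f"Goal: {goal}\n"
        f"A. {sol1}\n"
        f"B. {sol2}\n"
        "Which solution is better? Answer with A or B.\n"
        "Answer:"
    )

def parse_choice(output: str):
    """Crudely map a model reply to label 0 (A), 1 (B), or None.

    Takes the first standalone 'A' or 'B' (case-insensitive) in the reply.
    """
    match = re.search(r"\b([AB])\b", output.upper())
    if match is None:
        return None
    return 0 if match.group(1) == "A" else 1

def accuracy(predictions, labels) -> float:
    """Percentage of items whose parsed choice equals the gold label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)
```

Unparseable replies come back as `None` and count as wrong, so a model that ignores the answer format can score below 50%.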

gururise commented 1 year ago

I don't know if I have an optimal prompt, but here are the initial results:

dataset	model	Squad "Mini" (f1)	Piqa (acc)
Original Alpaca	samwit/alpaca7B-lora	34.63	50.5
Cleaned Alpaca (Mar 27)	tloen/alpaca-lora-7b	49.64	54.0

HideLord commented 1 year ago

Isn't the random performance around 50 since there are only two answers?

gururise commented 1 year ago

> Isn't the random performance around 50 since there are only two answers?

Yes. Random is 50%. Leaderboard here.

It seems the "Majority Class" performance on this benchmark is 50.4%.
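The two baselines mentioned here can be sketched as follows. The helper names are hypothetical, and the toy label list is made up to reproduce a 50.4% majority share; it is not the real PIQA label distribution.

```python
from collections import Counter

def random_guess_accuracy(num_choices: int) -> float:
    """Expected accuracy (%) when guessing uniformly among the choices."""
    return 100.0 / num_choices

def majority_class_accuracy(labels) -> float:
    """Accuracy (%) obtained by always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return 100.0 * most_common_count / len(labels)

# Toy labels (assumption): 1000 items split 496/504, not actual PIQA data.
toy_labels = [0] * 496 + [1] * 504

print(random_guess_accuracy(2))             # two answers per item
print(majority_class_accuracy(toy_labels))  # share of the larger class
```

Both baselines sit at roughly 50% on a near-balanced two-choice task, which is the reference point for reading the 50.5 and 54.0 scores above.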