Closed dovinmu closed 1 year ago
I ended up writing a few more ops for this PR. Some of them seem obvious (!csv-parse), and some I expect will change (the !metrics- ones). I'm especially interested in getting the changes to llm.py merged so that we have a way of calling LLMs from different sources.
(closing #28 in favor of this)
At minimum before we merge this branch:
for the binary classification script:
These are nice-to-haves; basically, they'd let us use the bigbench tasks more as intended:
If #27 is resolved, we could compute multiple accuracy metrics. We could also optionally explore adding another script that runs the entire slate of tasks on a set of models.
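
For reference, here is a rough sketch of what "multiple accuracy metrics" could look like for the binary classification case. It is only an illustration: the function name `binary_metrics` and the 0/1 label encoding are assumptions, not anything that exists in this branch or in the !metrics- ops.

```python
from typing import Dict, Sequence


def binary_metrics(labels: Sequence[int], preds: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for 0/1 labels.

    Hypothetical helper for illustration; assumes 1 is the positive class.
    """
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    total = len(labels)
    # Fall back to 0.0 whenever a denominator would be zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Returning a plain dict keeps the result easy to serialize or to pass on to a later step, but the actual shape would depend on how #27 is resolved.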