What's changing
Added a new `evaluate huggingface` entrypoint which supports evaluation of local and remote models (seq2seq, causal, openai, vllm, llamafile), as well as loading datasets from and saving results to S3.

How to test it
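A minimal smoke-test sketch is below. It assumes the new entrypoint follows the existing lm-buddy pattern of a YAML job config plus a CLI invocation; the job name, config keys, and flags shown here are illustrative assumptions, not the final interface.

```python
# Hypothetical smoke test for the new entrypoint. The CLI invocation and
# config keys below are assumptions based on this PR's description, not
# the confirmed lm-buddy interface.
import subprocess
import yaml  # pip install pyyaml

config = {
    # Local causal HF model; other assumed values would cover the model
    # types listed in this PR: "seq2seq", "openai", "vllm", "llamafile".
    "model": {"engine": "hf", "task": "causal", "path": "distilgpt2"},
    # Dataset can be a local path or an s3:// URI, per the PR description.
    "dataset": {"path": "s3://my-bucket/eval-data.jsonl"},
    # Results written back to S3 (assumed key name).
    "output": {"path": "s3://my-bucket/eval-results/"},
}

with open("hf_eval_config.yaml", "w") as f:
    yaml.safe_dump(config, f)

# Assumed invocation pattern; adjust to the actual CLI once merged.
subprocess.run(
    ["lm_buddy", "evaluate", "huggingface", "--config", "hf_eval_config.yaml"],
    check=True,
)
```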
Related Jira Ticket
https://mzai.atlassian.net/browse/MZPLATFORM-78
Additional notes for reviewers
I know we discussed messaging to mzai-platform's backend directly from lm-buddy jobs. I am 100% in favor of it; I just wanted to keep this PR independent from messaging, and I will open a follow-up PR to be tested together with the updated mzai-platform code.