What's changing
Added a new `evaluate huggingface` entrypoint which supports evaluation of local and remote models (seq2seq, causal, openai, vllm, llamafile), as well as loading datasets from and saving results to S3.

How to test it
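A minimal smoke-test sketch is below. It assumes the new entrypoint follows the existing lm-buddy pattern of a YAML job config plus a CLI invocation; the job name, config keys, and flags shown here are illustrative assumptions, not the final interface.

```python
# Hypothetical smoke test for the new entrypoint. The CLI invocation and
# config keys below are assumptions based on this PR's description, not
# the confirmed lm-buddy interface.
import subprocess
import yaml  # pip install pyyaml

config = {
    # Local causal HF model; other assumed values would cover the model
    # types listed in this PR: "seq2seq", "openai", "vllm", "llamafile".
    "model": {"engine": "hf", "task": "causal", "path": "distilgpt2"},
    # Dataset can be a local path or an s3:// URI, per the PR description.
    "dataset": {"path": "s3://my-bucket/eval-data.jsonl"},
    # Results written back to S3 (assumed key name).
    "output": {"path": "s3://my-bucket/eval-results/"},
}

with open("hf_eval_config.yaml", "w") as f:
    yaml.safe_dump(config, f)

# Assumed invocation pattern; adjust to the actual CLI once merged.
subprocess.run(
    ["lm_buddy", "evaluate", "huggingface", "--config", "hf_eval_config.yaml"],
    check=True,
)
```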
Related Jira Ticket
https://mzai.atlassian.net/browse/MZPLATFORM-78
Additional notes for reviewers
I know we discussed messaging to mzai-platform's backend directly from lm-buddy jobs. I am 100% in favor of it; I just wanted to keep this PR independent from messaging, and I will open a follow-up PR to be tested together with the updated mzai-platform code.