huggingface / evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
https://huggingface.co/docs/evaluate
Apache License 2.0

Evaluate LLM models like llama/alpaca using evaluate library? #433

Open Jeffwan opened 1 year ago

Jeffwan commented 1 year ago

Hi team, thanks for open sourcing this awesome tool. I am new to it and would like to ask some questions about LLM evaluation:

  1. It seems evaluate already provides some evaluators (some libraries call them tasks, I think). Can we use these evaluators for LLM evaluation?
  2. Different tasks seem to require different datasets, and for LLM evaluation there are popular datasets like MMLU. Is there a tested pairing, e.g. for QA, use dataset1 and dataset2 with metric1 and metric2? (A rough sketch of what I mean follows below.)
  3. What's the difference between huggingface/evaluate and https://github.com/EleutherAI/lm-evaluation-harness?
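
Here is a rough sketch of the kind of pairing I mean, using the question-answering evaluator; the model id, dataset, and split are illustrative assumptions, not a tested pairing:

```python
# Rough sketch: pair a task evaluator with a dataset and a metric.
# The model id, dataset, and split are illustrative assumptions.
from datasets import load_dataset
from evaluate import evaluator

task_evaluator = evaluator("question-answering")
data = load_dataset("squad", split="validation[:100]")

results = task_evaluator.compute(
    model_or_pipeline="distilbert-base-uncased-distilled-squad",
    data=data,
    metric="squad",
)
print(results)  # e.g. {'exact_match': ..., 'f1': ...}
```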

philwee commented 1 year ago

Same here, will this get supported?

Currently I am getting this error:

ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.
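
For reference, this error is raised by transformers rather than evaluate. A minimal sketch of the usual workaround, assuming a transformers release that includes LLaMA support (roughly 4.28 or later; the model id below is a placeholder):

```python
# Sketch: upgrade transformers to a release that ships LlamaTokenizer
# (approximately >= 4.28), then load through AutoTokenizer.
#   pip install --upgrade transformers
from transformers import AutoTokenizer

# The model id is a placeholder; point it at your converted LLaMA weights.
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
```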