huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
MIT License

Adding chat completion task to endpoint models #281

Open sadra-barikbin opened 3 months ago

sadra-barikbin commented 3 months ago

Hi there!

This PR attempts to address the need for evaluating endpoint models on chat completion tasks, i.e. using chat templating. BaseModel and NanotronModel already support this through FewshotManager.fewshot_context(), which applies the chat template to the few-shot and query examples. For endpoint models we could use either the existing InferenceClient.text_generation() API or the native InferenceClient.chat_completion() API; this PR opts for the latter, as sketched below.
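
To illustrate the difference between the two APIs, here is a minimal sketch (not taken from the PR); the model name and prompt are placeholders, and it assumes a huggingface_hub version recent enough to expose InferenceClient.chat_completion():

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder model

# Option 1: text_generation() — the chat template has to be applied to the
# prompt beforehand (as fewshot_context() does for BaseModel/NanotronModel).
out_text = client.text_generation("Question: 2+2=?\nAnswer:", max_new_tokens=16)

# Option 2: chat_completion() — structured messages are sent as-is and the
# endpoint applies its own chat template; this is the API this PR uses.
out_chat = client.chat_completion(
    messages=[{"role": "user", "content": "Question: 2+2=?"}],
    max_tokens=16,
)

print(out_text)
print(out_chat.choices[0].message.content)
```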

On a more general note, could it be fruitful for Lighteval to make more extensive use of huggingface_hub types? At the very least, GenerativeResponse's result attribute could be typed as ChatCompletionOutput | TextGenerationOutput, with metrics accepting inputs of these types as well, so that we could easily evaluate function calling and tools. Similarly, GreedyUntilRequest's context attribute could be of type Conversation: TypeAlias = List[ChatCompletionInputMessage], so that tools params can be passed along.
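
A hypothetical sketch of what that typing could look like (illustrative only, not the PR's actual code; the dataclass fields are simplified and it assumes a huggingface_hub version that exports these generated types at the top level):

```python
from dataclasses import dataclass
from typing import List, TypeAlias, Union

from huggingface_hub import (
    ChatCompletionInputMessage,
    ChatCompletionOutput,
    TextGenerationOutput,
)

# A conversation is just a list of structured chat messages.
Conversation: TypeAlias = List[ChatCompletionInputMessage]

@dataclass
class GreedyUntilRequest:
    # Either a pre-templated prompt string or a structured conversation,
    # the latter being able to carry tools parameters.
    context: Union[str, Conversation]

@dataclass
class GenerativeResponse:
    # Keeping the full hub output type would let metrics inspect tool calls,
    # not just the generated text.
    result: Union[ChatCompletionOutput, TextGenerationOutput]
```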