huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
MIT License

Serbian LLM Benchmark Task #340

Closed · DeanChugall closed this 1 month ago

DeanChugall commented 1 month ago

Serbian LLM Benchmark Task Configuration and Prompt Functions

Summary:

This pull request introduces task configurations and prompt functions for evaluating LLMs on various Serbian datasets. The module includes tasks for:

ARC (Easy and Challenge), BoolQ, HellaSwag, OpenBookQA, PIQA, Winogrande, and a custom OZ Eval dataset.

The tasks are defined using the `LightevalTaskConfig` class, and prompt generation is streamlined through a reusable `serbian_eval_prompt` function.

Changes:

  1. Task Configurations:

    • Configurations for ARC (Easy and Challenge), BoolQ, HellaSwag, OpenBookQA, PIQA, Winogrande, and OZ Eval tasks using `LightevalTaskConfig`.
    • Enum class `HFSubsets` added for dataset subset management, improving code maintainability and clarity.
    • `create_task_config` function allows dynamic task creation with dependency injection for flexibility in dataset and metric selection (illustrative sketches of these pieces follow this list).
  2. Prompt Functions:

    • The `serbian_eval_prompt` function creates a structured multiple-choice prompt in Serbian (see the first sketch after this list).
    • The function supports dynamic query and choice generation with configurable tasks.
  3. Logging:

    • A `hello_message` banner is printed upon task initialization, listing all available tasks.
    • Task names are dynamically generated and printed using `hlog_warn` (see the last sketch after this list).
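
To make the prompt format concrete, here is a minimal sketch of what a `serbian_eval_prompt` along these lines could look like. The `Doc` return type follows lighteval's community-task convention, but the row field names (`"query"`, `"choices"`, `"answer"`) are illustrative assumptions, not the PR's actual schema:

```python
from lighteval.tasks.requests import Doc


def serbian_eval_prompt(line: dict, task_name: str = "") -> Doc:
    """Turn one dataset row into a Serbian multiple-choice Doc.

    Assumed row shape: {"query": str, "choices": list[str], "answer": int};
    the real column names depend on each underlying dataset.
    """
    letters = "ABCD"[: len(line["choices"])]
    query = f"Pitanje: {line['query']}\n"  # "Question: ..."
    for letter, choice in zip(letters, line["choices"]):
        query += f"{letter}. {choice}\n"
    query += "Odgovor:"  # "Answer:"
    return Doc(
        task_name=task_name,
        query=query,
        choices=[f" {letter}" for letter in letters],  # score letter continuations
        gold_index=int(line["answer"]),
    )
```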
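
Similarly, a rough sketch of how the `HFSubsets` enum and the `create_task_config` helper could fit together. The `LightevalTaskConfig` fields follow the community-task template, but exact field names vary between lighteval versions, and the repo id, subset values, and splits here are placeholders rather than the PR's real values:

```python
from enum import Enum

from lighteval.metrics.metrics import Metrics
from lighteval.tasks.lighteval_task import LightevalTaskConfig


class HFSubsets(str, Enum):
    """Hypothetical mapping from task names to HF dataset subsets."""

    ARC_EASY = "arc_easy"
    ARC_CHALLENGE = "arc_challenge"
    BOOLQ = "boolq"


def create_task_config(task_name, hf_repo, subset, prompt_function, metric):
    """Build one task config; dataset, prompt, and metric are injected."""
    return LightevalTaskConfig(
        name=task_name,
        prompt_function=prompt_function,
        suite=["community"],
        hf_repo=hf_repo,
        hf_subset=subset.value,
        hf_avail_splits=["train", "test"],
        evaluation_splits=["test"],
        metric=[metric],
    )


# Example wiring (hypothetical repo id; serbian_eval_prompt from the sketch above):
arc_easy = create_task_config(
    task_name="serbian_evals:arc_easy",
    hf_repo="your-org/serbian-llm-benchmark",
    subset=HFSubsets.ARC_EASY,
    prompt_function=serbian_eval_prompt,
    metric=Metrics.loglikelihood_acc,
)
```

Injecting the dataset, prompt function, and metric this way lets one helper build every task/subset pair instead of repeating the full config per task.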
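
Finally, a small sketch of how the startup banner could be produced, assuming lighteval's `hlog_warn` helper (which the logging item above references); the actual wording and formatting in the PR may differ:

```python
from lighteval.logging.hierarchical_logger import hlog_warn


def print_hello_message(task_configs):
    """Print a banner listing every registered Serbian benchmark task."""
    hlog_warn("=" * 60)
    hlog_warn("Serbian LLM Benchmark - available tasks:")
    for config in task_configs:
        hlog_warn(f"  community|{config.name}")
    hlog_warn("=" * 60)
```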


DeanChugall commented 1 month ago

Fixed `ruff format --check .` for CI.

DeanChugall commented 1 month ago

It would be great if we used `pre-commit run`, but when it runs, some files do not satisfy the criteria, and I don't want to mess with those files. The files affected by `pre-commit run` are shown in the screenshots below.

[Screenshots from 2024-10-07 showing the files flagged by pre-commit run]

NathanHB commented 1 month ago

Mhh, this should not happen. Are you sure you are running the correct versions?

DeanChugall commented 1 month ago

> Mhh, this should not happen. Are you sure you are running the correct versions?

Absolutely. Try checking at least one of those files manually, e.g. `evaluation-task-request.md`:

https://raw.githubusercontent.com/huggingface/lighteval/refs/heads/main/.github/ISSUE_TEMPLATE/evaluation-task-request.md

NathanHB commented 1 month ago

Let's just wait for the quality check and see if we can merge.