huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

Add Code-Centric Interface to LightEval for Enhanced Usability #148

Open adithya-s-k opened 2 months ago

adithya-s-k commented 2 months ago

Enhancing LightEval to better accommodate code-based workflows would be very valuable. The current approach relies heavily on the command-line interface (CLI), but a more code-centric interface would greatly benefit users.

Consider the following refinement:

# Install the LightEval package first: pip install lighteval

from lighteval import Evaluator, EvaluatorArguments

def configure_dataset():
    # Define dataset formatting and evaluation parameters here
    ...

# Initialize the evaluator for custom dataset evaluations
evaluator = Evaluator(
    model=model,                           # a previously loaded/configured model
    eval_dataset=dataset,                  # the dataset to evaluate on
    metric="loglikelihood_acc",
    dataset_text_field=configure_dataset,  # hook that defines the dataset formatting
    args=EvaluatorArguments(
        # Additional arguments for evaluation configuration,
        # e.g. batch size, number of workers, evaluation steps, etc.
        batch_size=32,
        num_workers=4,
        # ...
    ),
)

# Initiate the evaluation process
evaluator.evaluate()

# Display results and publish statistics to the Hugging Face Hub
evaluator.show_results()
evaluator.push_results()

This revised approach emphasizes a more structured, Pythonic use of LightEval, with clear functions for defining dataset formatting and evaluation specifics. It also introduces an EvaluatorArguments class to encapsulate evaluation configuration such as batch size and number of workers. The Evaluator and its methods follow conventional Python patterns, making LightEval easier to use and integrate into code-centric workflows.
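As a small illustration, the dataset formatting hook could be filled in along these lines; the single-sample signature and the "question"/"choices" field names are assumptions about how such a hook might behave, not part of any existing LightEval interface.

def configure_dataset(sample: dict) -> str:
    # Illustrative sketch only: turn one raw dataset sample into the prompt
    # text used for evaluation; the field names here are assumptions.
    formatted_choices = "\n".join(f"- {choice}" for choice in sample["choices"])
    return f"Question: {sample['question']}\nChoices:\n{formatted_choices}\nAnswer:"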

If this is a feature you believe would be beneficial, I am eager to contribute to its development.

@clefourrier @NathanHB

clefourrier commented 2 months ago

That sounds very cool, and I think we'll want a feature like this, yes!

Let us discuss it a bit internally and we'll come back to you?

adithya-s-k commented 2 months ago

@clefourrier Certainly!

There are a few developers who are interested in something similar within the communities I am part of. I also have a very basic implementation of it working.

I look forward to hearing from you and the team, and I would also love to contribute.

adithya-s-k commented 2 months ago

Hey @clefourrier, just wanted to know if there are any updates on the status of this issue, and whether this is something you'd be interested in adding?

clefourrier commented 2 months ago

Hi! Sorry, we've been a bit underwater at the moment! Yes, it's still something we'd love to add/have the community add - we first wanted to merge what we are doing with the CLI (having an actual lighteval CLI to call instead of using the scripts, draft is here), then adapt the function calls to allow both CLI and code-centric calls as above.

To get to the above, we would need to use the new model configuration files, maybe add a task configuration file too, and then create an Evaluator class which would wrap up our main_accelerate code.
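Roughly, such a wrapper could look like the sketch below; the class names, their fields, and the run_fn callable standing in for the existing main_accelerate entry point are all illustrative assumptions, not the actual or planned LightEval implementation.

from dataclasses import dataclass
from types import SimpleNamespace
from typing import Callable, Optional

@dataclass
class EvaluatorArguments:
    # Illustrative fields only; the real configuration surface may differ.
    batch_size: int = 32
    num_workers: int = 4
    output_dir: str = "./eval_results"

class Evaluator:
    def __init__(self, model_config_path: str, tasks: str,
                 args: EvaluatorArguments, run_fn: Callable):
        # run_fn stands in for the existing accelerate entry point
        # (i.e. whatever function the current script/CLI ultimately calls).
        self.model_config_path = model_config_path
        self.tasks = tasks
        self.args = args
        self.run_fn = run_fn
        self.results: Optional[dict] = None

    def evaluate(self) -> dict:
        # Translate the code-centric configuration into the argparse-style
        # namespace that a script-based entry point typically expects.
        cli_args = SimpleNamespace(
            model_config_path=self.model_config_path,
            tasks=self.tasks,
            override_batch_size=self.args.batch_size,
            num_workers=self.args.num_workers,
            output_dir=self.args.output_dir,
        )
        self.results = self.run_fn(cli_args)
        return self.results

    def show_results(self) -> None:
        # Minimal placeholder: print whatever the entry point returned.
        print(self.results)

With a wrapper along these lines, the CLI and the code-centric path could end up sharing the same underlying call, which is what adapting the function calls to allow both would amount to.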

If you're really eager to start on it, you can give it a go rn and we'll iterate on the fly, wdyt?