instructlab / eval

Python library for Evaluation
https://pypi.org/project/instructlab-eval/
Apache License 2.0

Evaluation of user data using Unitxt #176

Open Roni-Friedman opened 3 weeks ago

Roni-Friedman commented 3 weeks ago

Here is the suggested flow. Let's discuss in a meeting to see if it makes sense and modify as needed:

Evaluation command [ilab model evaluate new_data] will have the following parameters:

1 - csv_path for user data

2 - task_type out of the following options:

3 - use_llmaaj (False by default)

4 - num_shots (0 by default)

Following the command, unitxt will run the provided data with the task of choice, replacing the task's default metric with an LLM-as-a-judge metric if use_llmaaj is selected. The data will be run in multiple configurations (fitted into different templates that match the task), and the results will include a recommendation for the best-performing template among those used.
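To make the flow concrete, here is a rough sketch of what it could look like on top of unitxt's Python API. This is only an illustration: evaluate_user_csv and predict are hypothetical names, the task and templates would come from the task_type selection above, and the exact unitxt signatures and result structure may differ between versions.

```python
# Rough sketch only; evaluate_user_csv and predict are placeholders, and exact
# unitxt field names / result layout may differ between versions.
from unitxt import evaluate, load_dataset
from unitxt.card import TaskCard
from unitxt.loaders import LoadCSV


def evaluate_user_csv(csv_path, task, candidate_templates, predict):
    """Run the user's CSV through each candidate template and report the best one."""
    scores_per_template = {}
    for template in candidate_templates:
        # Wrap the user's CSV in a TaskCard for the chosen task
        # (task could be a catalog reference such as "tasks.classification.multi_class").
        card = TaskCard(loader=LoadCSV(files={"test": csv_path}), task=task)
        dataset = load_dataset(card=card, template=template)

        # predict() stands in for whichever model backend ilab wires in.
        predictions = [predict(instance["source"]) for instance in dataset["test"]]
        results = evaluate(predictions=predictions, data=dataset["test"])

        # Global score for this template; the results structure varies by version.
        scores_per_template[template] = results[0]["score"]["global"]["score"]

    # num_shots would map to load_dataset's num_demos (plus a demos pool split),
    # and use_llmaaj would swap the task's metric for an LLM-as-a-judge metric;
    # both are omitted here for brevity.
    best_template = max(scores_per_template, key=scores_per_template.get)
    return best_template, scores_per_template
```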

Roni-Friedman commented 1 week ago

@alimaredia Following yesterday's meeting, could you please share the evaluation notebook you prepared? This will help us understand the plan better and identify what contributions unitxt can offer.

Roni-Friedman commented 1 week ago

@danmcp @alimaredia regarding the linked PR - I have addressed all issues, except the parameterization of the unitxt recipe, which I believe is no longer relevant to our current discussion. Perhaps it is better to close it and open a new one once we've defined the features it will contain?

danmcp commented 1 week ago

> @danmcp @alimaredia regarding the linked PR - I have addressed all issues, except the parameterization of the unitxt recipe, which I believe is no longer relevant to our current discussion. Perhaps it is better to close it and open a new one once we've defined the features it will contain?

I don't have a strong preference between closing it or leaving it hanging out for a bit. Agreed that if we do settle on a different design, it should be a new PR.