huggingface / competitions

https://huggingface.co/docs/competitions

Custom Evaluation Metric #42

Closed · BAJUKA closed this 1 month ago

BAJUKA commented 1 month ago

Hi,

I have created a competition and I would like to implement a custom evaluation metric. My custom metric involves using GPT-4 from OpenAI to evaluate the predictions. I have two questions regarding this:

  1. Is it possible to use an LLM-based evaluation metric? Specifically, I want to make API calls to OpenAI’s models to score the model outputs, which may take some time to finish. Would this cause any problems? (A rough sketch of what I have in mind is at the end of this comment.)
  2. How do I install additional packages for my custom metric? The requirements_docker.txt file ships with a limited set of pre-installed packages, and my evaluation needs extras such as openai. Is there a recommended way to extend the competition environment or a mechanism to include more dependencies for the custom metric?

I’d really appreciate any guidance on how to proceed with this approach and how to configure the environment for the custom metric to work smoothly.
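
For concreteness, here is roughly the kind of metric I have in mind. This is only a sketch: the `compute` signature, the column names, and the 0-10 score scale are placeholders I made up, not the actual competitions interface, and it assumes the `openai` package and an `OPENAI_API_KEY` are available in the evaluation environment.

```python
import pandas as pd
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the evaluation environment


def score_prediction(reference: str, prediction: str) -> float:
    """Ask GPT-4 to grade a single prediction against the reference on a 0-10 scale."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "You are an impartial grader. Reply with a single integer from 0 to 10.",
            },
            {
                "role": "user",
                "content": f"Reference answer:\n{reference}\n\nModel prediction:\n{prediction}\n\nScore:",
            },
        ],
    )
    return float(response.choices[0].message.content.strip())


def compute(solution_csv: str, submission_csv: str) -> dict:
    # Placeholder entry point: the real custom-metric hook and its parameters are
    # whatever the competitions docs specify; only the scoring logic matters here.
    solution = pd.read_csv(solution_csv)
    submission = pd.read_csv(submission_csv)
    scores = [
        score_prediction(ref, pred)
        for ref, pred in zip(solution["answer"], submission["answer"])
    ]
    return {"gpt4_score": sum(scores) / len(scores)}
```

Since each row triggers one API call, scoring a large submission could take a while, hence my question about long-running evaluations.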

abhishekkrthakur commented 1 month ago

Yes, it's possible.

Have you taken a look at the docs: https://hf.co/docs/competitions ?
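
On the extra dependency: the docs cover how the metric environment is set up, so check there for the supported way to add packages. Independent of that, a small framework-agnostic safeguard is to fail early with a readable error if `openai` is missing from the container, e.g.:

```python
# Guard the optional dependency so a missing package surfaces as a clear error
# instead of an opaque traceback inside the scoring container.
try:
    from openai import OpenAI
except ImportError as err:
    raise RuntimeError(
        "The 'openai' package is not available in the metric environment; "
        "add it to the competition's requirements before running the evaluation."
    ) from err
```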

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 15 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 2 days since being marked as stale.