haesleinhuepf / human-eval-bia

Benchmarking Large Language Models for Bio-Image Analysis Code Generation
MIT License

Samples lost due to error when sampling #61

Open haesleinhuepf opened 5 months ago

haesleinhuepf commented 5 months ago

When an error happens while sampling LLMs, all samples retrieved before the error are lost. This can cause substantial costs, e.g. when working with commercial LLMs. We could build in some error handling, e.g. in this notebook, for example a repeat-n-times scheme like this (untested code):

def generate_one_completion_blablador_mistral(input_code):
    import os
    import time
    import openai

    n_times_until_no_error = 3
    for _ in range(n_times_until_no_error):
        try:
            client = openai.OpenAI(
                base_url='https://helmholtz-blablador.fz-juelich.de:8000/v1',
                api_key=os.environ.get('BLABLADOR_API_KEY'),
            )
            response = client.chat.completions.create(
                model=model_blablador_mistral,
                messages=[{"role": "user", "content": setup_prompt(input_code)}],
            )
            return response.choices[0].message.content.strip()
        except Exception:
            time.sleep(10)  # wait 10 seconds before trying again
    raise RuntimeError("Sampling failed after " + str(n_times_until_no_error) + " attempts")

Alternatively, we need to find a way to store intermediate jsonl files. This would require a modification within the HumanEval framework, which I don't know very well.
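The intermediate-storage idea could also be approximated outside the HumanEval framework by appending each finished sample to a jsonl file as soon as it is generated, and skipping already-stored tasks on restart. A minimal sketch (the function names and the `task_id`/`completion` record layout are assumptions, not part of the existing code):

```python
import json
import os

def append_sample(jsonl_path, task_id, completion):
    # Append one finished sample immediately, so samples retrieved
    # before a later crash are not lost (hypothetical helper).
    with open(jsonl_path, "a") as f:
        f.write(json.dumps({"task_id": task_id, "completion": completion}) + "\n")

def completed_task_ids(jsonl_path):
    # On restart, collect task_ids that already have a stored sample,
    # so only the missing ones need to be re-sampled.
    if not os.path.exists(jsonl_path):
        return set()
    with open(jsonl_path) as f:
        return {json.loads(line)["task_id"] for line in f if line.strip()}
```

Because jsonl is line-oriented, appending is cheap and a partially written run remains readable up to the last complete line.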

haesleinhuepf commented 5 months ago

... and it might be possible to implement this once for all functions, using the scheme used here