When an error occurs while sampling from an LLM, all previously retrieved samples are lost. This can cause substantial costs, e.g. when working with commercial LLMs. We could build in some error handling in this notebook, for example a repeat-n-times retry scheme like this (untested code):
def generate_one_completion_blablador_mistral(input_code):
    import os
    import time
    import openai

    n_times_until_no_error = 3
    for _ in range(n_times_until_no_error):
        try:
            client = openai.OpenAI(
                base_url='https://helmholtz-blablador.fz-juelich.de:8000/v1',
                api_key=os.environ.get('BLABLADOR_API_KEY'),
            )
            response = client.chat.completions.create(
                model=model_blablador_mistral,
                messages=[{"role": "user", "content": setup_prompt(input_code)}],
            )
            return response.choices[0].message.content.strip()
        except Exception:
            time.sleep(10)  # wait 10 seconds before trying again
Alternatively, we could find a way to store intermediate JSONL files, so that samples collected before a crash are not lost. This would require a modification within the HumanEval framework, which I don't know very well.
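If we control the sampling loop ourselves (rather than modifying HumanEval internals), one lightweight sketch is to append each completion to a JSONL file as soon as it arrives and skip already-completed tasks on restart. The function and file names below (`append_sample`, `completed_task_ids`, `samples.jsonl`) are placeholders, not part of the HumanEval API:

```python
import json
import os

def append_sample(path, task_id, completion):
    # Append one JSON record per line; flush immediately so a crash
    # loses at most the sample currently in flight.
    with open(path, 'a') as f:
        f.write(json.dumps({"task_id": task_id, "completion": completion}) + "\n")
        f.flush()

def completed_task_ids(path):
    # On restart, read back which tasks already have a sample, so the
    # sampling loop can skip them instead of paying for them again.
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {json.loads(line)["task_id"] for line in f if line.strip()}

# Example resume-safe loop; the string completion stands in for a real LLM call.
path = "samples.jsonl"
done = completed_task_ids(path)
for task_id in ["HumanEval/0", "HumanEval/1"]:
    if task_id in done:
        continue
    append_sample(path, task_id, "def f(): pass")
```

Since HumanEval's `evaluate_functional_correctness` consumes a JSONL file of `task_id`/`completion` records anyway, a file written incrementally like this could be fed to the evaluation step unchanged.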