openai / human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"
MIT License
2.31k stars 330 forks

Evaluations timing out #40

Closed antonkarlsson1 closed 7 months ago

antonkarlsson1 commented 7 months ago

Hello,

I've been messing around with the benchmark for the last couple of days and everything has been running perfectly. All of a sudden the evaluations started to time out and I can't figure out why; I've spent the last few days trying to track down the issue. Has anyone come across something similar?

Better explanation: after running the benchmark in generation+evaluation mode, the generation works perfectly and all the samples look correct, but every time the threads start the evaluation they time out, giving terrible results for all models. I'm running the evaluations with:

- Model: WizardLM/WizardCoder-33B-V1.1
- BitsandBytes 4-bit quantization
- Temperature: 0.2
- Top_k: 0
- Top_p: 0.95
- Nr_samples: 1
- Batch size: 1
- Max_len_generation: 1024
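
For reference, the evaluation step is just human-eval's `evaluate_functional_correctness` on the generated samples. Below is a minimal sketch of the kind of call I mean; the sample file path and the raised timeout are placeholders, not my exact setup:

```python
# Minimal sketch: score already-generated samples with human-eval.
# "samples.jsonl" is a placeholder path; each line must contain
# {"task_id": ..., "completion": ...} as the library expects.
from human_eval.evaluation import evaluate_functional_correctness

results = evaluate_functional_correctness(
    sample_file="samples.jsonl",  # placeholder path to the generated samples
    k=[1],                        # pass@1 only, since Nr_samples is 1
    n_workers=4,                  # number of evaluation threads
    timeout=10.0,                 # per-problem execution timeout (library default is 3.0 s)
)
print(results)                    # e.g. {"pass@1": ...}
```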