nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

Inference code demo for WizardCoder #177

Closed ganler closed 10 months ago

ganler commented 11 months ago

Hi, thanks for the amazing work. I am interested in evaluating WizardCoder-Python-34B-V1.0 on HumanEval+. Just curious if there is a minimal Python/HF code snippet demo for me to reference? Thanks!

ChiYeungLaw commented 11 months ago

Thanks for your great eval-plus project. We conducted an extra evaluation on HumanEval+. The pass@1 is 64.6 (greedy), higher than ChatGPT (63.4). You can use humaneval_gen_vllm.py to generate the code completions.

pip install vllm # This can accelerate the inference process a lot.
pip install transformers==4.31.0

model="/path/to/your/model"
temp=0.2 # set to 0.0 for greedy decoding
max_len=2048
pred_num=200 # set to 1 for greedy decoding
num_seqs_per_iter=1

output_path=preds/T${temp}_N${pred_num}

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

CUDA_VISIBLE_DEVICES=0,1,2,3 python humaneval_gen_vllm.py --model ${model} \
  --start_index 0 --end_index 164 --temperature ${temp} \
  --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --num_gpus 4

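For readers who want a pure-Python entry point instead of the shell wrapper: the script above wraps each HumanEval problem in an instruction-style prompt before decoding. A minimal sketch of that wrapping step (the exact template lives in humaneval_gen_vllm.py; the Alpaca-style wording below is an assumption based on WizardCoder's usual format and may differ slightly):

```python
# Sketch of the instruction-style prompt wrapping applied before decoding.
# NOTE: the exact template is defined in humaneval_gen_vllm.py; this
# Alpaca-style wording is an assumption, not copied from the repo.

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "Create a Python script for this problem:\n"
    "{problem}\n\n"
    "### Response:"
)


def build_prompt(problem: str) -> str:
    """Wrap a raw HumanEval problem statement in the instruction template."""
    return PROMPT_TEMPLATE.format(problem=problem)


if __name__ == "__main__":
    demo = 'def add(a, b):\n    """Return the sum of a and b."""\n'
    print(build_prompt(demo))
```

The resulting string is what gets passed as a prompt to the vLLM engine; with `temp=0.0` and `pred_num=1` you reproduce the greedy pass@1 setting mentioned above.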
ganler commented 10 months ago

Great! I was able to obtain the raw output (in a dialog fashion). Curious if you can point me to the post-processing script to turn it into actual code? (I guess it is simply s.split("```python")[-1].split("```")[0]?)

ChiYeungLaw commented 10 months ago

Yes, we use a similar method: https://github.com/nlpxucan/WizardLM/blob/main/WizardCoder/src/process_humaneval.py
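For reference, a minimal sketch of the fenced-block extraction idea discussed above (the function name and fallback behavior here are illustrative, not copied from process_humaneval.py):

```python
def extract_code(completion: str) -> str:
    """Extract the first fenced Python block from a chat-style completion.

    Falls back to a plain ``` fence, then to the raw text, so completions
    without markdown fences are passed through unchanged.
    """
    for fence in ("```python", "```"):
        if fence in completion:
            # Take the text after the opening fence, up to the closing fence.
            return completion.split(fence)[1].split("```")[0].strip("\n")
    return completion


if __name__ == "__main__":
    raw = (
        "Here is the solution:\n"
        "```python\n"
        "def add(a, b):\n"
        "    return a + b\n"
        "```\n"
        "Hope this helps!"
    )
    print(extract_code(raw))  # prints only the code inside the fence
```

The actual repo script also handles details like trailing explanation text and indentation, so treat this only as the core idea.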

ganler commented 10 months ago

Perfect, we have now obtained the results, which look strong; they are updated at https://evalplus.github.io/leaderboard.html

[screenshot: WizardCoder results on the EvalPlus leaderboard]

Thanks for the great work!