Hi,
For my project, I'm trying to fine-tune CodeGen models on my own dataset and evaluate the resulting fine-tuned model on the HumanEval benchmark. I have a few questions that I'd appreciate your help with.
First, at line 234 of the sampling code we have tokenizer.pad_token == args.pad, where args.pad is 50256. Shouldn't pad_token be set to the eos_token (the token string) rather than to 50256, which is the eos_token_id? I'm confused by this. Then, at line 240, you pass pad_token_id=args.pad, so in the sampling code both pad_token and pad_token_id end up as 50256. Could you please elaborate on this? That would be super helpful.

Second, as a baseline I need to replicate your single-turn HumanEval benchmark results, but I'm getting noticeably lower numbers than those reported in the paper, and I'm almost certain I'm missing something on my end. To produce the Table 1 results in the paper, did you use exactly the same sampling procedure as sample.py?
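For concreteness, here is a minimal sketch of how I would have expected the padding and the generate() call to be set up. The checkpoint name and the sampling parameters (temperature, top_p, max_new_tokens) are placeholders on my side, not taken from sample.py or from the paper:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder checkpoint -- in my case this would be my fine-tuned model.
checkpoint = "Salesforce/codegen-350M-mono"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# My expectation: pad_token holds the token *string*, while pad_token_id
# (the integer) is what generate() expects for its pad_token_id argument.
tokenizer.pad_token = tokenizer.eos_token  # '<|endoftext|>'
assert tokenizer.pad_token_id == tokenizer.eos_token_id == 50256

prompt = "def hello_world():"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.2,        # placeholder sampling parameters, not from sample.py
    top_p=0.95,
    max_new_tokens=128,
    pad_token_id=tokenizer.eos_token_id,  # integer id, not the token string
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If this matches what sample.py intends, I'd just like to confirm it; if not, I'd appreciate knowing where my understanding goes wrong.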
Thanks a lot for your time.