answer_type calculation is different for train/val and eval

hendrycks / apps

APPS: Automated Programming Progress Standard (NeurIPS 2021)

MIT License


answer_type calculation is different for train/val and eval #24

Closed minimario closed 5 months ago

minimario commented 1 year ago

Not necessarily an issue, but I noticed that for train/val, the answer_type is based on whether starter_code exists, whereas at eval time it's based on fn_name. Is there a reason for this difference?

train: https://github.com/hendrycks/apps/blob/main/train/dataset_apps/APPSBaseDataset.py#L67-L70
eval: https://github.com/hendrycks/apps/blob/main/eval/generate_gpt_codes.py#L67-L72

xksteven commented 1 year ago

It's a historical artifact mostly.

We focused on developing the training code before developing the testing code. In training, if starter code was provided, that meant the problem wanted you to use the provided code, which was different from evaluating code that just read from standard input and wrote to standard output.

While refactoring the code for testing, we added "fn_name" ourselves as a keyword that we could use to determine the format the output should be in.

Hopefully that helps answer the question.
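
For readers comparing the two code paths, here is a minimal sketch of the two checks discussed in this thread. It is not the repository's code: the function names, the `checks_agree` helper, the file names `starter_code.py` and `input_output.json`, the exact prompt strings, and the fallback when `input_output.json` is missing are all assumptions made for illustration.

```python
import json
import os

# Assumed prompt suffixes; the exact strings in the repository may differ.
CALL_BASED = "\nUse Call-Based format\n"
STANDARD_INPUT = "\nUse Standard Input format\n"


def answer_type_from_starter_code(problem_dir):
    """Train/val-style check (hypothetical helper): call-based if starter code exists."""
    starter_path = os.path.join(problem_dir, "starter_code.py")  # assumed file name
    return CALL_BASED if os.path.isfile(starter_path) else STANDARD_INPUT


def answer_type_from_fn_name(problem_dir):
    """Eval-style check (hypothetical helper): call-based if a non-empty fn_name is set."""
    io_path = os.path.join(problem_dir, "input_output.json")  # assumed file name
    if not os.path.isfile(io_path):
        # Assumption: treat a missing test-case file as standard input.
        return STANDARD_INPUT
    with open(io_path) as f:
        data = json.load(f)
    return CALL_BASED if data.get("fn_name") else STANDARD_INPUT


def checks_agree(problem_dir):
    """Return True when both checks pick the same answer type for a problem."""
    return answer_type_from_starter_code(problem_dir) == answer_type_from_fn_name(problem_dir)
```

Running `checks_agree` over every problem directory would show whether the two conditions ever diverge in practice, for example a problem that defines `fn_name` but ships no starter code.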