Because running gen_program_synthesis.py can be costly, I have uploaded the OpenAI projected samples to this temporary Dropbox. If you are reviewing this code or just browsing, please do not upload this
data anywhere else, to preserve the integrity of the benchmark (that is, to avoid cross-site data contamination of LLMs). You may use this projected data only for testing on your local machine.
Sample scripts of the evaluation.