ntunlp / xCodeEval

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

Evaluation code #8

Closed: sbmaruf closed this issue 1 month ago

sbmaruf commented 1 year ago

Sample scripts for the evaluation.

Since running gen_program_synthesis.py can be costly, I have uploaded the OpenAI projected samples to this temporary Dropbox. If you are reviewing this code or just browsing, please do not upload this data anywhere, to preserve the integrity of the benchmark (avoiding cross-site data contamination for LLMs). You may use the projected data only for testing on your local machine.
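
For anyone testing locally, here is a minimal sketch of how the downloaded dump could be inspected before wiring it into the evaluation scripts. The file name `projected_samples.jsonl`, the JSONL layout, and the record schema are assumptions on my part, not confirmed contents of the Dropbox dump; adjust to whatever the actual files contain.

```python
# Minimal sketch for sanity-checking the projected samples locally.
# Assumptions (hypothetical, not confirmed by the repo): the dump is a
# JSONL file where each line is one standalone JSON record holding the
# task fields plus the OpenAI-projected output.
import json
from pathlib import Path

SAMPLES_PATH = Path("projected_samples.jsonl")  # hypothetical local filename


def load_projected_samples(path: Path):
    """Yield one parsed record per non-empty line of a JSONL dump."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Print the first few records to inspect the schema before feeding
    # them into the evaluation scripts.
    for i, record in enumerate(load_projected_samples(SAMPLES_PATH)):
        print(json.dumps(record, indent=2)[:500])
        if i >= 2:
            break
```

Once the actual field names are known, the same loader can feed the records straight into the evaluation scripts in place of fresh gen_program_synthesis.py runs.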

yazdanbakhsh commented 1 month ago

Sounds good.