terryyz opened this issue 3 months ago (status: Open)
Thanks for the suggestion! Would you like to contribute this benchmark to OpenCompass?
Thanks for asking! I'm quite busy for the next two months, but I might be able to take a look in October if no one else has the bandwidth to add the benchmark.
Describe the feature
BigCode (a collaboration between Hugging Face and ServiceNow Research) has released a new large-scale benchmark, BigCodeBench, for code generation with diverse function calls and complex instructions, covering 1,140 expert-annotated tasks. It has already been officially used by DeepSeek and CodeGeeX4. BigCodeBench is considered a better alternative to HumanEval and other function-level code generation benchmarks (see here).
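For reference, here is a minimal sketch of how the task data could be pulled in when wiring up an evaluation; the Hub path `bigcode/bigcodebench` and the fields mentioned in the comments are assumptions on my side and should be checked against the latest release:

```python
# Minimal sketch: inspect the BigCodeBench tasks from the Hugging Face Hub.
# The dataset path and the split layout below are assumptions, not verified
# against the current release.
from datasets import load_dataset

# Without a split argument this returns a DatasetDict, so the available
# release splits and the task count per split can be inspected directly.
ds = load_dataset("bigcode/bigcodebench")
print(ds)

# Peek at one task to see the fields an OpenCompass adapter would need
# (e.g. the natural-language instruction, the code prompt, and the tests).
first_split = next(iter(ds.values()))
print(first_split[0])
```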
Will you implement it?