[Feature] Support BigCodeBench

terryyz commented 3 months ago

Describe the feature

BigCode (Hugging Face and ServiceNow Research) released a new large-scale code generation benchmark, BigCodeBench, which tests diverse function calls and complex instructions across 1,140 expert-annotated tasks. It has been officially used by DeepSeek and CodeGeeX4. BigCodeBench is considered a better alternative to HumanEval and other function-level code generation benchmarks (see here).

Will you implement it?

[ ] I would like to implement this feature and create a PR!

tonysy commented 3 months ago

Thanks for the suggestion. Would you like to contribute this benchmark to OpenCompass?

terryyz commented 3 months ago

Thanks for asking! I'm quite busy these two months, but I might be able to take a look in October if no one else has the bandwidth to add the bench.

open-compass / opencompass

[Feature] Support BigCodeBench #1460