tjunlp-lab / Awesome-LLMs-Evaluation-Papers

The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.

Code-Related Benchmarks #11

Closed: john-b-yang closed this issue 10 months ago

john-b-yang commented 11 months ago

Hi TJUNLP team, thanks so much for the great work! A paper that presents a holistic view of current NLP benchmarks is especially relevant amidst the many ongoing evaluation efforts.

To this end, I'd like to point out a couple of works on evaluating language models on coding-related tasks, such as code completion, patch generation, and language agents that use code as actions.

Thanks in advance!

allen3ai commented 10 months ago

We have added these papers to our paper list and will include them in the next version of our survey.

john-b-yang commented 10 months ago

Sweet, thanks so much for the update!