zhangxjohn / LLM-Agent-Benchmark-List

A benchmark list for evaluation of large language models.
Apache License 2.0

Add '[ICLR 2024] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?' #1

Open 0xdevalias opened 5 months ago

0xdevalias commented 5 months ago
zhangxjohn commented 5 months ago

Thank you for your contribution. SWE-bench has been added.

0xdevalias commented 5 months ago

image

Looks like there may have been some unintended changes along with that?

zhangxjohn commented 5 months ago

Sorry, the errors have been fixed.