Comments on PullRequestBenchmark?

nus-apr / auto-code-rover

A project-structure-aware autonomous software engineer aiming for autonomous program improvement. Resolved 30.67% of tasks (pass@1) in SWE-bench lite, with each task costing less than $0.7.

Comments on PullRequestBenchmark? #3

Closed mrconter1 closed 2 months ago

mrconter1 commented 2 months ago

Hello!

I am the author of PullRequestBenchmark, and I am wondering if you have any thoughts on it.

Best Regards

zhiyufan commented 2 months ago

Thank you for building the benchmark; it looks interesting to me! As of now, we are focusing on improving Auto-Code-Rover's program repair and feature addition capabilities. There is a lot more to work on. We will consider the PR evaluation benchmark in the future, so I'm going to close the issue for now.

Thank you.

mrconter1 commented 2 months ago

No problem! Thank you for sharing your thoughts!