princeton-nlp / SWE-agent

[NeurIPS 2024] SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges.
https://swe-agent.com
MIT License

Logs from SWE-Agent running on SWE-Bench #21

Closed by harrytormey 6 months ago

harrytormey commented 7 months ago

First off, thank you so much for publishing SWE-bench and SWE-agent. I was wondering whether the logs from running the SWE-Bench/SWE-ENG evaluation are posted anywhere. I am working on some LangChain scripts to categorize and group the bugs/features that are being used to evaluate the models/agents, and I'd like to dig into which issues failed or succeeded.

I noticed that in a previous issue @carlosejimenez provided links to generated results from Claude and GPTs. Is there something similar available for the logs resulting from evaluation? Thanks so much in advance.

ofirpress commented 7 months ago

We'll release this soon, along with a preprint describing the SWE-agent system.

ofirpress commented 6 months ago

Update: all logs are here: https://github.com/swe-bench/experiments/tree/main/evaluation/test/20240402_sweagent_gpt4