[NeurIPS 2024] SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges.
First off, I'd like to say thank you so much for publishing SWE-bench and SWE-agent. I was wondering: is there anywhere that the logs from running the SWE-bench/SWE-agent evaluation are posted? I am working on some langchain scripts to categorize and group the bugs/features that are being used to evaluate the models/agents, and I'd like to dig into which issues failed or succeeded.
I noticed that on a previous issue @carlosejimenez provided links to generated results from Claude and the GPTs. Is there something similar available for the logs resulting from evaluation? Thanks so much in advance.
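For context, this is roughly the kind of grouping I'm after, as a minimal sketch in plain Python. The report file name and the per-instance `resolved` field are assumptions about what the evaluation logs might contain, not the actual SWE-bench output schema:

```python
import json
from collections import defaultdict

# Hypothetical evaluation report: a JSON object mapping each benchmark
# instance ID to a record with a boolean "resolved" flag. Both the file
# name and the field names are assumptions, not the real SWE-bench schema.
with open("evaluation_report.json") as f:
    report = json.load(f)

groups = defaultdict(list)
for instance_id, record in report.items():
    # Bucket each issue by whether the agent's patch resolved it.
    status = "resolved" if record.get("resolved") else "unresolved"
    groups[status].append(instance_id)

for status, ids in groups.items():
    print(f"{status}: {len(ids)} issues")
```

With the raw evaluation logs I could go a step further and categorize the failures themselves, not just count them.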