nus-apr / auto-code-rover

A project-structure-aware autonomous software engineer aiming for autonomous program improvement. Resolved 30.67% of tasks (pass@1) on SWE-bench lite and 38.40% of tasks (pass@1) on SWE-bench verified, with each task costing less than $0.7.

Question about Auto Code Rover SWE-bench data #17

Closed harrytormey closed 5 months ago

harrytormey commented 5 months ago

I am planning on writing an article on Auto Code Rover, and I was wondering if you could tell me about the format of the SWE-bench test results in: https://github.com/nus-apr/auto-code-rover/tree/main/results/swe-agent-results

How am I to interpret the results in this directory? For comparison, Devin formatted the diffs from their SWE-bench run into separate pass/fail directories: https://github.com/CognitionAI/devin-swebench-results/tree/main/output_diffs

How is this done for your results? Thanks in advance, and thanks for publishing your work.

-Harry

zhiyufan commented 5 months ago

There is a final_report.json file for each swe-agent replication. The "resolved" field in final_report.json lists the resolved task instances in SWE-bench lite. The .traj files record all actions taken by SWE-agent, along with the conversation history with GPT-4. At the end of a .traj file there is an "info" field, which contains the generated patch (in the form of a git diff) if one exists.

yuntongzhang commented 5 months ago

Closing this, @harrytormey feel free to let us know if you have more questions.