raymyers / swe-bench-util

Scripts for working with SWE-Bench, the AI coding agent benchmark
Apache License 2.0

Added inference and a way to retrieve the "File Hints" for assistants #7

Closed: phact closed this 5 months ago

phact commented 5 months ago

This PR allows you to create an assistant and run inference against a row from the benchmark.

For https://github.com/raymyers/swe-bench-util/pull/6 to work, we needed a way to retrieve which files Assistants use under the hood during a run. OpenAI's Assistants API has a placeholder field for this, but OpenAI does not populate it.

From the OpenAI docs: [screenshot]

I implemented this functionality in astra-assistants (v0.1.9) and released it: https://github.com/datastax/astra-assistants-api/pull/15
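To make the "placeholder" concrete, here is a minimal sketch of how a client might read file hints back out of a run's steps. It assumes the placeholder is the retrieval object on retrieval tool calls in Assistants API v1 run step objects (which upstream OpenAI leaves as an empty {}), and that steps are handled as plain dicts. The function name retrieval_payloads and the "file_ids" key in the sample data are hypothetical; this is not the astra-assistants schema or swe-bench-util code.

```python
from typing import Any


def retrieval_payloads(run_steps: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collect non-empty "retrieval" payloads from Assistants run step objects.

    Assumes each step is a plain dict shaped like an Assistants API v1 run
    step. Upstream OpenAI leaves the retrieval object empty, so nothing is
    collected there; a backend that fills the placeholder yields its contents.
    """
    payloads: list[dict[str, Any]] = []
    for step in run_steps:
        details = step.get("step_details", {})
        # Only tool-call steps can contain retrieval calls.
        if details.get("type") != "tool_calls":
            continue
        for call in details.get("tool_calls", []):
            # Skip other tool types and the empty {} placeholder.
            if call.get("type") == "retrieval" and call.get("retrieval"):
                payloads.append(call["retrieval"])
    return payloads


# Example input: one upstream-style step (empty placeholder) and one step with
# a filled payload. The "file_ids" key is hypothetical, not astra-assistants' schema.
steps = [
    {"step_details": {"type": "tool_calls",
                      "tool_calls": [{"type": "retrieval", "retrieval": {}}]}},
    {"step_details": {"type": "tool_calls",
                      "tool_calls": [{"type": "retrieval", "retrieval": {"file_ids": ["file-abc"]}}]}},
]
print(retrieval_payloads(steps))  # [{'file_ids': ['file-abc']}]
```

Against upstream OpenAI the result is always an empty list, which is exactly the gap described above; a backend that populates the field makes the same loop return usable hints.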

To test, just run:

    python -m swe_bench_util index astra-assistants

raymyers commented 5 months ago

Good stuff, merging. I fixed a small merge conflict caused by formatting; in the future, running ruff format should help avoid that. If there's a way to make it harder to forget to run it, that would be neat.
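One common way to make ruff format hard to forget is a Git pre-commit hook. Below is a minimal sketch in Python, assuming ruff is installed and on PATH; check_formatting and RUFF_CHECK are hypothetical names, not part of swe-bench-util. It relies on ruff format --check, which reports files that would be reformatted and exits non-zero without modifying them.

```python
import subprocess
import sys
from typing import Callable

# `ruff format --check` exits non-zero if any file would be reformatted,
# without changing anything on disk.
RUFF_CHECK = ["ruff", "format", "--check", "."]


def check_formatting(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Return 0 if ruff reports no formatting changes, non-zero otherwise.

    `run` is injectable so the logic can be exercised without ruff installed.
    """
    try:
        result = run(RUFF_CHECK)
    except FileNotFoundError:
        # ruff is not installed or not on PATH: block the commit and say why.
        print("ruff not found on PATH; install it (e.g. pip install ruff).", file=sys.stderr)
        return 1
    if result.returncode != 0:
        print("Run `ruff format .` and re-stage the changes before committing.", file=sys.stderr)
    return result.returncode
```

To use it as a hook, save it as .git/hooks/pre-commit with a #!/usr/bin/env python3 shebang, make it executable, and end the file with sys.exit(check_formatting()). Git hooks live outside version control, so the pre-commit framework's ruff-format hook may be easier to share across contributors.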