raymyers / swe-bench-util

Scripts for working with SWE-Bench, the AI coding agent benchmark
Apache License 2.0

Perf and usability improvements #9

Closed phact closed 5 months ago

phact commented 5 months ago

Parallelizes file uploads, handles rate limiting, and reuses the assistant if it already exists instead of creating it every time.
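To illustrate the three changes described above, here is a minimal sketch, not the code from this PR. `upload_file`, `list_assistants`, `create_assistant`, and `RateLimitError` are hypothetical stand-ins for whatever the real client exposes; the retry policy (exponential backoff) is also an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit (HTTP 429) error."""


def upload_with_retry(upload_file, path, max_retries=5, base_delay=1.0):
    """Call upload_file(path), backing off exponentially when rate limited."""
    for attempt in range(max_retries):
        try:
            return upload_file(path)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * 2 ** attempt)


def upload_all(upload_file, paths, max_workers=8, **retry_kwargs):
    """Upload every path on a thread pool; results keep the order of paths."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_with_retry, upload_file, p, **retry_kwargs)
            for p in paths
        ]
        return [f.result() for f in futures]


def get_or_create_assistant(list_assistants, create_assistant, name):
    """Reuse an assistant with a matching name, creating one only if absent."""
    for assistant in list_assistants():
        if assistant["name"] == name:
            return assistant
    return create_assistant(name)
```

Threads fit here because uploads are I/O bound; `max_workers` would need tuning against the service's actual rate limits.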

Also pushed some related improvements to astra-assistants-api so it can run ANN (approximate nearest neighbor) search against thousands of files in parallel:

https://github.com/datastax/astra-assistants-api/commit/547d63e34259683b988c9d44320b67a5070b21dc

Something I'm not clear on is when and how we want to trigger reindexing, given that rows are based on different commit hashes. My original plan was to just check out each commit into a /HASH/ directory, but that changed in some of the restructuring. I guess the smart thing to do would be to keep track of file hashes and only re-index files that changed.

Thoughts?
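The file-hash idea floated above could look roughly like the sketch below. It assumes a manifest (a plain dict of relative path to SHA-256 digest) is saved after each indexing run; the function names and the manifest format are illustrative, not part of swe-bench-util.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's bytes, as a hex string."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def diff_against_manifest(root, previous):
    """Compare the checkout at root with the previous manifest.

    previous maps relative path -> digest from the last indexing run.
    Returns (to_index, to_remove, current_manifest).
    """
    root = Path(root)
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and ".git" not in p.relative_to(root).parts
    }
    # New or modified files need (re-)indexing.
    to_index = sorted(rel for rel, d in current.items() if previous.get(rel) != d)
    # Files that disappeared should be dropped from the index.
    to_remove = sorted(rel for rel in previous if rel not in current)
    return to_index, to_remove, current
```

After checking out a different commit, only the paths in `to_index` would be re-uploaded, and `current_manifest` would be stored for the next run.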

raymyers commented 5 months ago

Just pasting my remark from chat:

On the reindex question, I’ve thought about this but never tried it: docs in an embedding store usually carry tags, so the tags can include the repo and git hash. At index time, you could mark each file with every hash that has an exercise and for which the file hasn’t changed from that version. Then you should be able to query at any hash and get the right version without redundant indexing.
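One way to sketch the tagging idea, untested against a real embedding store: git already gives each file version a blob ID, so each (path, blob) pair can be indexed once and tagged with every exercise commit in which it appears. `ls_tree`, `plan_documents`, and `lookup` are hypothetical names; only `git ls-tree -r -z` is an existing git command.

```python
import subprocess
from collections import defaultdict


def ls_tree(repo, commit):
    """Map path -> blob ID for every file in the given commit."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "-z", commit],
        check=True, capture_output=True, text=True,
    ).stdout
    tree = {}
    for entry in out.split("\0"):
        if not entry:
            continue
        meta, path = entry.split("\t", 1)  # "<mode> <type> <object>\t<path>"
        _mode, _type, blob = meta.split(" ")
        tree[path] = blob
    return tree


def plan_documents(trees):
    """trees maps commit -> {path: blob}.

    Returns {(path, blob): [commits]}: one document per distinct file
    version, tagged with every commit where that version is present.
    """
    tags = defaultdict(list)
    for commit, tree in trees.items():
        for path, blob in tree.items():
            tags[(path, blob)].append(commit)
    return dict(tags)


def lookup(docs, commit):
    """Simulate a tag-filtered query: the file versions visible at commit."""
    return {path: blob for (path, blob), commits in docs.items() if commit in commits}
```

In a real store, the `commits` list would become a tag or metadata field on the document, and `lookup` would be replaced by the store's own metadata filter.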

If multiple processes absolutely needed to concurrently navigate the same repo at different hashes, that could still avoid multiple clones using git worktrees or git show.
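For reference, both git options can be driven from Python roughly as follows. This is a sketch under the assumption that `git` is on the PATH; the helper names are made up for illustration, while `git show <commit>:<path>` and `git worktree add --detach` are existing git commands.

```python
import subprocess


def show_command(repo, commit, path):
    """Build the git command that prints one file as of a commit."""
    return ["git", "-C", repo, "show", f"{commit}:{path}"]


def read_file_at(repo, commit, path):
    """Return a file's bytes at a commit without touching the working tree."""
    return subprocess.run(
        show_command(repo, commit, path), check=True, capture_output=True
    ).stdout


def add_worktree(repo, commit, dest):
    """Check out commit into dest as an extra working tree of the same clone.

    dest should be an absolute path that does not exist yet.
    """
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "--detach", dest, commit],
        check=True, capture_output=True,
    )
```

`git show` suits reading a handful of files, while a worktree gives each process a full checkout that shares one object database.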