swe-bench / swe-bench.github.io

Landing page + leaderboard for SWE-Bench benchmark

Evaluation Dashboard #2

Closed john-b-yang closed 3 months ago

john-b-yang commented 3 months ago

This PR introduces the initial SWE-bench evaluation dashboard, which provides automated analyses of model performance on SWE-bench splits.

If you're interested in contributing new analytics, please check out the http://github.com/swe-bench/experiments repository, then modify this repository with the appropriate JS calls to propagate the results to the dashboard.

More specifically...

  1. Add your analysis script to the analysis/ folder. If you're not sure how to create an analysis, look for examples in the folder.
  2. When your script runs, have it generate a JSON file containing the computed values and save it to the results/ folder. See that folder for an example of some analytics values.
  3. In this repository, add the necessary JavaScript to the js/ folder to retrieve the JSON via a web call, then render it as HTML. Look in the js/ folder for examples.
  4. Finally, update the viewer.html file with an import of the js/<your new file>.js logic and add the necessary styling to show the data.
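
For steps 3 and 4, here is a rough sketch of what a new file in js/ might look like. It is not taken from the existing js/ examples: the function names, the flat `{ metric: value }` JSON shape, and the container id are all assumptions made for illustration, so adjust them to match whatever your analysis script actually writes to results/.

```javascript
// Hypothetical sketch: names, JSON shape, and element id are assumptions,
// not the conventions used by the existing files in js/.

// Escape a value so it can be placed safely inside HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn a flat { metricName: value } object into an HTML table string.
function renderResultsTable(results) {
  const rows = Object.entries(results)
    .map(
      ([name, value]) =>
        `<tr><td>${escapeHtml(name)}</td><td>${escapeHtml(value)}</td></tr>`
    )
    .join("");
  return `<table><tbody>${rows}</tbody></table>`;
}

// Fetch the analysis JSON over HTTP and inject the table into the page.
// jsonUrl should point at the JSON your script saved to results/.
async function loadAnalysis(jsonUrl, containerId) {
  const response = await fetch(jsonUrl);
  if (!response.ok) {
    throw new Error(`Failed to load ${jsonUrl}: HTTP ${response.status}`);
  }
  const results = await response.json();
  document.getElementById(containerId).innerHTML = renderResultsTable(results);
}
```

In viewer.html you would then import js/<your new file>.js with a script tag, add a container element with an id of your choosing, and call `loadAnalysis` with the URL of your JSON and that id. Keeping the rendering in a pure function separate from the fetch makes it easy to check the HTML output without a browser.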