This PR introduces the initial SWE-bench evaluation dashboard, which provides automated analyses of model performance on SWE-bench splits.
If you're interested in contributing new analytics, please check out the http://github.com/swe-bench/experiments repository, then modify this repository with the appropriate JS calls to propagate the results to the dashboard.
More specifically:

1. Add your analysis script to the `analysis/` folder. If you're not sure how to create an analysis, look at the existing examples in that folder.
2. Have your script generate a JSON file containing the computed values and save it to the `results/` folder. See that folder for examples of analytics values.
3. In this repository, add the necessary JavaScript to the `js/` folder to retrieve the JSON via a web request and display it as HTML. Look in the `js/` folder for examples.
4. Finally, update `viewer.html` to import your `js/<your new file>.js` script, and add the styling needed to display the data.
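Below is a minimal sketch of the JavaScript side of this workflow. It is not the dashboard's existing code. It assumes a hypothetical analysis that writes `results/resolve_rate.json` as a flat object mapping model names to numbers; the real files in `results/` may use a different shape. The file name `js/resolve_rate.js`, the `RESULTS_URL` path, the `resolve-rate` container id, and all helper names are illustrative. The script retrieves the JSON with the browser's `fetch` and renders it as an HTML table.

```javascript
// js/resolve_rate.js (hypothetical file name): a sketch, not the dashboard's actual code.
// ASSUMPTION: the analysis writes a flat JSON object mapping a model name to a
// numeric value, e.g. { "model-a": 12.5, "model-b": 30 }.
const RESULTS_URL = "results/resolve_rate.json"; // hypothetical location of the JSON

// Escape text before inserting it into HTML so model names cannot inject markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build an HTML table string from the results object, highest value first.
function renderResultsTable(results) {
  const rows = Object.entries(results)
    .sort(([, a], [, b]) => b - a)
    .map(
      ([model, value]) =>
        `<tr><td>${escapeHtml(model)}</td><td>${Number(value).toFixed(2)}</td></tr>`
    )
    .join("");
  return (
    '<table class="results-table">' +
    "<thead><tr><th>Model</th><th>Value</th></tr></thead>" +
    `<tbody>${rows}</tbody></table>`
  );
}

// Retrieve the JSON with a web request. fetchImpl defaults to the global fetch
// and can be replaced with a stub when testing without a network.
async function loadResults(url = RESULTS_URL, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to load ${url}: HTTP ${response.status}`);
  }
  return response.json();
}

// Browser only: fetch the results and render them into the element with this id.
async function showResults(containerId) {
  const container = document.getElementById(containerId);
  if (!container) return;
  try {
    container.innerHTML = renderResultsTable(await loadResults());
  } catch (err) {
    container.textContent = err.message; // surface load errors on the page
  }
}

// Run automatically when loaded by a page; skipped in non-browser environments.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => showResults("resolve-rate"));
}
```

To use this sketch, `viewer.html` would load the script with a `<script src="js/resolve_rate.js"></script>` tag and provide a container such as `<div id="resolve-rate"></div>`. The styling step can then target the `results-table` class. If the JSON is served from another repository rather than this one, `RESULTS_URL` would need to be a full URL. Keeping the rendering in a pure function that returns a string makes it testable outside the browser.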