Closed carlosejimenez closed 3 months ago
Attention: Patch coverage is 73.68421% with 5 lines in your changes missing coverage. Please review.
Project coverage is 54.62%. Comparing base (a8df201) to head (312b914). Report is 4 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
swebench/harness/utils.py | 66.66% | 5 Missing :warning: |
Right now, when the `run_report` is generated and saved at the end of an evaluation run, the `instance_ids` filter is ignored. This is confusing and causes incorrect values in report fields such as `error_ids`, among others.

This PR changes the `make_run_report` function to filter the full dataset and compare only against `instance_ids` instead of the complete dataset's unfiltered ids.

This should make it easier to see which ids actually had errors during a run, and easier to understand performance from the `run_report` when using the `instance_ids` filter.
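
The filtering idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual `make_run_report` implementation: the helper names `filter_considered_ids` and `summarize_run`, their parameters, and the report keys other than `error_ids` are hypothetical. It also assumes, as a simplification, that any considered id without a completed result counts as an error.

```python
from typing import Dict, Iterable, List, Optional, Set, Union


def filter_considered_ids(
    dataset_ids: Iterable[str],
    instance_ids: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Return the ids the report should be compared against.

    Hypothetical helper: with no filter, every dataset id is considered;
    with a filter, only ids present in both the dataset and the filter are.
    """
    all_ids = set(dataset_ids)
    if not instance_ids:
        return all_ids
    return all_ids & set(instance_ids)


def summarize_run(
    dataset_ids: Iterable[str],
    completed_ids: Iterable[str],
    instance_ids: Optional[Iterable[str]] = None,
) -> Dict[str, Union[int, List[str]]]:
    """Build a tiny report restricted to the considered ids.

    Simplifying assumption: a considered id that did not complete
    is reported under error_ids.
    """
    considered = filter_considered_ids(dataset_ids, instance_ids)
    completed = considered & set(completed_ids)
    errors = considered - completed
    return {
        "total_instances": len(considered),
        "completed_ids": sorted(completed),
        "error_ids": sorted(errors),
    }


# Dataset of four instances, but the run was restricted to two of them.
filtered = summarize_run(
    dataset_ids=["a", "b", "c", "d"],
    completed_ids=["a"],
    instance_ids=["a", "b"],
)

# Same run summarized without the filter, as the old behavior would do.
unfiltered = summarize_run(
    dataset_ids=["a", "b", "c", "d"],
    completed_ids=["a"],
)
```

In this sketch, the filtered summary lists only `b` under `error_ids`, while the unfiltered one also lists `c` and `d`, which were never requested. That is the kind of misleading entry the change is meant to avoid.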