Ah yes, no problem. I'm assuming you're referring to one of the bullet points on the SWE-bench Lite info page.
This refers to task instances whose fail-to-pass (F2P) tests verify that a particular error was thrown with a specific message. A pseudocode example:
def unit_test_1(...):
    [Some code here]
    assert ValueError was thrown with the specific message "You provided the wrong value"
Which implies that the code fix must have introduced some case where:
def tested_function():
    [Some code here]
    raise ValueError("You provided the wrong value")
We removed these because it is very difficult to reproduce the exact error message string. That said, a human task worker can certainly still infer the format of the message from the conventions of the codebase. These types of issues make up a small portion of the full SWE-bench test set. You can see the logic for identifying and filtering out such issues here.
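For readers who want a rough idea of what such a filter could look like, the following is only an illustrative heuristic and not the actual SWE-bench Lite filtering logic. It assumes each instance is a dict with a `test_patch` string holding the test diff. The pattern list and the function names `checks_error_message` and `filter_instances` are hypothetical.

```python
import re

# Hypothetical patterns for tests that pin down an error message.
# This is an assumption for illustration, not the real SWE-bench filter.
MESSAGE_CHECK_PATTERNS = [
    re.compile(r"pytest\.raises\([^)]*\bmatch\s*="),  # pytest.raises(..., match="...")
    re.compile(r"\bassertRaisesRegex\("),             # unittest
    re.compile(r"\bassertRaisesMessage\("),           # Django test cases
]


def checks_error_message(test_patch: str) -> bool:
    """Return True if the test diff appears to assert on an error message."""
    return any(p.search(test_patch) for p in MESSAGE_CHECK_PATTERNS)


def filter_instances(instances):
    """Keep only instances whose tests do not check error message text."""
    return [i for i in instances if not checks_error_message(i["test_patch"])]


# Made-up example instances (not real SWE-bench data).
sample = [
    {"instance_id": "a", "test_patch": '+ with pytest.raises(ValueError, match="wrong value"):'},
    {"instance_id": "b", "test_patch": "+ with pytest.raises(ValueError):"},
    {"instance_id": "c", "test_patch": "+ assert result == 3"},
]
kept = [i["instance_id"] for i in filter_instances(sample)]
print(kept)  # ['b', 'c']
```

A text-pattern heuristic like this would miss message checks written in other ways, such as comparing `str(exc)` by hand, so the linked logic remains the authoritative reference.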
Hope this helps! Thanks for the great question.
Marking this as complete, but please feel free to continue the thread with any follow up questions.
Describe the issue
In the SWE-bench Lite dataset documentation, it is mentioned:
I do not understand what this statement refers to.
Every instance in the SWE-bench dataset consists of an issue and a test for validation.
Could you please clarify what is meant by removing instances that contain tests with error message checks?
Thank you for your assistance.