As has been brought up before (#1384, #1292, https://github.com/openai/evals/pull/270), evals suffer from a hanging issue: an evaluation run can hang for a very long time (if not indefinitely) near the end of a run (say, on the 99th sample out of 100).
This PR addresses the issue by replacing a seemingly redundant single-worker thread that was created for each request, nested inside the already multi-threaded eval loop. My impression is that this nested multithreading was causing overhead that resulted in the hangs.
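To make the pattern concrete, here is a minimal sketch of the two shapes described above. It is not the actual evals code: `request_with_nested_thread`, `request_direct`, `fake_request`, and `run_eval` are hypothetical names, and passing `timeout` straight to the callable is only an assumption about what a replacement might look like.

```python
import concurrent.futures


def fake_request(x, timeout=None):
    # Hypothetical stand-in for an API call; a real client would honor `timeout`.
    return x * 2


def request_with_nested_thread(func, *args, timeout=1.0, **kwargs):
    # "Before" shape (assumed): a one-worker pool is created for every request,
    # even though the caller is already running inside a worker thread.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(func, *args, **kwargs)
        return future.result(timeout=timeout)


def request_direct(func, *args, timeout=1.0, **kwargs):
    # "After" shape (assumed): no extra thread; the timeout is handed to the
    # callable itself, which is expected to enforce it.
    return func(*args, timeout=timeout, **kwargs)


def run_eval(request_fn, n_samples=8, n_threads=4):
    # Outer multi-threaded loop, standing in for the eval loop over samples.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda i: request_fn(fake_request, i), range(n_samples)))


if __name__ == "__main__":
    print(run_eval(request_with_nested_thread))
    print(run_eval(request_direct))
```

One detail that may be relevant here: leaving a `with ThreadPoolExecutor(...)` block calls `shutdown(wait=True)`, so the block does not exit until the submitted call has finished, even if `future.result(timeout=...)` has already raised a timeout.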
I had also noticed this hanging issue in EVALS_SEQUENTIAL=1 mode (where it no longer occurs at the end, but instead randomly in the middle of the run).
I was able to identify the source of this issue through debugging print statements that ultimately pointed to the request_with_timeout function as the culprit.
We have tested the new request_with_timeout code on a fork where we have run multiple new and pre-existing evals, including with 3rd party solvers, and found no change in behaviour or errors, and a clear improvement on the hanging issue.