Open crhf opened 5 days ago
Currently, the timeout when evaluating on an instance only seems to affect running eval.sh in the container. I wonder if there could be problems with some of the cleanup steps that hang instead...
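One way to test this hypothesis would be to give each cleanup step its own time bound, so a blocked call gets reported instead of stalling the whole run. The following is only a minimal sketch, not the swebench harness code: `run_with_timeout` and `hanging_cleanup` are hypothetical names, and the real cleanup calls would be substituted for the stand-in.

```python
import threading
import time


def run_with_timeout(func, timeout_s):
    """Run func() in a daemon thread and wait at most timeout_s seconds.

    Returns "ok", "error", or "timeout". On "timeout" the thread is
    abandoned, not killed, so the blocked call may still be running.
    """
    outcome = {"status": "ok"}

    def target():
        try:
            func()
        except Exception as exc:
            # Record the failure so the caller can tell it apart from a hang.
            outcome["status"] = "error"
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return "timeout"
    return outcome["status"]


def hanging_cleanup():
    # Hypothetical stand-in for a cleanup call that blocks for too long.
    time.sleep(1.0)


def failing_cleanup():
    # Hypothetical stand-in for a cleanup call that raises.
    raise RuntimeError("cleanup failed")


print(run_with_timeout(lambda: None, 1.0))
print(run_with_timeout(hanging_cleanup, 0.05))
print(run_with_timeout(failing_cleanup, 1.0))
```

Logging the instance ID whenever the result is "timeout" would show whether the hang is in cleanup rather than in eval.sh. A daemon thread is used here because Python cannot forcibly stop a thread; the sketch only stops waiting for it.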
Does this work at your end? I still get the same problem
Hmm that's a bit weird, we haven't seen this before.
There are some task instances which we know take a very long time to run (e.g. some of the scikit-learn ones). However, the longest we've seen that take is 3 hours. Also, none of the very long running instances are in the lite split, which looks like what you're evaluating on.
It's a bit difficult to diagnose without knowing which task instances the evaluation is stuck on. Would you happen to have this info? Or perhaps given that 270/296 finished running, what are the 296 - 270 = 26 issues that haven't finished?
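To list the instances that have not finished, the IDs in the predictions can be compared against the IDs that already produced a result. This is a rough sketch under assumptions: each prediction is taken to be a dict with an `instance_id` key, and the set of finished IDs is assumed to have been collected separately (for example from the per-instance output of the run). The function name and the sample IDs are made up for illustration.

```python
def unfinished_instances(predictions, finished_ids):
    """Return the sorted instance IDs that were predicted but never finished."""
    # Assumption: every prediction is a dict with an "instance_id" key.
    predicted_ids = {p["instance_id"] for p in predictions}
    return sorted(predicted_ids - set(finished_ids))


# Made-up sample data, not taken from the run in this issue.
predictions = [
    {"instance_id": "example__repo-1"},
    {"instance_id": "example__repo-2"},
    {"instance_id": "example__repo-3"},
]
finished = ["example__repo-1", "example__repo-3"]

print(unfinished_instances(predictions, finished))
```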
Describe the bug
Thanks for making the containerized evaluation environment, which will make the evaluation easier and more accurate! However, while I was trying it out, the containerized evaluation always got stuck in the middle. Did I miss something?
Steps/Code to Reproduce
Expected Results
All the predictions get evaluated.
Actual Results
Evaluation got stuck in the middle:
The progress bar hung here for tens of hours. Pressing Ctrl+C gave the following:
I tried three times, and the evaluations all hung at different points.
System Information
Ubuntu 20.04.6 LTS, Python 3.9, swebench 68d8059, more than 100 CPUs