princeton-nlp / SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License

`exec_run_with_timeout` does not actually kill long-running thread #160

Open klieret opened 5 days ago

klieret commented 5 days ago

If we look at this:

def exec_run_with_timeout(container, cmd, timeout=60):
    """
    Run a command in a container with a timeout.
    """
    # Local variables to store the result of executing the command
    exec_result = None
    exec_id = None
    exception = None

    # Wrapper function to run the command
    def run_command():
        nonlocal exec_result, exec_id, exception
        try:
            exec_id = container.client.api.exec_create(container.id, cmd)["Id"]
            exec_result = container.client.api.exec_start(exec_id)
        except Exception as e:
            exception = e

    # Start the command in a separate thread
    thread = threading.Thread(target=run_command)
    thread.start()
    thread.join(timeout)

    if exception:
        raise exception

    # If the thread is still alive, the command timed out
    if thread.is_alive():
        raise TimeoutError(f"Command '{cmd}' timed out after {timeout} seconds")

    return exec_result

this does not actually kill the thread: `thread.join(timeout)` just stops waiting once the timeout expires, and the thread keeps running.

A much simpler example:

import threading
import time

def run():
    while True:
        time.sleep(1)
        print('still alive')

thread = threading.Thread(target=run)
thread.start()
thread.join(5)

if thread.is_alive():
    raise TimeoutError()

gives the output:

still alive
still alive
still alive
still alive
Traceback (most recent call last):
  File "/Users/fuchur/tmp/running_thread.py", line 16, in <module>
    raise TimeoutError()
TimeoutError
still alive
still alive
still alive
still alive
still alive
still alive
still alive
still alive
still alive
still alive
still alive
still alive

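(though in `exec_run_with_timeout` we do still return from the function, because the exception is raised in the calling thread, not in the worker thread)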

Since Python has no supported way to kill a thread, it might be best to use multiprocessing here...
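
To make the multiprocessing idea concrete, here is a minimal sketch of a timeout helper that runs a callable in a child process and terminates it on timeout. The names `run_with_timeout`, `_worker` and `spin` are made up for illustration and are not part of SWE-bench. The sketch assumes the "fork" start method (POSIX only); with "spawn", the target has to be importable and every call site needs an `if __name__ == "__main__":` guard.

```python
import multiprocessing as mp
import time


def _worker(conn, func, args):
    """Child side: run func(*args) and send ("ok", value) or ("err", exc)."""
    try:
        conn.send(("ok", func(*args)))
    except Exception as e:
        conn.send(("err", e))
    finally:
        conn.close()


def run_with_timeout(func, args=(), timeout=60):
    """Hypothetical helper: run func(*args) in a killable child process."""
    ctx = mp.get_context("fork")  # assumption: POSIX; see the note above
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent only needs the receiving end
    try:
        # poll() waits up to `timeout` seconds for the child's message
        if not parent_conn.poll(timeout):
            raise TimeoutError(f"call timed out after {timeout} seconds")
        status, payload = parent_conn.recv()
    finally:
        # Unlike a thread, a process can be terminated from the outside
        if proc.is_alive():
            proc.terminate()
        proc.join()
        parent_conn.close()
    if status == "err":
        raise payload
    return payload


def spin():
    """Never returns, like run() in the threading example."""
    while True:
        time.sleep(0.1)


if __name__ == "__main__":
    try:
        run_with_timeout(spin, timeout=1)
    except TimeoutError as e:
        print(e)
    # No child process is left running at this point
    print(mp.active_children())
```

Return values and exceptions have to be picklable to cross the pipe. Whether terminating the local process also stops the command inside the container is a separate question that this sketch does not address.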

Of course, because of the TimeoutError, the finally clause in run_instance gets triggered, which calls cleanup_container, which would hopefully deal with this...
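
A toy model of that cleanup path, with no Docker involved: `FakeContainer` and this `run_instance` are hypothetical stand-ins, and the model simply assumes that removing the container makes the blocked call return, which is exactly the "hopefully" part.

```python
import threading


class FakeContainer:
    """Hypothetical stand-in for a container; not the Docker API."""

    def __init__(self):
        self._removed = threading.Event()

    def exec_blocking(self):
        # Models a command that never finishes on its own:
        # it blocks until the container is removed.
        self._removed.wait()

    def remove(self):
        self._removed.set()


def run_instance(timeout=0.2):
    """Return (timed_out, worker_still_alive_after_cleanup)."""
    container = FakeContainer()
    worker = threading.Thread(target=container.exec_blocking)
    timed_out = False
    try:
        worker.start()
        worker.join(timeout)
        if worker.is_alive():
            raise TimeoutError(f"timed out after {timeout} seconds")
    except TimeoutError:
        timed_out = True
    finally:
        # The cleanup runs on the timeout path too
        container.remove()
    worker.join(1)  # give the unblocked worker a moment to finish
    return timed_out, worker.is_alive()


if __name__ == "__main__":
    print(run_instance())
```

If the real blocked call does not return when the container is removed, the thread leaks just as in the simpler example.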

john-b-yang commented 2 days ago

Gotcha, hmm, OK, I think I kind of understand. Are you suggesting replacing the use of the threading library with multiprocessing, so that we spawn new, killable processes instead of threads?