ssube / onnx-web

web UI for GPU-accelerated ONNX pipelines like Stable Diffusion, even on Windows and AMD
https://discord.gg/7CdQmutGuw
MIT License

can't start new thread error after running for a while #170

Closed: ssube closed this issue 1 year ago

ssube commented 1 year ago

Depending on how many requests are made, and possibly how many are cancelled, there is a thread leak that eventually exhausts the worker pool and prevents Flask from responding to new requests:

----------------------------------------
Exception occurred during processing of request from ('10.2.2.77', 63823)
Traceback (most recent call last):
  File "/usr/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.10/socketserver.py", line 697, in process_request
    t.start()
  File "/usr/lib/python3.10/threading.py", line 935, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
----------------------------------------
----------------------------------------
Exception occurred during processing of request from ('10.2.2.77', 63824)
Traceback (most recent call last):
  File "/usr/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.10/socketserver.py", line 697, in process_request
    t.start()
  File "/usr/lib/python3.10/threading.py", line 935, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
----------------------------------------
ssube commented 1 year ago

This looks like it may be a memory leak rather than a thread leak:

 error: Traceback (most recent call last):
  File "/home/ssube/onnx-web/api/onnx_web/server/device_pool.py", line 243, in job_done
    f.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ssube/onnx-web/api/onnx_web/diffusion/run.py", line 32, in run_txt2img_pipeline
    pipe = load_pipeline(
  File "/home/ssube/onnx-web/api/onnx_web/diffusion/load.py", line 185, in load_pipeline
    pipe = pipeline.from_pretrained(
  File "/home/ssube/onnx-web/api/onnx_env/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 739, in from_pretrained
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/home/ssube/onnx-web/api/onnx_env/lib/python3.10/site-packages/diffusers/pipelines/onnx_utils.py", line 206, in from_pretrained
    return cls._from_pretrained(
  File "/home/ssube/onnx-web/api/onnx_env/lib/python3.10/site-packages/diffusers/pipelines/onnx_utils.py", line 173, in _from_pretrained
    model = OnnxRuntimeModel.load_model(
  File "/home/ssube/onnx-web/api/onnx_env/lib/python3.10/site-packages/diffusers/pipelines/onnx_utils.py", line 78, in load_model
    return ort.InferenceSession(path, providers=[provider], sess_options=sess_options)
  File "/home/ssube/onnx-web/api/onnx_env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/ssube/onnx-web/api/onnx_env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 395, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:342 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool) Failed to allocate memory for requested buffer of size 58982400
[2023-02-18 20:48:18,408] DEBUG: onnx_web.utils: running garbage collection with 3 active threads
[2023-02-18 20:48:18,455] DEBUG: onnx_web.utils: running Torch garbage collection for device: cuda - CUDAExecutionProvider ({'device_id': 0})
[2023-02-18 20:48:18,455] DEBUG: onnx_web.utils: remaining CUDA VRAM usage: 1413611520 of 25632964608
----------------------------------------
Exception occurred during processing of request from ('10.2.2.16', 36308)
Traceback (most recent call last):
  File "/usr/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.10/socketserver.py", line 697, in process_request
    t.start()
  File "/usr/lib/python3.10/threading.py", line 935, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
----------------------------------------

(screenshot: process/thread list showing virtual memory usage)

Per the logs, there are only 3 active threads and VRAM usage is reasonable, but virtual memory is nearly maxed out across all of the threads.
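
For reference, a small helper along these lines could log the same numbers next to the existing garbage collection messages. This is only a diagnostic sketch, not part of onnx-web, and it assumes psutil is installed (RLIMIT_NPROC is Linux-only):

# Diagnostic sketch (not onnx-web code): log Python/OS thread counts,
# resident and virtual memory, and the process/thread limit, so a leak
# like this shows up alongside the garbage collection log lines.
import resource
import threading

import psutil


def log_process_stats(logger):
    proc = psutil.Process()
    mem = proc.memory_info()
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    logger.debug(
        "threads: python=%d os=%d, rss=%d MiB, vms=%d MiB, nproc limit=%s/%s",
        threading.active_count(),
        proc.num_threads(),
        mem.rss // (1024 * 1024),
        mem.vms // (1024 * 1024),
        soft,
        hard,
    )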

This happened after 86 requests, using this repro script:

# submit one txt2img job every 35 seconds until the server stops responding
test_images=0
while true; do
  curl 'http://10.2.2.79:5000/api/txt2img?cfg=16.00&steps=35&scheduler=deis-multi&seed=-1&prompt=an+astronaut+eating+a+hamburger&negativePrompt=&model=diffusion-snow-globe-v1&platform=any&upscaling=upscaling-real-esrgan-x2-plus&correction=correction-codeformer&lpw=false&width=512&height=512&upscaleOrder=correction-both' \
    -X 'POST' \
    --compressed \
    --insecure || break
  ((test_images++))
  echo "waiting after $test_images"
  sleep 35
done
ssube commented 1 year ago

Tested again with a fresh server; the error happened after 94 images:

[2023-02-19 00:56:16,036] DEBUG: onnx_web.utils: running garbage collection with 3 active threads
[2023-02-19 00:56:16,084] DEBUG: onnx_web.utils: running Torch garbage collection for device: cuda - CUDAExecutionProvider ({'device_id': 0})
[2023-02-19 00:56:16,085] DEBUG: onnx_web.utils: remaining CUDA VRAM usage: 2246180864 of 25632964608
----------------------------------------
Exception occurred during processing of request from ('10.2.2.16', 37058)
Traceback (most recent call last):
  File "/usr/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.10/socketserver.py", line 697, in process_request
    t.start()
  File "/usr/lib/python3.10/threading.py", line 935, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
----------------------------------------

Virtual memory was very high, as before:

(screenshot: process/thread list showing high virtual memory usage, as before)

ssube commented 1 year ago

https://stackoverflow.com/a/66130494 suggests this could be related to, and possibly resolved by, #159

ssube commented 1 year ago

Running under waitress, the server lasted for 93 images and gave a more useful error:

Traceback (most recent call last):
  File "/home/ssube/onnx-web/api/onnx_env/lib/python3.10/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ssube/onnx-web/api/onnx_env/lib/python3.10/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ssube/onnx-web/api/onnx_env/lib/python3.10/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/ssube/onnx-web/api/onnx_env/lib/python3.10/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ssube/onnx-web/api/onnx_env/lib/python3.10/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/ssube/onnx-web/api/onnx_web/serve.py", line 558, in txt2img
    executor.submit(
  File "/home/ssube/onnx-web/api/onnx_web/server/device_pool.py", line 236, in submit
    context = JobContext(key, self.devices, device_index=device)
  File "/home/ssube/onnx-web/api/onnx_web/server/device_pool.py", line 33, in __init__
    self.cancel = Value("B", cancel)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 135, in Value
    return Value(typecode_or_type, *args, lock=lock,
  File "/usr/lib/python3.10/multiprocessing/sharedctypes.py", line 79, in Value
    lock = ctx.RLock()
  File "/usr/lib/python3.10/multiprocessing/context.py", line 73, in RLock
    return RLock(ctx=self.get_context())
  File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 187, in __init__
    SemLock.__init__(self, RECURSIVE_MUTEX, 1, 1, ctx=ctx)
  File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
OSError: [Errno 12] Cannot allocate memory

with this output from the repro script:

waiting after 93                        
<!doctype html>               
<html lang=en>                                                                                                                                                                      
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>                   
waiting after 94    
ssube commented 1 year ago

This is not related to the requests or the server itself. If many jobs are queued up, the memory leak keeps growing as they run, even after the requests stop.

https://github.com/microsoft/onnxruntime/issues/14641
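
If the leak really is in session creation, as the linked onnxruntime issue suggests, it should be reproducible entirely outside of Flask. A minimal standalone check could look like the following sketch; it assumes psutil is installed, and MODEL_PATH is a hypothetical path to any local ONNX model:

# Standalone leak check (sketch, not onnx-web code): repeatedly create and
# drop InferenceSession objects and watch virtual memory grow.
import psutil
import onnxruntime as ort

MODEL_PATH = "model.onnx"  # hypothetical path to any local ONNX model

proc = psutil.Process()
for i in range(50):
    sess = ort.InferenceSession(MODEL_PATH, providers=["CPUExecutionProvider"])
    del sess
    print(f"iteration {i}: vms={proc.memory_info().vms // (1024 * 1024)} MiB")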

ssube commented 1 year ago

This should be fixed by #205, which switches to a process worker pool. I ran the memory leak test script for 600 images with no problems.
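
In broad strokes, the idea is to run each job in a child process that can be recycled, so any native memory leaked by the runtime dies with the worker. A minimal sketch of that pattern using the standard library pool (illustrative only, not the actual #205 code):

# Process worker pool sketch: maxtasksperchild recycles each worker after a
# fixed number of jobs, so leaked native memory is returned to the OS when
# the child process exits.
from multiprocessing import get_context


def run_job(prompt: str) -> str:
    # placeholder for loading the ONNX pipeline and rendering in the child
    return f"rendered: {prompt}"


if __name__ == "__main__":
    ctx = get_context("spawn")
    jobs = [f"an astronaut eating a hamburger #{i}" for i in range(20)]
    # one worker per GPU, recycled after 10 jobs to cap any leak
    with ctx.Pool(processes=1, maxtasksperchild=10) as pool:
        for result in pool.imap(run_job, jobs):
            print(result)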

ssube commented 1 year ago

I've tested the solution further, up through 1051 images, made a few improvements, and found a few issues.

ssube commented 1 year ago

With the v0.8.1 fixes, I've tested this up through 4242 images in a single series.

There is still a bug when several images are requested in quick succession. If a worker is recycled in between, it may start a second job on the same device (given enough memory), and one of the two jobs is likely to fail in some way.
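
One possible mitigation, purely hypothetical and not what onnx-web does, would be to track busy devices under a lock so a freshly recycled worker cannot claim a device that is still running a job:

# Hypothetical device guard sketch: mark a device busy before starting a job
# and release it when the job finishes, all under a single lock.
from threading import Lock


class DeviceGuard:
    def __init__(self, device_names):
        self._lock = Lock()
        self._busy = {name: False for name in device_names}

    def acquire(self, name: str) -> bool:
        # returns True only if the device was free and is now marked busy
        with self._lock:
            if self._busy[name]:
                return False
            self._busy[name] = True
            return True

    def release(self, name: str) -> None:
        with self._lock:
            self._busy[name] = False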