sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/
Apache License 2.0

Add Default Timeout to urllib.request.urlopen Calls to Prevent Potential Hanging #339

Open alessiodallapiazza opened 5 months ago

alessiodallapiazza commented 5 months ago

The current HTTP request code calls urllib.request.urlopen without a timeout argument. Unless a global default socket timeout has been set, such a call blocks with no time limit, so the application can hang indefinitely if the server does not respond or the network is having problems.

Code Snippet:

    # add the API Key header if an API key is provided
    if api_key is not None:
        headers["X-API-Key"] = api_key

    if stream:
        return requests.post(url, json=json, stream=True, headers=headers)
    else:
        req = urllib.request.Request(url, headers=headers)
        if json is None:
            data = None
        else:
            data = bytes(dumps(json), encoding="utf-8")
        resp = urllib.request.urlopen(req, data=data, cafile=verify)
        return HttpResponse(resp)

To mitigate this risk, I propose adding an optional timeout argument to the function(s) that wrap urllib.request.urlopen calls. This argument would allow developers to specify a custom timeout, with a sensible default set to ensure that no call hangs indefinitely.
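A minimal sketch of what such a wrapper could look like; this is not the actual sglang patch. The names `http_post_json` and `DEFAULT_TIMEOUT_S` and the 60-second default are assumptions chosen for illustration, and the streaming branch and the `cafile` argument from the snippet above are left out for brevity. Note that the `timeout` passed to urlopen bounds each blocking socket operation (connect, each read), not the total duration of the request.

```python
import json
import urllib.request

# Hypothetical default; a real value would depend on expected generation latency.
DEFAULT_TIMEOUT_S = 60.0


def http_post_json(url, payload=None, headers=None, timeout=DEFAULT_TIMEOUT_S):
    """POST `payload` as JSON and return (status, body_bytes).

    `timeout` is in seconds and is forwarded to urlopen, so a silent server
    raises an exception instead of blocking the caller forever.
    """
    req = urllib.request.Request(url, headers=headers or {})
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    with urllib.request.urlopen(req, data=data, timeout=timeout) as resp:
        return resp.status, resp.read()
```

With a timeout set, an unresponsive server surfaces as socket.timeout (an alias of TimeoutError since Python 3.10) or urllib.error.URLError, both subclasses of OSError, which the caller can catch and handle.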

hnyls2002 commented 5 months ago

@alessiodallapiazza You are welcome to submit a PR that adds this feature.

Gintasz commented 4 months ago

I think this is a real problem. @hnyls2002, have you tried testing generation with a batch size of 100 or 1000 and multi-step structured generation against a remote endpoint? I am connecting to a remote LLM endpoint with batch size 57 and num_threads=10, and I get a Connection reset by peer error:

Exception in thread Thread-360 (_thread_worker_func):
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 303, in _thread_worker_func
    self._execute(expr)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 341, in _execute
    self._execute_commit_lazy_operations(other)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 530, in _execute_commit_lazy_operations
    self.backend.commit_lazy_operations(self)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/backend/runtime_endpoint.py", line 76, in commit_lazy_operations
    res = http_request(
          ^^^^^^^^^^^^^
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/utils.py", line 113, in http_request
    resp = urllib.request.urlopen(req, data=data, cafile=verify)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 215, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 515, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 532, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 492, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 1373, in http_open
    return self.do_open(http.client.HTTPConnection, req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 1348, in do_open
    r = h.getresponse()
        ^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 1423, in getresponse
    response.begin()
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 292, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socket.py", line 707, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 54] Connection reset by peer

This makes run_batch hang so that it never finishes (I have progress_bar=True and see it stuck at 56/57). I have not looked at the code yet, but I suspect retry logic is also missing, and it is needed.
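To make the retry idea concrete, here is a minimal sketch of retrying a call with exponential backoff. The helper name `call_with_retries` and its parameters are hypothetical and not part of sglang. It also assumes that resending the same generation request is safe, which may not hold for every endpoint.

```python
import time


def call_with_retries(fn, max_attempts=3, base_delay=0.5,
                      retry_on=(ConnectionError, TimeoutError)):
    """Call fn() and retry on transient network errors.

    ConnectionResetError is a subclass of ConnectionError, so the
    "Connection reset by peer" failure above falls under the defaults.
    The delay doubles after each failed attempt; the last error is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

A caller would wrap the request itself, for example `call_with_retries(lambda: send_request(payload))`, where `send_request` stands in for whatever function performs the HTTP call.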

Maybe run_batch or an sglang backend instance could keep a single socket connection to the remote endpoint?
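One possible reading of this suggestion, sketched with the standard library only. Because a single http.client.HTTPConnection handles one request at a time, the sketch keeps one persistent connection per worker thread rather than one socket for the whole batch. The names `get_connection` and `post_json` and the 60-second timeout are assumptions for illustration, not sglang API.

```python
import http.client
import json
import threading

_thread_state = threading.local()  # each thread gets its own connection cache


def get_connection(host, port, timeout=60.0):
    """Return this thread's persistent connection to (host, port)."""
    cache = getattr(_thread_state, "connections", None)
    if cache is None:
        cache = _thread_state.connections = {}
    key = (host, port)
    if key not in cache:
        cache[key] = http.client.HTTPConnection(host, port, timeout=timeout)
    return cache[key]


def post_json(host, port, path, payload, timeout=60.0):
    """POST JSON over the thread's reusable connection; return (status, body)."""
    conn = get_connection(host, port, timeout)
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    try:
        conn.request("POST", path, body=body, headers=headers)
        resp = conn.getresponse()
        # Read the whole body so the socket can serve the next request.
        return resp.status, resp.read()
    except (http.client.HTTPException, OSError):
        # Drop the broken socket; the next request() call reconnects.
        conn.close()
        raise
```

Reusing connections avoids opening a new socket for every call, but it does not remove the need for timeouts and retries: a reset on a kept-alive connection still raises and has to be handled by the caller.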

m0g1cian commented 2 months ago

I think what I am facing is similar to what @Gintasz describes. I am currently running SGL on multiple machines to run inference on ~1 million prompts in a data-parallel manner. However, I've noticed that some SGL backends easily hang indefinitely. I was confused and thought there was a deadlock until I saw this post.

Gintasz commented 2 months ago

@m0g1cian I solved this with the retry logic in https://github.com/sgl-project/sglang/pull/424

alanxmay commented 1 month ago

Same problem with sglang 0.2.13