Open alessiodallapiazza opened 5 months ago
@alessiodallapiazza You are welcome to submit a PR to add this feature.
I think this is a real problem. @hnyls2002 have you tried testing generation with a batch size of 100 or 1000 and multi-step structured generation against a remote endpoint? I have a connection to a remote LLM endpoint, batch size 57, num_threads=10, and I get a "Connection reset by peer" error:
Exception in thread Thread-360 (_thread_worker_func):
Traceback (most recent call last):
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
self.run()
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1010, in run
self._target(*self._args, **self._kwargs)
File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 303, in _thread_worker_func
self._execute(expr)
File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 341, in _execute
self._execute_commit_lazy_operations(other)
File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 530, in _execute_commit_lazy_operations
self.backend.commit_lazy_operations(self)
File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/backend/runtime_endpoint.py", line 76, in commit_lazy_operations
res = http_request(
^^^^^^^^^^^^^
File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/utils.py", line 113, in http_request
resp = urllib.request.urlopen(req, data=data, cafile=verify)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 215, in urlopen
return opener.open(url, data, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 515, in open
response = self._open(req, data)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 532, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 492, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 1373, in http_open
return self.do_open(http.client.HTTPConnection, req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 1348, in do_open
r = h.getresponse()
^^^^^^^^^^^^^^^
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 1423, in getresponse
response.begin()
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 331, in begin
version, status, reason = self._read_status()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 292, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socket.py", line 707, in readinto
return self._sock.recv_into(b)
^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 54] Connection reset by peer
This makes run_batch hang and it never finishes (I have progress_bar=True and it stays stuck at 56/57). I have not looked at the code yet, but I suspect retry logic is also missing, and it is needed.
Maybe run_batch or the sglang backend instance could keep a single socket connection to the remote endpoint?
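To make the retry suggestion concrete, here is a minimal sketch of retrying a call with exponential backoff when the connection is reset. It is not sglang code: call_with_retry and its parameters (retries, base_delay, retry_on, sleep) are hypothetical names chosen for illustration, and the choice of which exceptions are safe to retry is an assumption.

```python
import time


def call_with_retry(fn, retries=3, base_delay=0.5,
                    retry_on=(ConnectionError, TimeoutError),
                    sleep=time.sleep):
    """Call fn(); on a retryable error, back off and try again.

    Hypothetical helper, not part of sglang. ConnectionResetError is a
    subclass of ConnectionError, so "Connection reset by peer" is covered.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= retries:
                # Out of retries: re-raise so the caller sees the failure
                # instead of waiting on a result that never arrives.
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

A caller would wrap the request that fails in the traceback above, for example call_with_retry(lambda: send_request(payload)), where send_request stands in for whatever function performs the HTTP call. Re-raising after the last attempt matters as much as the retry itself, because a silently dead worker is one plausible way for a batch to stay at 56/57.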
I guess what I was facing is similar to yours. I am currently running SGL on multiple machines to run inference on ~1 million prompts in a data-parallel manner. However, I've noticed that some SGL backends easily hang indefinitely. I was confused and thought there was a deadlock issue until I saw this post.
@m0g1cian I solved this with the retry logic in https://github.com/sgl-project/sglang/pull/424
Same problem with sglang 0.2.13
The current implementation of HTTP requests in the code calls urllib.request.urlopen without specifying a default timeout. As a result, the application can hang if the server does not respond or if the network is experiencing issues.
To mitigate this risk, I propose adding an optional timeout argument to the function(s) that wrap urllib.request.urlopen calls. This argument would allow developers to specify a custom timeout, with a sensible default set to ensure that no call hangs indefinitely.
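The proposal could look roughly like the sketch below. It is an illustration under assumptions, not the actual sglang helper: the function name http_request_with_timeout, the json_body parameter, and the 60-second DEFAULT_TIMEOUT are all hypothetical. The only library behavior relied on is that urllib.request.urlopen accepts a timeout argument in seconds.

```python
import json
import urllib.request

# Hypothetical default; a real value should be tuned to how long a
# generation request can legitimately take.
DEFAULT_TIMEOUT = 60.0


def http_request_with_timeout(url, json_body=None, timeout=DEFAULT_TIMEOUT):
    """Send a GET (no body) or a JSON POST with an explicit timeout.

    Hypothetical wrapper for illustration. If the server does not answer
    within `timeout` seconds, urlopen raises an exception instead of
    blocking forever.
    """
    if json_body is None:
        req = urllib.request.Request(url)
        data = None
    else:
        req = urllib.request.Request(
            url, headers={"Content-Type": "application/json"}
        )
        data = json.dumps(json_body).encode("utf-8")
    # Passing timeout explicitly is the whole point: without it, urlopen
    # falls back to the global socket default, which is no timeout.
    return urllib.request.urlopen(req, data=data, timeout=timeout)
```

Callers that need longer generations can pass a larger timeout per call, while everything else gets a bounded wait by default. A timeout surfaces as an exception, so it pairs naturally with retry handling in the caller.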