jiezhangGt opened this issue 5 months ago
Try setting the multiprocessing start method to 'spawn': https://stackoverflow.com/questions/61939952/mp-set-start-methodspawn-triggered-an-error-saying-the-context-is-already-be
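The linked answer boils down to forcing the 'spawn' start method before any worker process is created. A minimal, CPU-only sketch of the pattern (the `worker` function here is a hypothetical stand-in for sglang's model process, not its actual code):

```python
import multiprocessing as mp

def worker(rank):
    # In the real server this is roughly where torch.cuda.set_device(rank)
    # runs; under the 'fork' start method that call raises
    # "Cannot re-initialize CUDA in forked subprocess".
    return rank * 2

if __name__ == "__main__":
    # force=True overrides a start method chosen earlier in the process;
    # without it, a second set_start_method() raises RuntimeError.
    mp.set_start_method("spawn", force=True)
    with mp.Pool(2) as pool:
        print(sorted(pool.map(worker, range(2))))  # [0, 2]
```

With 'spawn', each worker is a fresh interpreter, so CUDA initializes cleanly inside it instead of inheriting a half-initialized context from the parent.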
I hit the same problem when running llava-v1.6-vicuna-13b with sglang:
CUDA_VISIBLE_DEVICES=1,2 python -m sglang.launch_server --model-path llava-v1.6-vicuna-13b --tokenizer-path llava-v1.6-vicuna-13b-hf/ --port 30000 --tp 2
Traceback (most recent call last):
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/protocol.py", line 369, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/protocol.py", line 863, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 70, in exposed_init_model
    self.model_runner = ModelRunner(
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_runner.py", line 271, in __init__
    torch.cuda.set_device(self.tp_rank)
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/torch/cuda/__init__.py", line 404, in set_device
    torch._C._cuda_setDevice(device)
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/torch/cuda/__init__.py", line 284, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
I have the same error:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Is there any workaround to solve this error?
I modified sglang's code, and it worked for me. Add this in sglang/srt/server.py at line 143:

try:
    mp.set_start_method('spawn', force=True)
    print("spawned")
except RuntimeError:
    pass
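To sanity-check that the override actually took effect, you can query the active start method afterwards (plain multiprocessing, nothing sglang-specific):

```python
import multiprocessing as mp

# force=True lets this succeed even if a start method was already set;
# without it, a second set_start_method() raises RuntimeError.
mp.set_start_method("spawn", force=True)
print(mp.get_start_method())  # spawn
```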
> I modified sglang's code, and it worked for me. Add this in sglang/srt/server.py at line 143:
> try:
>     mp.set_start_method('spawn', force=True)
>     print("spawned")
> except RuntimeError:
>     pass
It does not work for me
It worked for me, but I added the above code in the launch_server function (line 279, sglang v0.3.0).
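For anyone patching a newer version, the placement described above looks roughly like this; the body of `launch_server` is paraphrased for illustration, not sglang's actual code:

```python
import multiprocessing as mp

def launch_server(server_args):
    # Force 'spawn' before any CUDA-using subprocess is created;
    # a 'fork'ed child would inherit the parent's initialized CUDA
    # context and crash with "Cannot re-initialize CUDA".
    try:
        mp.set_start_method("spawn", force=True)
    except RuntimeError:
        pass
    # ... start tokenizer manager, router, detokenizer, HTTP server ...

launch_server(None)
```

The try/except matters because set_start_method can only be called once per process without force=True; swallowing the RuntimeError keeps the patch idempotent.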
Describe the issue
Issue:
Command:
The error is:
[2024-04-10 20:18:11,995] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
preprocessor_config.json: 100% 754/754 [00:00<00:00, 5.15MB/s]
tokenizer_config.json: 100% 1.86k/1.86k [00:00<00:00, 16.8MB/s]
tokenizer.model: 100% 1.03M/1.03M [00:00<00:00, 4.17MB/s]
added_tokens.json: 100% 23.0/23.0 [00:00<00:00, 228kB/s]
special_tokens_map.json: 100% 748/748 [00:00<00:00, 7.38MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
server started on [0.0.0.0]:10005
server started on [0.0.0.0]:10006
server started on [0.0.0.0]:10007
server started on [0.0.0.0]:10008
server started on [0.0.0.0]:10009
server started on [0.0.0.0]:10010
server started on [0.0.0.0]:10011
[accepted/welcome lines for 7 rpyc connections from 127.0.0.1]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
router init state: Traceback (most recent call last):
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/manager.py", line 68, in start_router_process
    model_client = ModelRpcClient(server_args, port_args)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 640, in __init__
    rets = [obtain(x) for x in executor.map(init_model, range(tp_size))]
  File "miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 640, in <listcomp>
    rets = [obtain(x) for x in executor.map(init_model, range(tp_size))]
  File "miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "HOME/miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "miniconda3/envs/llava/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 638, in init_model
    return self.model_servers[i].init_model(i, server_args, port_args)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/netref.py", line 239, in __call__
    return syncreq(_self, consts.HANDLE_CALL, args, kwargs)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/netref.py", line 63, in syncreq
    return conn.sync_request(handler, proxy, *args)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/protocol.py", line 744, in sync_request
    return _async_res.value
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/async_.py", line 111, in value
    raise self._obj
_get_exception_class.<locals>.Derived: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
========= Remote Traceback (1) =========
Traceback (most recent call last):
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/protocol.py", line 369, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/protocol.py", line 863, in _handle_call
    return obj(*args, **dict(kwargs))
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 70, in exposed_init_model
    self.model_runner = ModelRunner(
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_runner.py", line 271, in __init__
    torch.cuda.set_device(self.tp_rank)
  File "miniconda3/envs/llava/lib/python3.10/site-packages/torch/cuda/__init__.py", line 404, in set_device
    torch._C._cuda_setDevice(device)
  File "miniconda3/envs/llava/lib/python3.10/site-packages/torch/cuda/__init__.py", line 284, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
detoken init state: init ok
goodbye ('127.0.0.1', 14646)
goodbye ('127.0.0.1', 24387)
goodbye ('127.0.0.1', 29117)
goodbye ('127.0.0.1', 48207)
goodbye ('127.0.0.1', 38550)
goodbye ('127.0.0.1', 38123)
goodbye ('127.0.0.1', 44617)