haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method #1391

Open jiezhangGt opened 5 months ago

jiezhangGt commented 5 months ago

Describe the issue

Issue:

Command:

CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7 python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.6-34b --tokenizer-path llava-hf/llava-v1.6-34b-hf --port 30000 --tp 7

The error is:

[2024-04-10 20:18:11,995] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
preprocessor_config.json / tokenizer_config.json / tokenizer.model / added_tokens.json / special_tokens_map.json: 100% (download progress bars trimmed)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
server started on [0.0.0.0]:10005
server started on [0.0.0.0]:10006
server started on [0.0.0.0]:10007
server started on [0.0.0.0]:10008
server started on [0.0.0.0]:10009
server started on [0.0.0.0]:10010
server started on [0.0.0.0]:10011
accepted ('127.0.0.1', 29117) with fd 93 welcome ('127.0.0.1', 29117)
accepted ('127.0.0.1', 38123) with fd 85 welcome ('127.0.0.1', 38123)
accepted ('127.0.0.1', 48207) with fd 86 welcome ('127.0.0.1', 48207)
accepted ('127.0.0.1', 14646) with fd 95 welcome ('127.0.0.1', 14646)
accepted ('127.0.0.1', 44617) with fd 97 welcome ('127.0.0.1', 44617)
accepted ('127.0.0.1', 38550) with fd 99 welcome ('127.0.0.1', 38550)
accepted ('127.0.0.1', 24387) with fd 95 welcome ('127.0.0.1', 24387)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
router init state: Traceback (most recent call last):
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/manager.py", line 68, in start_router_process
    model_client = ModelRpcClient(server_args, port_args)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 640, in __init__
    rets = [obtain(x) for x in executor.map(init_model, range(tp_size))]
  File "miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 640, in <listcomp>
    rets = [obtain(x) for x in executor.map(init_model, range(tp_size))]
  File "miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "HOME/miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "miniconda3/envs/llava/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 638, in init_model
    return self.model_servers[i].init_model(i, server_args, port_args)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/netref.py", line 239, in __call__
    return syncreq(_self, consts.HANDLE_CALL, args, kwargs)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/netref.py", line 63, in syncreq
    return conn.sync_request(handler, proxy, *args)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/protocol.py", line 744, in sync_request
    return _async_res.value
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/async_.py", line 111, in value
    raise self._obj
_get_exception_class.<locals>.Derived: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

========= Remote Traceback (1) =========
Traceback (most recent call last):
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/protocol.py", line 369, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/protocol.py", line 863, in _handle_call
    return obj(*args, **dict(kwargs))
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 70, in exposed_init_model
    self.model_runner = ModelRunner(
  File "HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_runner.py", line 271, in __init__
    torch.cuda.set_device(self.tp_rank)
  File "miniconda3/envs/llava/lib/python3.10/site-packages/torch/cuda/__init__.py", line 404, in set_device
    torch._C._cuda_setDevice(device)
  File "miniconda3/envs/llava/lib/python3.10/site-packages/torch/cuda/__init__.py", line 284, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

detoken init state: init ok
goodbye ('127.0.0.1', 14646)
goodbye ('127.0.0.1', 24387)
goodbye ('127.0.0.1', 29117)
goodbye ('127.0.0.1', 48207)
goodbye ('127.0.0.1', 38550)
goodbye ('127.0.0.1', 38123)
goodbye ('127.0.0.1', 44617)

MengSunS commented 5 months ago

Set the start method to "spawn": https://stackoverflow.com/questions/61939952/mp-set-start-methodspawn-triggered-an-error-saying-the-context-is-already-be
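The suggestion boils down to forcing the "spawn" start method before any worker process is created. A minimal self-contained sketch of the pattern (no CUDA needed here; `init_worker` is just a stand-in for per-process setup):

```python
import multiprocessing as mp

def init_worker(rank):
    # Stand-in for per-process setup; in the real server this is where
    # torch.cuda.set_device(rank) runs. With "spawn" each child starts a
    # fresh interpreter, so CUDA can be initialized there safely.
    return rank * 2

if __name__ == "__main__":
    # Must run before any pool/process is created, ideally at the very
    # top of the entry point. force=True overrides an already-set context.
    mp.set_start_method("spawn", force=True)
    with mp.Pool(2) as pool:
        print(pool.map(init_worker, range(2)))  # [0, 2]
```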

Poeroz commented 5 months ago

Same problem when running llava-v1.6-vicuna-13b with sglang:

CUDA_VISIBLE_DEVICES=1,2 python -m sglang.launch_server --model-path llava-v1.6-vicuna-13b --tokenizer-path llava-v1.6-vicuna-13b-hf/ --port 30000 --tp 2
Traceback (most recent call last):
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/protocol.py", line 369, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/protocol.py", line 863, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 70, in exposed_init_model
    self.model_runner = ModelRunner(
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_runner.py", line 271, in __init__
    torch.cuda.set_device(self.tp_rank)
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/torch/cuda/__init__.py", line 404, in set_device
    torch._C._cuda_setDevice(device)
  File "/data/***/anaconda3/envs/llava/lib/python3.10/site-packages/torch/cuda/__init__.py", line 284, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

trimbilrepo commented 4 months ago

I have the same error: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.

Is there any workaround for this error?

chrisx599 commented 4 months ago

I modified sglang's code and it worked for me. Add this in sglang/srt/server.py at line 143:

try:
    # assumes `multiprocessing` is imported as `mp` in server.py
    mp.set_start_method('spawn', force=True)
    print("spawned")
except RuntimeError:
    pass
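Both the `force=True` and the try/except in this snippet do real work: the multiprocessing context can only be set once per process, and a second plain call raises RuntimeError (something in the import chain may already have fixed the context). A quick self-contained check, no sglang required:

```python
import multiprocessing as mp

mp.set_start_method("spawn", force=True)  # ok, overrides any prior setting
try:
    mp.set_start_method("fork")           # second call without force=True
except RuntimeError as e:
    # "context has already been set" -- this is the error the
    # try/except in the snippet above swallows
    print("RuntimeError:", e)
```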

endNone commented 1 month ago

I modified sglang's code and it worked for me. Add this in sglang/srt/server.py at line 143:

try:
    mp.set_start_method('spawn', force=True)
    print("spawned")
except RuntimeError:
    pass

It does not work for me.

Lin-sudo commented 2 weeks ago

It worked for me, but I added the code above in the launch_server function, at line 279 (sglang v0.3.0).
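In other words, the pattern that works across sglang versions is to force "spawn" at the very top of whatever function actually launches the server, before anything touches CUDA or creates workers. A minimal sketch, where `main()` is a hypothetical stand-in for launch_server:

```python
import multiprocessing as mp

def main():
    # Hypothetical entry point standing in for sglang's launch_server.
    # Force "spawn" first; force=True tolerates a context that some
    # earlier import may already have set.
    try:
        mp.set_start_method("spawn", force=True)
    except RuntimeError:
        pass
    # ... server startup (model load, router, TP workers) would follow here
    return mp.get_start_method()

if __name__ == "__main__":
    print(main())  # spawn
```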