Open KylinMountain opened 2 days ago
After adding modelscope, an exception is raised in Docker.
```
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 16, in <module>
    raise e
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 14, in <module>
    launch_server(server_args)
  File "/sgl-workspace/sglang/python/sglang/srt/server.py", line 373, in launch_server
    raise RuntimeError(
RuntimeError: Initialization failed. controller_init_state: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/controller_single.py", line 145, in start_controller_process
    controller = ControllerSingle(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/controller_single.py", line 81, in __init__
    self.tp_server = ModelTpServer(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 100, in __init__
    self.model_runner = ModelRunner(
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 116, in __init__
    min_per_gpu_memory = self.init_torch_distributed()
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 132, in init_torch_distributed
    torch.cuda.set_device(self.gpu_id)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 420, in set_device
    torch._C._cuda_setDevice(device)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 300, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
, detoken_init_state: init ok
```
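For context, this error is PyTorch's standard complaint when a subprocess created with the default `fork` start method touches CUDA that the parent process already initialized. A minimal sketch of the `spawn` context the error message asks for (no CUDA required; the `worker` function here is a stand-in, not sglang code):

```python
import multiprocessing as mp

def worker(q):
    # Stand-in for a CUDA-using child process; in a spawned process,
    # CUDA could be initialized fresh here without the fork error.
    q.put("ok")

if __name__ == "__main__":
    # 'fork' (the Linux default) copies the parent's CUDA state into the
    # child, which CUDA cannot re-initialize. Requesting a 'spawn'
    # context starts the child from a fresh interpreter instead.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # prints "ok"
    p.join()
```

Alternatively, `mp.set_start_method("spawn", force=True)` early in the entrypoint changes the default globally; whether sglang's launcher does this in the Docker image is what this report seems to hit.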
The same command runs successfully on the host with python -m sglang.....
But it fails inside Docker with lmsysorg/sglang:latest.
My environment is 4x NVIDIA A100.
Describe the bug
When starting sglang via docker compose with the SGLANG_USE_MODELSCOPE environment variable set, it reports an error that modelscope is not found.
Reproduction
docker compose up -d
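For reference, a minimal docker-compose.yml sketch of this setup. The image tag matches the report; the model path, port, and command are illustrative assumptions, not the reporter's actual file:

```yaml
services:
  sglang:
    image: lmsysorg/sglang:latest
    environment:
      # Enables downloading models from ModelScope (as in this report)
      - SGLANG_USE_MODELSCOPE=true
    # Hypothetical model and port, for illustration only
    command: python3 -m sglang.launch_server --model-path Qwen/Qwen2-7B-Instruct --port 30000
    ports:
      - "30000:30000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```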
Environment
Running inside Docker, so I am not sure how to run the environment report there.
4x NVIDIA A100.