sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/
Apache License 2.0
5.34k stars · 384 forks

[Bug] no module modelscope using docker compose to start sglang #1517

Open KylinMountain opened 2 days ago

KylinMountain commented 2 days ago

Describe the bug

When using docker compose to start sglang with the environment variable SGLANG_USE_MODELSCOPE set, the server fails with an error that the modelscope module cannot be found.

  sglang:
    image: lmsysorg/sglang:latest
    container_name: sglang
    volumes:
      - ~/.cache/modelscope:/root/.cache/modelscope
      - ~/.cache/huggingface:/root/.cache/huggingface
    restart: always
    network_mode: host
    # Or you can only publish port 30000
    # ports:
    #   - 30000:30000
    environment:
      - 'SGLANG_USE_MODELSCOPE=true'
    entrypoint: python3 -m sglang.launch_server
    command:
      --model-path qwen/Qwen2.5-72B-Instruct-GPTQ-Int8
      --tp 4
      --mem-fraction-static 0.7
      --chunked-prefill-size 2048
      --host 0.0.0.0
      --port 30000
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:30000/health || exit 1"]
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
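One possible workaround (a sketch, not verified against the official image — newer tags may already ship modelscope) is to build a derived image that installs the missing package, and point the compose file's `image:` at it:

```dockerfile
# Hypothetical derived image: adds modelscope on top of the official sglang image.
FROM lmsysorg/sglang:latest
RUN pip install --no-cache-dir modelscope
```

Alternatively, `docker exec sglang pip install modelscope` patches a running container, but that change is lost whenever the container is recreated, so baking it into an image is more durable.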

Reproduction

docker compose up -d

Environment

Running via Docker, so I'm not sure how to collect the environment information.

A100 x 4.

KylinMountain commented 1 day ago

After installing modelscope, it raises an exception in Docker.

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 16, in <module>
    raise e
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 14, in <module>
    launch_server(server_args)
  File "/sgl-workspace/sglang/python/sglang/srt/server.py", line 373, in launch_server
    raise RuntimeError(
RuntimeError: Initialization failed. controller_init_state: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/controller_single.py", line 145, in start_controller_process
    controller = ControllerSingle(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/controller_single.py", line 81, in __init__
    self.tp_server = ModelTpServer(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 100, in __init__
    self.model_runner = ModelRunner(
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 116, in __init__
    min_per_gpu_memory = self.init_torch_distributed()
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 132, in init_torch_distributed
    torch.cuda.set_device(self.gpu_id)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 420, in set_device
    torch._C._cuda_setDevice(device)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 300, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
, detoken_init_state: init ok
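This second failure is a CUDA/multiprocessing interaction: the parent process initialized CUDA before forking its worker processes, and a CUDA context does not survive a `fork`. A spawned child, by contrast, is a fresh interpreter and initializes CUDA from scratch. A minimal sketch of the distinction using plain `multiprocessing` (no sglang internals assumed, and no GPU required):

```python
import multiprocessing as mp
import os

def worker(q):
    # A forked child inherits the parent's CUDA state, which is invalid after
    # fork -- hence "Cannot re-initialize CUDA in forked subprocess".
    # A spawned child is a brand-new interpreter, so CUDA (or any library with
    # process-level state) initializes cleanly.
    q.put(os.getpid())

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # request the 'spawn' start method explicitly
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    child_pid = q.get()
    p.join()
    print(child_pid != os.getpid())  # the worker ran in a separate fresh process
```

In torch code the equivalent switch is `torch.multiprocessing.set_start_method("spawn")`. Since the fork here happens inside sglang's own launch path, the fix presumably belongs in sglang (or a newer image) rather than in the compose file.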
KylinMountain commented 1 day ago

It runs successfully on the host with python -m sglang..., but it fails inside Docker with lmsysorg/sglang:latest.

My environment is 4× NVIDIA A100.