sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/
Apache License 2.0

How to use inside notebook? #38

Closed. nivibilla closed this issue 9 months ago

nivibilla commented 9 months ago

I'm trying to use this on Databricks, inside a notebook running on a single-node 8xA10 cluster. I'm initialising it like this:

from sglang import function, system, user, assistant, gen, set_default_backend, Runtime
runtime = Runtime("/local_disk0/mistralai/Mixtral-8x7B-Instruct-v0.1")
set_default_backend(runtime)

However, I get this error:

router init state: Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-51dd0ee1-a396-4939-81a6-75e3afe59af5/lib/python3.10/site-packages/sglang/srt/managers/router/manager.py", line 68, in start_router_process
    model_client = ModelRpcClient(server_args, port_args)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-51dd0ee1-a396-4939-81a6-75e3afe59af5/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 448, in __init__
    self.model_server.exposed_init_model(0, server_args, port_args)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-51dd0ee1-a396-4939-81a6-75e3afe59af5/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 54, in exposed_init_model
    self.model_runner = ModelRunner(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-51dd0ee1-a396-4939-81a6-75e3afe59af5/lib/python3.10/site-packages/sglang/srt/managers/router/model_runner.py", line 213, in __init__
    torch.cuda.set_device(self.tp_rank)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-51dd0ee1-a396-4939-81a6-75e3afe59af5/lib/python3.10/site-packages/torch/cuda/__init__.py", line 404, in set_device
    torch._C._cuda_setDevice(device)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-51dd0ee1-a396-4939-81a6-75e3afe59af5/lib/python3.10/site-packages/torch/cuda/__init__.py", line 284, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

merrymercy commented 9 months ago

Did you import torch or use any GPU-enabled libraries before creating the Runtime?

Here are some things you can try:

  1. Create the Runtime before using any other GPU libraries.
  2. Try this at the very beginning of your notebook:
    import multiprocessing as mp
    mp.set_start_method('spawn')
  3. See this Colab notebook example: https://colab.research.google.com/drive/13lOJt8uFYZJetqQIudAlK8oJJX8PENNk?usp=sharing
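
One way to answer the question above (was CUDA touched before the Runtime was created?) is to check from inside the notebook. This is a minimal sketch, not part of SGLang: the helper name `cuda_already_initialized` is made up for illustration, and it only inspects the current Python process, so it cannot see what a managed notebook environment did elsewhere.

```python
import sys


def cuda_already_initialized():
    """Return True only if torch is already imported AND has initialised CUDA
    in this process. Hypothetical helper, not an SGLang or PyTorch API."""
    # Look torch up in sys.modules so that this check does not import it
    # (and so cannot trigger any initialisation) as a side effect.
    torch = sys.modules.get("torch")
    if torch is None:
        return False
    return bool(torch.cuda.is_initialized())


print("CUDA initialised in this process:", cuda_already_initialized())
```

If this prints `True` before the Runtime is created, a forked child process would inherit an already-initialised CUDA state, which is the situation the RuntimeError in the traceback describes.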

nivibilla commented 9 months ago

1) Still gives me the same error.

2) Fails with "context already set". If I pass force=True, the cell keeps running forever with no output.

3) The cell also runs forever.
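
For context on the "context already set" message in 2): Python's `multiprocessing` lets the global start method be fixed only once per process, so the call fails if something earlier in the notebook session has already fixed it. The sketch below reproduces that behaviour with the standard library alone. The helper `try_set_spawn` is made up for illustration, and obtaining a local context with `get_context` does not change what SGLang does internally.

```python
import multiprocessing as mp


def try_set_spawn():
    """Try to select 'spawn' globally; report the outcome instead of raising.
    Hypothetical helper for illustration only."""
    try:
        mp.set_start_method("spawn")
        return "set"
    except RuntimeError as exc:
        # Raised when the start method was already fixed in this process.
        return f"already fixed: {exc}"


first = try_set_spawn()   # may succeed, or fail if the method was fixed earlier
second = try_set_spawn()  # fails: the global start method is fixed by now
print(first)
print(second)

# A local context can still request 'spawn' without touching the global setting.
ctx = mp.get_context("spawn")
print(ctx.get_start_method())
```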

For clarification, I am able to run the server by starting it manually from the web terminal on Databricks. However, that's not ideal, as I can't automate the process.

nivibilla commented 9 months ago

I seem to have found a workaround: launching the server as a separate process from the notebook.

import subprocess
server = subprocess.Popen("python -m sglang.launch_server --model-path /local_disk0/mistralai/Mistral-7B-Instruct-v0.2/ --port 30000 --tp 8", shell=True)
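
Since `Popen` returns immediately, the notebook needs some way to know when the server is ready. The following is a minimal sketch using only the standard library. The helper name `wait_for_port` and the timeout values are assumptions for illustration; port 30000 comes from the command above. An open port only shows that the process is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=600.0, poll_s=2.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once the port accepts a connection, or False if timeout_s
    elapses first. Hypothetical helper, not part of SGLang.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            # Nothing is listening yet (or the attempt timed out); retry.
            time.sleep(poll_s)
    return False


# Usage in the notebook, after the Popen call above:
#   if not wait_for_port("localhost", 30000):
#       server.terminate()
#       raise RuntimeError("server did not open port 30000 in time")
#
# When finished with the server:
#   server.terminate()
#   server.wait()
```

Once the port is open, the notebook can presumably be pointed at the running server with `set_default_backend(RuntimeEndpoint("http://localhost:30000"))` instead of creating a `Runtime` in-process. Note that with `shell=True`, `server.terminate()` signals the shell rather than the Python server itself, so passing the command as an argument list without `shell=True` may make cleanup more reliable.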