sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/
Apache License 2.0
5.66k stars, 448 forks

[Bug] Exception: Capture cuda graph failed: Triton Error [CUDA]: device kernel image is invalid #1558

Open a136214808 opened 2 weeks ago

a136214808 commented 2 weeks ago

Describe the bug

My environment is 8x A100 with CUDA 11.8. After installing sglang by following the installation steps in order, I can't get it to run. I am not the owner of the server, so I can't change the CUDA environment. Are there any special installation requirements for cu118? (I tried two servers and both fail.)

My commands are as follows:

pip install --upgrade pip
pip install "sglang[all]"

# Install FlashInfer CUDA kernels
pip install flashinfer -i https://flashinfer.ai/whl/cu118/torch2.4/
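To confirm what the install steps above actually placed in the environment, a minimal sketch like the following can list the relevant versions. It is not part of sglang; `installed_versions` and `PACKAGES` are hypothetical names, and the distribution names are assumed to match the ones used in the pip commands and in the environment report.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence

# Assumed distribution names (taken from the pip commands and environment report).
PACKAGES = ("sglang", "flashinfer", "torch", "triton")


def installed_versions(names: Sequence[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the same conda environment that launches the server, so the versions reflect what the server actually imports.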

Reproduction

command: CUDA_VISIBLE_DEVICES=3 python -m sglang.launch_server --model-path /disk1/qwen2.5/Qwen2.5-7B-Instruct --port 30000 --enable-torch-compile --attention-backend triton --sampling-backend pytorch

bug:

......
  File "/disk1/young/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 468, in init_cuda_graphs
    self.cuda_graph_runner = CudaGraphRunner(self)
  File "/disk1/young/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 153, in __init__
    raise Exception(
Exception: Capture cuda graph failed: Triton Error [CUDA]: device kernel image is invalid
Possible solutions:

  1. disable cuda graph by --disable-cuda-graph
  2. set --mem-fraction-static to a smaller value (e.g., 0.8 or 0.7)
  3. disable torch compile by not using --enable-torch-compile

Open an issue on GitHub https://github.com/sgl-project/sglang/issues/new/choose
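The three suggestions in the error message can be tried one at a time against the reproduction command. The sketch below only rewrites the argument list; `apply_workarounds` is a hypothetical helper, the flags are the ones quoted in the message, and nothing here is known to fix the failure on cu118.

```python
import shlex
from typing import List, Optional

# Arguments from the reproduction command in this report.
BASE_ARGS = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "/disk1/qwen2.5/Qwen2.5-7B-Instruct",
    "--port", "30000",
    "--enable-torch-compile",
    "--attention-backend", "triton",
    "--sampling-backend", "pytorch",
]


def apply_workarounds(
    args: List[str],
    disable_cuda_graph: bool = False,
    mem_fraction_static: Optional[float] = None,
    drop_torch_compile: bool = False,
) -> List[str]:
    """Return a copy of args with the selected suggestions applied."""
    result = list(args)
    if drop_torch_compile:
        # Suggestion 3: do not pass --enable-torch-compile.
        result = [a for a in result if a != "--enable-torch-compile"]
    if disable_cuda_graph and "--disable-cuda-graph" not in result:
        # Suggestion 1: skip CUDA graph capture.
        result.append("--disable-cuda-graph")
    if mem_fraction_static is not None:
        # Suggestion 2: lower the static memory fraction (e.g. 0.8 or 0.7).
        result += ["--mem-fraction-static", str(mem_fraction_static)]
    return result


if __name__ == "__main__":
    print(shlex.join(apply_workarounds(BASE_ARGS, disable_cuda_graph=True)))
    print(shlex.join(apply_workarounds(BASE_ARGS, mem_fraction_static=0.8)))
    print(shlex.join(apply_workarounds(BASE_ARGS, drop_torch_compile=True)))
```

Prefix each printed command with CUDA_VISIBLE_DEVICES=3 as in the original run. Testing the variants separately shows which setting, if any, changes the outcome.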

Environment

Python: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5: NVIDIA A100 80GB PCIe
GPU 0,1,2,3,4,5 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
CUDA Driver Version: 515.105.01
PyTorch: 2.4.0+cu118
sglang: 0.3.2
flashinfer: 0.1.6+cu118torch2.4
triton: 3.0.0
transformers: 4.45.1
requests: 2.32.3
tqdm: 4.66.5
numpy: 1.26.4
aiohttp: 3.10.8
fastapi: 0.115.0
hf_transfer: 0.1.8
huggingface_hub: 0.25.1
interegular: 0.3.3
packaging: 24.1
PIL: 10.4.0
psutil: 6.0.0
pydantic: 2.9.2
uvicorn: 0.31.0
uvloop: 0.20.0
zmq: 26.2.0
vllm: 0.5.5
multipart: 0.0.12
openai: 1.51.0
anthropic: 0.34.2

NVIDIA Topology:

        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  CPU Affinity  NUMA Affinity
GPU0    X     PIX   PXB   PXB   PXB   PXB   0-15,32-47    0
GPU1    PIX   X     PXB   PXB   PXB   PXB   0-15,32-47    0
GPU2    PXB   PXB   X     PXB   PXB   PXB   0-15,32-47    0
GPU3    PXB   PXB   PXB   X     PXB   PXB   0-15,32-47    0
GPU4    PXB   PXB   PXB   PXB   X     PXB   0-15,32-47    0
GPU5    PXB   PXB   PXB   PXB   PXB   X     0-15,32-47    0

Legend:

X    = Self
SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX  = Connection traversing at most a single PCIe bridge
NV#  = Connection traversing a bonded set of # NVLinks

ulimit soft: 1048576
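Since the question is about cu118 specifically, one quick sanity check is whether the CUDA tags embedded in the reported wheel versions agree with the toolkit release. The sketch below is an illustration only: `cuda_tag`, `toolkit_tag`, and `mismatched` are hypothetical helpers, and the version strings are copied from the environment report above.

```python
import re
from typing import Dict, Optional


def cuda_tag(version: str) -> Optional[str]:
    """Return the 'cuNNN' tag from a local version label, or None if absent."""
    match = re.search(r"\+.*?(cu\d+)", version)
    return match.group(1) if match else None


def toolkit_tag(release: str) -> str:
    """Turn a toolkit release such as '11.8' into a wheel-style tag 'cu118'."""
    major, minor = release.split(".")[:2]
    return f"cu{major}{minor}"


def mismatched(versions: Dict[str, str], expected: str) -> Dict[str, str]:
    """Return packages that carry a CUDA tag different from the expected one."""
    found: Dict[str, str] = {}
    for name, version in versions.items():
        tag = cuda_tag(version)
        if tag is not None and tag != expected:
            found[name] = tag
    return found


# Values copied from the environment report.
REPORTED = {
    "torch": "2.4.0+cu118",
    "flashinfer": "0.1.6+cu118torch2.4",
    "triton": "3.0.0",
}

if __name__ == "__main__":
    print(mismatched(REPORTED, toolkit_tag("11.8")))
```

For the versions reported here the result is an empty dict, so the torch and flashinfer wheels carry the cu118 tag that matches the toolkit. The check says nothing about packages without a tag, such as triton, or about the driver version.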

Razerl commented 3 days ago

Have you solved it? @a136214808

a136214808 commented 3 days ago

> Have you solved it? @a136214808

I haven't tried again recently, so for now the answer is no.