Closed · horiacristescu closed this 1 month ago
Hi @horiacristescu This might be expected. Although FlashInfer supports sm70 and sm75, it mainly focuses on sm80+. For architectures below sm80, we currently recommend enabling the --disable-flashinfer --disable-flashinfer-sampling parameters. We apologize for any inconvenience caused and thank you for your understanding. cc @yzh119
To enable these --disable-flashinfer --disable-flashinfer-sampling parameters, do I set them directly like this: python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 30000 --disable-flashinfer True --disable-flashinfer-sampling True?
python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 30000 --disable-flashinfer --disable-flashinfer-sampling
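Note that no `True` value is passed: these are on/off switches. A minimal argparse sketch (not SGLang's actual parser, just an illustration of how such store-true flags behave):

```python
import argparse

# Minimal sketch (not SGLang's actual parser): store_true flags take
# no value; their mere presence on the command line sets them to True.
parser = argparse.ArgumentParser()
parser.add_argument("--disable-flashinfer", action="store_true")
parser.add_argument("--disable-flashinfer-sampling", action="store_true")

args = parser.parse_args(["--disable-flashinfer", "--disable-flashinfer-sampling"])
print(args.disable_flashinfer, args.disable_flashinfer_sampling)  # True True
```

Passing `--disable-flashinfer True` would instead make argparse treat `True` as a stray positional argument and error out.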
That doesn't work; it fails with "No module named flashinfer". Since my server's GPU isn't supported, I indeed haven't installed flashinfer.
FlashInfer is one of SGLang's required dependencies, so you must install it. Disabling FlashInfer merely means it isn't used at runtime; it doesn't mean you can skip installing it.
FlashInfer supports:
Python: 3.8, 3.9, 3.10, 3.11
PyTorch: 2.2/2.3/2.4 with CUDA 11.8/12.1/12.4 (CUDA 12.4 only for torch 2.4)
SGLang now supports sm75, such as the T4. Feel free to try the latest version. We currently have no plans to support sm70. Thanks!
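To check whether your GPU falls below the sm80 cutoff discussed above, you can compare its compute capability. The helper below is hypothetical (not part of SGLang); at runtime you could feed it the `(major, minor)` tuple returned by `torch.cuda.get_device_capability()`:

```python
# Hypothetical helper (not part of SGLang): decide from the GPU's compute
# capability whether the --disable-flashinfer flags are recommended.
def needs_disable_flashinfer(major: int, minor: int) -> bool:
    """FlashInfer focuses on sm80+; below that, SGLang recommends
    --disable-flashinfer --disable-flashinfer-sampling."""
    return (major, minor) < (8, 0)

# sm70 (V100) and sm75 (T4) need the flags; sm80 (A100) does not.
print(needs_disable_flashinfer(7, 0))  # True
print(needs_disable_flashinfer(7, 5))  # True
print(needs_disable_flashinfer(8, 0))  # False
```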
Checklist
Describe the bug
Can't use sglang with flashinfer on sm_75 or lower, not even after recompiling. Please document this so people don't waste time trying to make it work.
Reproduction
Simply trying to use it without
--disable-flashinfer --disable-flashinfer-sampling
causes a crash.

Environment