pangr closed this issue 11 months ago
Well, the 1080 Ti is a bit too old to test at this moment. How about using the Vulkan backend instead of CUDA?
Same here with GTX 1060.
How about the Vulkan backend? How well does it work with NVIDIA GPUs?
You can follow the Vulkan instructions for this device at llm.mlc.ai/docs.
@tqchen: I'd like to join in on this one. I have an old mining rig with 11x GTX 1060 6GB cards, so I'd like to use CUDA for parallelization/batching.
You can try to hack around it and remove that assert from the Python build function. Note that the latest CUDA runtime likely deprecates sm_61 support, so you might need to work with an older one.
In the meantime, Vulkan should be supported out of the box.
I do think our CUDA backend supports sm_61 out of the box, but you will need to compile it yourself. Use extra flags to disable CUTLASS: `--no-cutlass-attn` and `--no-cutlass-norm`.
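Putting the two pieces together, the workaround might look like the following, combining the build command from the original report with the CUTLASS-disabling flags mentioned above (the model name `xxxx` is the placeholder from the report, not a real model):

```shell
# Self-compiled MLC-LLM build for an sm_61 GPU (e.g. GTX 1080 Ti),
# with the CUTLASS attention and norm kernels disabled as suggested
# in this thread. "xxxx" is the placeholder model name from the report.
python3 -m mlc_llm.build \
    --model xxxx \
    --target cuda \
    --quantization q4f16_1 \
    --no-cutlass-attn \
    --no-cutlass-norm
```

This still requires compiling the backend yourself first, as noted above; the flags only skip the CUTLASS kernels at model-build time.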
We should have a fallback that turns off CUTLASS automatically for non-sm_7x/8x devices.
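The automatic fallback proposed above could be sketched roughly as follows. This is a hypothetical illustration, not MLC-LLM's actual code: the function names (`cutlass_supported`, `cutlass_flags`) and the exact arch-string format are assumptions; only the flag names come from this thread.

```python
def cutlass_supported(arch: str) -> bool:
    """Hypothetical helper: decide whether CUTLASS kernels can be used.

    The CUTLASS attention/norm kernels discussed here target sm_7x/8x
    GPUs, so anything older (e.g. sm_61 on a GTX 1060 / 1080 Ti) should
    fall back to the non-CUTLASS code path.
    """
    # Assumed arch strings look like "sm_61", "sm_75", "sm_86".
    assert arch.startswith("sm_"), f"unexpected arch string: {arch}"
    major = int(arch[len("sm_"):]) // 10
    return major >= 7


def cutlass_flags(arch: str) -> list:
    """Return the extra build flags to apply for a target arch (sketch)."""
    if cutlass_supported(arch):
        return []
    # Mirror the manual workaround from this thread for older GPUs.
    return ["--no-cutlass-attn", "--no-cutlass-norm"]
```

With such a check in the build entry point, an sm_61 target would silently get the non-CUTLASS path instead of hitting the assertion.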
🚀 Feature
When I execute
python3 -m mlc_llm.build --model xxxx --target cuda --quantization q4f16_1
on a 1080 Ti, it reports: AssertionError: sm61 not supported yet.