Closed Forsworns closed 3 months ago
Hi, thanks for your interest.
Your GPU's compute capability is 7.5, while our precompiled Docker image only supports 8.0 and higher. If you want to run ktransformers on it, you can follow the README's build-from-source section and compile it yourself.
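To check which side of that cutoff a card falls on, you can query the compute capability (e.g. via `torch.cuda.get_device_capability()`) and compare it against the 8.0 threshold. A minimal sketch of that comparison; the helper name is mine, not part of ktransformers:

```python
def supports_precompiled_image(capability):
    """Return True if a (major, minor) compute capability meets the
    8.0 minimum required by the precompiled Docker image."""
    return tuple(capability) >= (8, 0)

# A T4 (Turing) reports compute capability 7.5; an A10 (Ampere) reports 8.6.
print(supports_precompiled_image((7, 5)))  # False -> build from source
print(supports_precompiled_image((8, 6)))  # True  -> precompiled image works
```

On a machine with PyTorch and a GPU, `supports_precompiled_image(torch.cuda.get_device_capability())` gives the answer directly.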
One more thing: our default linear operator (Marlin) only supports NVIDIA's Ampere architecture and newer. For the T4, which is Turing, you need to modify /ktransformer/optimize/optimize_rules/DeepSeek-V2-Chat.yaml and change the linear backend to KLinearTorch.
The modified linear replacement rule will look like:

```yaml
- match:
    name: "^model\\.layers\\.(?!.*self_attn).*$"  # regular expression
    class: torch.nn.Linear  # only match modules matching name and class simultaneously
  replace:
    class: ktransformers.operators.linear.KTransformersLinear  # optimized kernel on quantized data types
    kwargs:
      generate_device: "cuda"
      generate_op: "KLinearTorch"
```
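The `name` field is a Python regular expression; a quick way to confirm which modules the rule will catch is to try it against a couple of module names (pattern shown with the YAML escaping removed):

```python
import re

# The rule's match pattern: every model.layers.* module EXCEPT those
# under self_attn (excluded by the negative lookahead).
pattern = re.compile(r"^model\.layers\.(?!.*self_attn).*$")

print(bool(pattern.match("model.layers.0.mlp.gate_proj")))     # True  -> replaced
print(bool(pattern.match("model.layers.0.self_attn.q_proj")))  # False -> left alone
```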
Thanks! I have run the original container image successfully with A10 cards :)
I tried DeepSeek-V2-Lite in your container image `approachingai/ktransformers:0.1.1`, deployed on an NVIDIA T4 GPU with driver 535.161.8, and I got the following errors. Is the T4 not sufficient to run it, or should I change my driver version?