kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Apache License 2.0

CUDA error: No kernel image is available for execution on the device #44

Closed Forsworns closed 3 months ago

Forsworns commented 3 months ago

I tried DeepSeek-V2-Lite in your container image approachingai/ktransformers:0.1.1, deployed on an NVIDIA T4 GPU with driver 535.161.8, and I got the following errors. Is the T4 not sufficient to run it, or should I change my driver version?

[screenshot: CUDA error "no kernel image is available for execution on the device"]

Azure-Tang commented 3 months ago

Hi, thanks for your interest. Your GPU's compute capability is 7.5, while our precompiled Docker image only supports 8.0 and higher. If you want to run ktransformers, you can follow the "Or you can download source code and compile" section of the README. One more thing: our default linear operator (Marlin) only supports NVIDIA's Ampere architecture and newer. For the T4, which is Turing, you may need to modify /ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat.yaml and change the linear backend to KLinearTorch.
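As a quick way to check whether your card clears the 8.0 threshold before editing the YAML, you can query the compute capability through PyTorch. This is a minimal sketch; the `meets_min_capability` helper and the printed messages are our own illustration, not part of ktransformers.

```python
# Check whether the local GPU meets the precompiled image's requirement
# (compute capability >= 8.0, i.e. Ampere and newer per this thread).
MIN_CAPABILITY = (8, 0)  # the T4 (Turing) reports (7, 5)

def meets_min_capability(cap, minimum=MIN_CAPABILITY):
    """Compare (major, minor) compute-capability tuples lexicographically."""
    return tuple(cap) >= tuple(minimum)

try:
    import torch
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)  # e.g. (7, 5) on a T4
        if meets_min_capability(cap):
            print(f"Compute capability {cap}: precompiled Marlin kernels should work")
        else:
            print(f"Compute capability {cap}: switch the linear backend to KLinearTorch")
except ImportError:
    pass  # torch not installed; the helper above still works standalone
```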

The modified linear replacement rule will look like:

```yaml
- match:
    name: "^model\\.layers\\.(?!.*self_attn).*$"  # regular expression
    class: torch.nn.Linear  # only match modules matching name and class simultaneously
  replace:
    class: ktransformers.operators.linear.KTransformersLinear  # optimized kernel on quantized data types
    kwargs:
      generate_device: "cuda"
      generate_op: "KLinearTorch"
```
Forsworns commented 3 months ago

Thanks! I have run the original container image successfully with A10 cards :)