Because the kernel uses some PTX instructions that are only available on sm80 or later architectures. Supporting earlier architectures such as sm70/sm75 is feasible (just replace these instructions with their slower equivalents), but it will take some time to implement.
Thanks for your patient reply. I'm quite new to CUDA, but I need to try to support the punica kernels on the sm70 and sm75 architectures.
I'm using the kernels from vLLM; it seems the kernels in vLLM are just bgmv with some minor modifications.
In my own project, when I set `TORCH_CUDA_ARCH_LIST` to `8.0`, everything works fine. When I set it to `7.0 7.5`, I get lots of errors, but they fall into two categories:

- one says that for the `bfloat16` type, there is no overloaded operator for the `+=` operation
- another says that the identifier `make_bfloat162` is undefined

I checked that both should be provided by `cuda_bf16.h`, and from what I found, `bfloat16` hardware acceleration only exists on sm80+ architectures (am I right?).
So should I just use `float` on sm70/75 and disable `bfloat16`, or can you teach me how to find these PTX instructions and replace them?
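For illustration, here is a minimal sketch of what the "just use float on sm70/75" fallback could look like, using a toy AXPY-style kernel rather than the actual punica/vLLM bgmv code; it assumes the bf16/float conversion intrinsics are available on all target architectures.

```cuda
#include <cuda_bf16.h>

// Toy kernel (illustrative only): y[i] += alpha * x[i] with bfloat16 storage.
// All arithmetic happens in float, so no sm80-only bf16 operators are needed.
__global__ void bf16_axpy(const __nv_bfloat16* x, __nv_bfloat16* y,
                          float alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    // __bfloat162float / __float2bfloat16 are plain conversions and do not
    // require sm80; only bf16 arithmetic (e.g. operator+=) does.
    float acc = __bfloat162float(y[i]);
    acc += alpha * __bfloat162float(x[i]);
    y[i] = __float2bfloat16(acc);
  }
}
```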
Firstly, we are working on unifying the LoRA kernels in punica into FlashInfer (it's on our release v0.1.0 checklist), where we plan to support sm70/sm75.
Regarding your question: native bfloat16 support is only available on sm80 or later architectures; on earlier architectures you can only use software emulation.
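As an example of such a software fallback, here is a hedged sketch of how the missing `make_bfloat162` could be worked around on pre-sm80 builds (assuming your CUDA toolkit only defines it for sm80+; the helper name below is made up):

```cuda
#include <cuda_bf16.h>

// Hypothetical helper: construct a __nv_bfloat162 pair on any architecture.
__device__ __forceinline__ __nv_bfloat162 make_bf162_compat(__nv_bfloat16 x,
                                                            __nv_bfloat16 y) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
  return make_bfloat162(x, y);  // native helper from cuda_bf16.h
#else
  __nv_bfloat162 v;             // build the pair field by field instead
  v.x = x;
  v.y = y;
  return v;
#endif
}
```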
> can you teach me how to find these PTX instructions and replace them?
You can check the PTX documentation and look at the Target ISA Notes of each instruction.
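For instance (purely as an illustration, not necessarily one of the instructions the punica kernels actually use), `cp.async` is listed in the PTX ISA with a target of sm_80 or higher, and a guarded replacement could look roughly like this:

```cuda
// Hedged sketch: guard an sm80-only PTX instruction and fall back to a slower,
// portable path on sm70/sm75. In a real kernel the commit/wait would be
// batched rather than issued per copy.
__device__ __forceinline__ void copy_u32_to_shared(unsigned int* smem_dst,
                                                   const unsigned int* gmem_src) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
  unsigned dst = static_cast<unsigned>(__cvta_generic_to_shared(smem_dst));
  asm volatile(
      "cp.async.ca.shared.global [%0], [%1], 4;\n"
      "cp.async.commit_group;\n"
      "cp.async.wait_group 0;\n" ::"r"(dst), "l"(gmem_src));
#else
  // Ordinary synchronous load/store through registers.
  *smem_dst = *gmem_src;
#endif
}
```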
Thanks again for your reply : )
> we are working on unifying the LoRA kernels in punica into FlashInfer (it's on our release v0.1.0 checklist), where we plan to support sm70/sm75

I have also encountered this issue. May I ask when this support will be provided?
Thanks for this nice work on serving multiple LoRAs. The problem is as the title says : )