punica-ai / punica

Serving multiple LoRA finetuned LLM as one
https://arxiv.org/abs/2310.18547
Apache License 2.0
883 stars · 40 forks

Why should the CUDA arch be >= 8.0? #42

Closed · yyccli closed 4 months ago

yyccli commented 4 months ago

Thanks for this nice work on serving multiple LoRAs. My question is as the title says : )

yzh119 commented 4 months ago

Because the kernel uses some PTX instructions that are only available on sm80 or later architectures. Supporting earlier architectures such as sm70/sm75 is feasible (just replace these instructions with their slower equivalents), but it will take some time to implement.
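For illustration, here is a minimal sketch of the usual guard pattern (not punica's actual code; the instruction choice is an assumption): cp.async is one example of a PTX instruction whose Target ISA Notes require sm80+, and the fallback branch replaces it with a plain synchronous copy on older architectures.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Hedged sketch: copy 16 bytes from global to shared memory.
// On sm80+ this uses the cp.async PTX instruction (asynchronous copy);
// on sm70/sm75 it falls back to a slower synchronous vector load/store.
__device__ __forceinline__ void copy_16B(void* smem_dst, const void* gmem_src) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
  // cp.async takes a 32-bit shared-memory address as its destination.
  uint32_t dst = static_cast<uint32_t>(__cvta_generic_to_shared(smem_dst));
  asm volatile("cp.async.cg.shared.global [%0], [%1], 16;\n" ::"r"(dst),
               "l"(gmem_src));
  asm volatile("cp.async.commit_group;\n");
  asm volatile("cp.async.wait_group 0;\n");  // wait so behavior matches the fallback
#else
  // Slower equivalent for pre-sm80: one 128-bit load and store.
  *reinterpret_cast<uint4*>(smem_dst) = *reinterpret_cast<const uint4*>(gmem_src);
#endif
}
```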

yyccli commented 4 months ago

Thanks for your patient reply. I'm really new to CUDA, but I need to try to support the punica kernels on the sm70 and sm75 archs. I'm using the kernels from vLLM; it seems the kernels in vLLM are just bgmv with some minor modifications. In my own project, when I set TORCH_CUDA_ARCH_LIST to 8.0, everything works fine. When I set it to 7.0 or 7.5, I get lots of errors, but they fall into two categories:

[screenshot: compiler error]

One says that for the bfloat16 type there is no overloaded operator for the += operation.

[screenshot: compiler error]

The other says that the identifier make_bfloat162 is undefined. I checked that both should be provided by cuda_bf16.h, and I read that bfloat16 hardware acceleration only exists on sm80+ archs (am I right?). So should I just use float on sm70/75 and disable bfloat16? Or can you teach me how to find these PTX instructions and replace them?

yzh119 commented 4 months ago

First of all, we are working on unifying the LoRA kernels in punica into FlashInfer (it's on our v0.1.0 release checklist), where we plan to support sm70/sm75.

Regarding your question: native bfloat16 support is only available on sm80 or later architectures; otherwise you can only use software emulation.
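As a concrete illustration of that software path (a hedged sketch, not punica's or vLLM's actual code): the float/bfloat16 conversion intrinsics in cuda_bf16.h work on all architectures, so pre-sm80 code can round-trip through float, and make_bfloat162 can be replaced by filling the struct members directly.

```cuda
#include <cuda_bf16.h>

// Sketch: bfloat16 add that also works on sm70/sm75, where older toolkits
// do not define operator+ / operator+= for __nv_bfloat16 in device code.
__device__ __forceinline__ __nv_bfloat16 bf16_add(__nv_bfloat16 a, __nv_bfloat16 b) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
  return a + b;  // native hardware bfloat16 arithmetic
#else
  // Software emulation: convert to float, add, convert back.
  return __float2bfloat16(__bfloat162float(a) + __bfloat162float(b));
#endif
}

// make_bfloat162 is likewise missing pre-sm80 in older toolkits; building
// the pair from its two scalar members is a portable replacement.
__device__ __forceinline__ __nv_bfloat162 bf16_pair(__nv_bfloat16 x, __nv_bfloat16 y) {
  __nv_bfloat162 v;
  v.x = x;
  v.y = y;
  return v;
}
```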

> can you teach me how to find these PTX instructions and replace them?

You can check the PTX documentation and look at the Target ISA Notes for each instruction.

yyccli commented 4 months ago

Thanks again for your reply : )

zhochengbiao commented 3 months ago

> we are working on unifying the LoRA kernels in punica into FlashInfer (it's on our v0.1.0 release checklist), where we plan to support sm70/sm75

I have also encountered this issue. May I ask when this support will be available?