vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: How to disable multi-LoRA to avoid using Punica? Or is Punica the only choice? #4434

laoda513 opened this issue 4 months ago

laoda513 commented 4 months ago

I searched the older issues, and everyone says that multi-LoRA's Punica kernels require a GPU with compute capability >= 8.0. So I want to ask: is there an option that uses only a single standalone LoRA but still works on compute capability 7.5?

I tried examples/offline_inference.py, calling llm.generate with only one LoRA, but it still went through Punica and then reported that compute capability >= 8.0 is required.
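
For reference, this is roughly the single-adapter call being described; the model name and adapter path below are just placeholders. As noted above, setting enable_lora=True routes generation through the LoRA (Punica) kernels even when only one adapter is used:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora=True activates the LoRA (Punica) code path, even with a single adapter
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

outputs = llm.generate(
    ["Give me a short introduction to LoRA."],
    SamplingParams(temperature=0.0, max_tokens=64),
    # placeholder adapter: (name, integer id, local path)
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```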

robertgshaw2-neuralmagic commented 4 months ago

If you only have one LoRA adapter, simply merge the adapter back into your model and you can use it directly.
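
For reference, a minimal sketch of that merge step using PEFT's merge_and_unload; the base model name, adapter path, and output directory are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# load the base model and attach the (single) LoRA adapter
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "/path/to/lora_adapter")

# fold the LoRA deltas into the base weights and save a plain checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("/path/to/merged_model")

# keep the tokenizer alongside the merged weights
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf").save_pretrained("/path/to/merged_model")
```

The merged directory can then be served by vLLM as an ordinary model, without enable_lora, so the Punica kernels are never needed.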

laoda513 commented 4 months ago

Thanks!

Hmm, although that sounds complicated if I want to keep multiple LoRAs... each one would need its own merged full-model copy, which would take too much disk space.

yyccli commented 4 months ago

If your model is not fine-tuned in bfloat16, then you can just compile the float16 kernels, and the float16 kernels support sm >= 75.

laoda513 commented 4 months ago

If your model is not fine-tuned in bfloat16, then you can just compile the float16 kernels, and the float16 kernels support sm >= 75.

Thanks! May I ask how to compile the float16 kernels? I did not find it in the docs.

yyccli commented 4 months ago

  1. You need to comment out some operations related to bf16 in vec_dtypes.cuh and punica_ops.cc.
  2. Modify the CMakeLists.txt file to allow the sm75 flag.

laoda513 commented 4 months ago

  1. You need to comment out some operations related to bf16 in vec_dtypes.cuh and punica_ops.cc.

  2. Modify the CMakeLists.txt file to allow the sm75 flag.

OK, thank you! Sounds like quite a challenge for me, but I will give it a try.