Open laoda513 opened 4 months ago
If you only have one LoRA adapter, simply merge the adapter back into your base model and use it directly.
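A minimal sketch of that merge using PEFT (the model name and adapter path below are placeholders, not from this issue):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model and attach the trained adapter (paths are placeholders).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "./my-lora-adapter")

# Fold the LoRA weights into the base weights and save a plain checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("./llama-7b-merged")
```

The merged directory can then be loaded like any ordinary model, so no LoRA kernels are needed at inference time.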
Thanks!
Hmmmm, although that sounds complicated if I want to keep multiple LoRA copies... otherwise it would take too much disk space...
If your model is not fine-tuned with the bfloat16 type, then you can just compile float16 kernels, and float16 kernels support sm>=75.
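As a quick sanity check before building, you can confirm what your GPU actually supports (a small PyTorch sketch, not specific to punica):

```python
import torch

# sm75 (Turing, e.g. T4 / RTX 20xx) reports (7, 5); bf16 kernels need sm80 or newer.
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: sm{major}{minor}")
print("bf16 supported:", torch.cuda.is_bf16_supported())  # False on sm75
```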
Thanks! May I ask how to compile float16 kernels? I did not find it in the docs.
You need to comment out some operations related to bf16 in vec_dtypes.cuh and punica_ops.cc, and modify the CMakeLists.txt file to allow the sm75 flag.
OK, thank you! Quite a challenge for me, but I will give it a try.
I searched the older issues, and everyone says that punica for multi-LoRA requires compute capability >= 8.0. So I want to ask: is there an option that uses only a single, standalone LoRA but supports compute capability 7.5?
I have tried examples/offline_inference.py, calling llm.generate with only one LoRA, but it still went through punica and then reported that cuda >= 8 is required.
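For what it's worth, the workaround suggested at the top of the thread (merge the adapter, then serve the merged weights with LoRA disabled) should avoid the punica kernels entirely. A rough sketch with vLLM, assuming a locally saved merged model (the path is a placeholder):

```python
from vllm import LLM, SamplingParams

# Load the merged checkpoint directly; with LoRA not enabled, vLLM never
# dispatches to the punica kernels, so an sm75 GPU is fine.
llm = LLM(model="./llama-7b-merged", dtype="float16")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```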