punica-ai / punica

Serving multiple LoRA finetuned LLMs as one
https://arxiv.org/abs/2310.18547
Apache License 2.0

RuntimeError: output must be a cuda tensor #49

Open iskander-sauma-assessio opened 2 months ago

iskander-sauma-assessio commented 2 months ago

Hi!

I tried running the text-generation benchmark:

python -m benchmarks.bench_textgen_lora --system punica --batch-size 32

but when I did, I got a runtime error stating that the output must be a CUDA tensor. I am not sure whether this error is on my side or in the code itself. This is the error I get: (screenshot attached: Screenshot 2024-04-03 at 11 35 15)

The CUDA version I use is 12.4, Python 3.10.12, ninja 1.11.1.git.kitware.jobserver-1, and torch 2.2.2.
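
This kind of error usually means a tensor reached a CUDA kernel while still resident on the CPU (or no GPU was visible to PyTorch at all). As a first debugging step, independent of punica itself, a short sketch like the following can confirm the environment and show how a CPU-resident tensor triggers this class of check; the tensor names here are illustrative, not taken from the benchmark script:

```python
import torch

# Report the CUDA toolkit PyTorch was built against and whether a GPU is visible.
# If is_available() prints False, every downstream CUDA kernel call will fail.
print("torch version:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())

# A tensor created without an explicit device lives on the CPU. Passing such a
# tensor to a kernel that requires GPU memory is exactly what produces errors
# of the form "output must be a cuda tensor".
x = torch.zeros(4)
print("x.is_cuda:", x.is_cuda)  # False: this is a CPU tensor

# Moving it explicitly avoids that class of error (only possible with a GPU).
if torch.cuda.is_available():
    x = x.to("cuda")
    print("x.is_cuda after move:", x.is_cuda)
```

If `torch.version.cuda` reports a toolkit older than the one the punica extension was compiled against (the reporter has CUDA 12.4 installed), a rebuild of the extension against the matching toolkit may also be worth trying.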

danjuan-77 commented 1 month ago

Me too! How can this be solved? Help!

MichaelYuan2 commented 2 weeks ago

I got the same issue here