ksajan opened this issue 2 weeks ago
Hi @ksajan,
Thanks for filing the issue. I think the problem is that you're running on a CPU, and falcon-7b in TGI is only supported with kernels that require a GPU.
If you want to run TGI locally on CPU for testing, I'd recommend choosing a smaller model that doesn't rely on special kernels. If your requirements call for something like falcon-7b,
then unfortunately you'll need a GPU machine.
Let me know if I can help in any other way.
@ErikKaum I tried running lmsys/vicuna-7b-v1.3
as well, which I can run using llama_cpp. I was actually trying to train the Medusa head described in the TGI documentation, but I was unable to run it in Google Colab with a GPU either; I hit a similar error.
Yeah so the llama.cpp version probably uses different kernels that don't require GPUs.
When you built this for a GPU, did you use `BUILD_EXTENSIONS=True make install-cpu`
or `BUILD_EXTENSIONS=True make`?
I'd nonetheless recommend using the Docker image to avoid building from source; it's usually a lot less hassle.
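For reference, running TGI via the Docker image usually looks something like the sketch below. The model id, port, and volume path are just examples, not values from this issue; on a CPU-only machine you would still need a model that doesn't depend on GPU-only kernels, which is the root problem here.

```shell
# Sketch: serving a model with the TGI Docker image on a GPU machine.
# The model id and paths below are placeholders, not from this issue.
model=lmsys/vicuna-7b-v1.3
volume=$PWD/data   # cache downloaded weights between container restarts

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id $model
```

Once the container reports it is ready, you can send generate requests to port 8080 (for example with curl against the `/generate` endpoint).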
System Info

Output of `text-generation-launcher --env`: no GPU, using the CPU version.
Information
Tasks
Reproduction
`Could not import SGMV kernel from Punica, falling back to loop.`
Expected behavior
It should download the model and serve it without any errors.