[x] I have searched to see if a similar issue already exists.
I am trying to build an efficient long-context LLM inference demo as an HF demo, running on either ZeroGPU or general GPUs such as the L4. Unfortunately, I can't install PyCUDA, which is essential for my project. Is it possible to get support for this?
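For context, below is a minimal sketch of the kind of PyCUDA usage the demo depends on (compiling and launching a custom CUDA kernel at runtime). The kernel and names here are placeholders for illustration, not the actual project code:

```python
# Illustrative only: placeholder kernel standing in for the project's custom
# long-context kernels; shows why PyCUDA (JIT kernel compilation) is needed.
import numpy as np
import pycuda.autoinit          # initializes a CUDA context on the default device
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Compile a toy element-wise kernel with nvcc at runtime.
mod = SourceModule("""
__global__ void scale(float *out, const float *in, float alpha, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = alpha * in[i];
}
""")
scale = mod.get_function("scale")

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.empty_like(x)
scale(drv.Out(y), drv.In(x), np.float32(2.0), np.int32(n),
      block=(256, 1, 1), grid=((n + 255) // 256, 1))
assert np.allclose(y, 2.0 * x)
```

This fails on the Space because PyCUDA cannot be installed/built there, which is the support I am asking about.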