Open Mayorc1978 opened 1 month ago

As per title. Example: with GPUs like the 3060 12GB or 3090 24GB.

Hi @Mayorc1978, thank you very much for your interest in QServe! Although it is targeted at large-scale LLM serving, QServe can also work on consumer GPUs like the RTX 4090 and 3090. On the RTX 4090, you can expect a speedup over TensorRT-LLM similar to what we report on the L40S. We did not run many experiments on the 3060 or 3090, but we believe the same principles will hold.

Hi, how about the Tesla T4 and RTX 2080 Ti?

Hi @tp-nan, the Tesla T4 and RTX 2080 Ti are not supported in QServe right now. Currently, we use some instructions that can only be compiled for the Ampere architecture or newer. We will consider supporting older GPUs after cleaning up the CUDA code. Thank you!
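For readers wondering whether their own card meets the Ampere-or-newer requirement mentioned in the thread, here is a minimal sketch. The helper and the capability table are illustrative only, not part of the QServe codebase; at runtime you could instead query the installed GPU with `torch.cuda.get_device_capability()`.

```python
# Hypothetical helper (not from QServe) illustrating the architecture check:
# the kernels in question need CUDA compute capability >= 8.0 (sm_80, Ampere).
def is_ampere_or_newer(major: int, minor: int) -> bool:
    """Return True if the (major, minor) compute capability is sm_80 or newer."""
    return (major, minor) >= (8, 0)

# CUDA compute capabilities of the GPUs discussed in this thread.
COMPUTE_CAPABILITY = {
    "Tesla T4": (7, 5),     # Turing  -> below the requirement
    "RTX 2080 Ti": (7, 5),  # Turing  -> below the requirement
    "RTX 3060": (8, 6),     # Ampere  -> meets the requirement
    "RTX 3090": (8, 6),     # Ampere  -> meets the requirement
    "RTX 4090": (8, 9),     # Ada Lovelace
    "L40S": (8, 9),         # Ada Lovelace
}

for name, cap in COMPUTE_CAPABILITY.items():
    status = "meets sm_80+ requirement" if is_ampere_or_newer(*cap) else "too old"
    print(f"{name}: sm_{cap[0]}{cap[1]} -> {status}")
```

With PyTorch installed, `torch.cuda.get_device_capability(0)` returns the same `(major, minor)` tuple for the local GPU, so the check generalizes beyond the cards listed here.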