There is CLBlast GPU support for GPT-2 based models on koboldcpp, for example, where I can do prompt processing in GPU VRAM to get fewer prompt-batching errors with my 16GB of CPU RAM. Does anyone know if this is possible with ctransformers?
Currently it doesn't have GPU support for those models, as it is based on the examples from ggml, which don't have GPU support.
Only LLaMA and Falcon models have GPU (CUDA) support currently.
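For the models that do have CUDA support, offloading is enabled through the `gpu_layers` parameter. A minimal sketch (the model path and layer count here are placeholders, not from this thread):

```python
from ctransformers import AutoModelForCausalLM

# Load a LLaMA-family GGML model and offload layers to the GPU.
# "path/to/model.bin" is a placeholder; gpu_layers=50 is an example value —
# set it to however many layers fit in your VRAM (0 means CPU-only).
llm = AutoModelForCausalLM.from_pretrained(
    "path/to/model.bin",
    model_type="llama",
    gpu_layers=50,
)

print(llm("AI is going to"))
```

Note this requires installing ctransformers with CUDA support; with a GPT-2 type model the `gpu_layers` argument is ignored and everything runs on CPU.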