marella / ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.
MIT License

Slow + No config options #172

Closed yukiarimo closed 7 months ago

yukiarimo commented 8 months ago

I am using a Q5-quantized GGUF Pygmalion 2 7B model on my MacBook Pro M1 14" with 16GB RAM and a 512GB SSD. The model performs well on short prompts, but generation slows down significantly once the input reaches around 512+ tokens, and it can even crash in that scenario. However, when using Koboldcpp (which is based on llama.cpp), it generates results quickly. Is there a way to improve the speed, such as enabling BLAS or adjusting the number of GPU layers / CPU threads? Any suggestions?
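For what it's worth, ctransformers does expose configuration options at load time, including `gpu_layers` (for Metal/GPU offload), `threads`, and `context_length`. A minimal sketch, assuming a local GGUF file at a hypothetical path (the path and the specific values are placeholders, not from this thread):

```python
from ctransformers import AutoModelForCausalLM

# Load a local GGUF model; "pygmalion-2-7b.Q5_K_M.gguf" is a hypothetical filename.
llm = AutoModelForCausalLM.from_pretrained(
    "pygmalion-2-7b.Q5_K_M.gguf",
    model_type="llama",        # Pygmalion 2 uses the Llama architecture
    gpu_layers=50,             # offload layers to the GPU (Metal on Apple Silicon)
    threads=8,                 # CPU threads used for the non-offloaded work
    context_length=2048,       # raise this if long prompts get truncated or crash
)

print(llm("Hello, my name is", max_new_tokens=32))
```

Whether this matches llama.cpp/Koboldcpp speed also depends on how the underlying GGML library was built (e.g. with Metal/BLAS support), which ctransformers' prebuilt wheels may or may not include for a given platform.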

CerealNotFound commented 8 months ago

I was beginning to feel frustrated about why it wasn't working, but if it's happening to other people as well, then perhaps it's a problem with the library itself 😂😮‍💨

yukiarimo commented 7 months ago

It's working now.