Closed. Mradr closed this issue 1 year ago.

ctransformers version: 0.2.10
(using a new fork of llama.cpp: cmp-nct/ggllm.cpp)

Try threads=1 to avoid the overhead of creating CPU threads. When comparing speed, use a fixed seed (e.g. llm(..., seed=100)) so that it generates the same amount of text every time. Some other processes running on your system might also be causing the performance to vary by consuming more or less memory.

Thanks! The fix seems to be increasing batch_size from the default to 256. I was then able to set threads=1 and get more performance; at 7-8 or 10-12 threads, having more threads seems to improve performance rather than degrade it. I'm also not getting an out-of-memory error. After a bit more testing and playing with the numbers, it just "ooms", i.e. it stops trying to use the GPU and starts slowing down. In that state it still takes up the VRAM, but it seems like it just doesn't do anything with the GPU anymore.
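For what it's worth, a minimal way to make these before/after comparisons repeatable is to time each setting over a few runs with a fixed seed. This is just a sketch: `time_generation` is a made-up helper name, and the `generate` callable stands in for a fixed-seed call like `lambda: llm(prompt, seed=100)` with whatever threads/batch_size you are testing.

```python
import statistics
import time

def time_generation(generate, runs=3):
    """Time a zero-argument generation callable; return the median seconds.

    `generate` should produce one completion with a fixed seed so that
    every run emits the same amount of text and timings are comparable.
    """
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

Using the median of a few runs helps smooth out the run-to-run variance caused by other processes on the machine.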
1) I just wanted to ask if you are planning to add MPT GPU support as well sometime? I see it's supported for LLAMA models.

2) The real reason for the ticket: I am having issues getting it to really use the GPU. Sometimes it works and sometimes it doesn't. Not really sure how to explain it, but:

A) Windows 11, 32 GB of RAM, Ryzen 5800X, 13B-HyperMantis.ggmlv3.q5_1.bin, model type set to LLAMA, 12 threads (give or take, depending on what I set for testing), RTX 2080.

B) I installed the model the other day. Tested on the CPU and was able to get results back in under 20 to 25 seconds. Saw there was GPU support, so I uninstalled and reinstalled with CUDA support. Tested GPU offloading and it didn't seem to do much in my first round of testing; I had it set to 25 layers at the time. I didn't see any improvement in speed, but I could see the GPU was being used: higher memory usage and GPU utilization spiking, though never capping out at max. I lowered the count to 15 layers and tested again. This time I was able to hit 5 to 10 seconds. Went crazy and tested it as much as I could, getting really good results. Today I rebooted my machine and it's acting like it did the other day at 25 layers. I tried lowering it from 15 to 10 or below, but it doesn't seem to be using the GPU, yet it is "acting" like it is "setting up" GPU usage: I can see the memory allocated and an influx of activity, but it never fully tops out.
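To take some of the guesswork out of the layer-count experiments above, one could time a generation at each offload setting in a single script. A sketch, with caveats: `sweep_gpu_layers` and `load_model` are hypothetical names invented here, and `load_model(gpu_layers=n)` stands in for however the model actually gets constructed (e.g. passing a `gpu_layers` value when loading the model in ctransformers).

```python
import time

def sweep_gpu_layers(load_model, prompt, layer_counts):
    """Time one generation per gpu_layers setting.

    `load_model(gpu_layers=n)` is a placeholder for the real model
    constructor; `prompt` is generated once per setting. Returns a
    dict mapping each layer count to the generation time in seconds.
    """
    results = {}
    for n in layer_counts:
        llm = load_model(gpu_layers=n)  # fresh model per setting
        start = time.perf_counter()
        llm(prompt)
        results[n] = time.perf_counter() - start
    return results
```

Running this once after a reboot and comparing it against a saved earlier run would show concretely whether the 15-layer sweet spot moved.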
I could be totally using it wrong, but the fact that it was working the other day and stopped today tells me something changed on my computer, though I honestly couldn't tell you what. I didn't perform any updates, but it's also weird that it didn't work at first and then all at once it did. Not sure if there is some type of support model it needs or not. CUDA is reported as supported by the torch check. Any help or information is welcome :) I understand this is not a common issue. Any places I can check, or values I can read, to see if it's really working would be great. It just seems odd.
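One concrete place to get values from, independent of Task Manager, is nvidia-smi: poll it while a generation is running and watch utilization alongside memory. A small sketch, assuming nvidia-smi is on PATH; `parse_gpu_stats` and `gpu_stats` are hypothetical helper names made up for this example.

```python
import subprocess

# Fields queried from nvidia-smi, returned as bare comma-separated numbers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_gpu_stats(csv_line):
    """Parse one CSV line like '37, 4096, 8192' into a stats dict."""
    util, used, total = (int(field.strip()) for field in csv_line.split(","))
    return {"util_pct": util, "mem_used_mib": used, "mem_total_mib": total}

def gpu_stats():
    """Return one stats dict per GPU; call repeatedly while generating."""
    out = subprocess.check_output(QUERY, text=True)
    return [parse_gpu_stats(line) for line in out.strip().splitlines()]
```

If util_pct stays near zero while mem_used_mib is high during generation, that would match the "VRAM is taken but the GPU isn't doing anything" behavior described above.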