Closed PGTBoos closed 9 months ago
For context you need a lot of VRAM too. P.S. I usually reserve ~3-4 GB of VRAM for the context (if it's less than 8k).
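As a rough sanity check on that figure, the KV cache for the context grows linearly with context length. A back-of-the-envelope estimate, assuming a Llama-style 7B model (32 layers, hidden size 4096, fp16 cache; these numbers are assumptions, not measured from any specific checkpoint):

```python
# Rough KV-cache size estimate for a Llama-style 7B model (assumed numbers).
N_LAYERS = 32   # transformer layers
HIDDEN = 4096   # hidden size (n_heads * head_dim)
BYTES = 2       # fp16 element size

# Each token stores one key vector and one value vector per layer.
per_token_bytes = 2 * N_LAYERS * HIDDEN * BYTES        # 524288 B = 0.5 MiB
ctx_8k_gib = per_token_bytes * 8192 / (1024 ** 3)

print(f"~{per_token_bytes / 2**20:.2f} MiB per token, "
      f"~{ctx_8k_gib:.1f} GiB for an 8k context")
```

That lands right around 4 GiB for an 8k context, consistent with the ~3-4 GB rule of thumb above; quantized KV caches or smaller contexts shrink it proportionally.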
Agreed. My system has 32 GB of RAM (plus a pagefile); it might get slow but shouldn't crash. The video card has 12 GB of VRAM, which should be enough. The error also suggests there are options for PyTorch to handle memory better, though my knowledge of programming these kinds of LLMs with PyTorch is limited. It might be that there is something new in recent versions that it doesn't make use of, since these crashes let a model work for a while before it eventually fails.
I'm getting this same error. I only have 8 GB of VRAM, which I presume isn't enough to load much of the model, but I was hoping it would at least run slowly via system RAM (32 GB) and CPU assist...
@phr00t if you want it slow, just install the latest NVIDIA driver and don't apply this recommendation: https://github.com/oobabooga/text-generation-webui/discussions/4484 — or if you already did, revert it to defaults.
But keep in mind that you will lose so much performance that there is simply no point in using the GPU; at that point it is better to switch to CPU-only.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
In the console you get this error: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 10.47 GiB is allocated by PyTorch, and 759.98 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
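The max_split_size_mb knob the error mentions is set through the PYTORCH_CUDA_ALLOC_CONF environment variable, which is PyTorch's documented allocator config. The value 128 below is only an example; tune it for your setup:

```shell
# Cap the caching allocator's split size to reduce fragmentation
# (128 is an example value, not a recommendation from this thread).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# Then launch the webui as usual, e.g.:
# python server.py
```

On Windows, use `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` in the launch script instead of `export`.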
The strange thing, though, is that the conversation works for a while and then this appears. Also notice the unallocated space; I don't know enough about PyTorch to fix it. I only run 7B models with the 4-bit and 8-bit settings, so they should fit (they do on startup, but eventually I get this error). This happens with every model; it doesn't matter which one.
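Reading the numbers from the error above: the reserved-but-unallocated pool (~760 MiB) is bigger than the failed 86 MiB request, which points at fragmentation rather than the card simply being full. A quick breakdown, with the figures copied straight from the log:

```python
GIB_MIB = 1024                      # MiB per GiB

total            = 12.00 * GIB_MIB  # GPU 0 total capacity
allocated        = 10.47 * GIB_MIB  # allocated by PyTorch tensors
reserved_unalloc = 759.98           # reserved by PyTorch but unallocated
request          = 86.00            # the allocation that failed

pytorch_footprint = allocated + reserved_unalloc
other = total - pytorch_footprint   # driver, desktop, other processes

print(f"PyTorch footprint: {pytorch_footprint:.0f} MiB, other: {other:.0f} MiB")
# The cached-but-unused ~760 MiB exceeds the 86 MiB request, so the failure
# means no single contiguous cached block was large enough (fragmentation),
# which is exactly the case max_split_size_mb is meant to help with.
assert reserved_unalloc > request
```

This also explains why it works for a while first: fragmentation accumulates as the conversation grows and tensors of varying sizes are allocated and freed.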
Is there an existing issue for this?
Reproduction
Well, I just talk for a while and it happens at random; it's not after X characters or X conversations. It can happen after the second reply or after the 20th.
Screenshot
The chat results in empty replies (it gets stuck in a loop of empty replies).
Logs
System Info