kuvaus / LlamaGPTJ-chat

Simple chat program for LLaMa, GPT-J, and MPT models.
MIT License

Not enough space in the context's memory pool (needed 865484240, available 863773936) #4

Closed chuckbeasley closed 1 year ago

chuckbeasley commented 1 year ago

I'm attempting to use ggml-mpt-7b-chat.bin to summarize the following article and am consistently getting the error:

Jamie Komoroski’s blood alcohol level was over three times the legal limit when she allegedly drove her car into a golf-cart style vehicle carrying a newly married couple away from their wedding reception last month, killing the bride, according to a South Carolina Law Enforcement Division toxicology report. In the report shared with CNN by the Folly Beach Police Department, Komoroski, 25, was found to have had a blood alcohol content of 0.261%. South Carolina law prohibits driving with a blood alcohol content of 0.08% or higher. The bride, Samantha Hutchinson, 34, from Charlotte, North Carolina, died of blunt force injuries according to the Charleston County Coroner’s Office. Her husband, Aric Hutchinson, and two others were also injured in the crash. Komoroski is charged with one count of reckless homicide and three counts of felony DUI resulting in great bodily harm, according to online court records. Her vehicle was traveling 65 mph in a 25-mph zone, according to Police Chief Andrew Gilreath. Komoroski refused a field sobriety test after the incident on April 28 and a warrant was issued for her blood to be taken for testing, according to an affidavit. “We cannot fathom what the families are going through and offer our deepest sympathies. We simply ask that there not be a rush to judgment. Our court system is founded upon principles of justice and mercy and that is where all facts will come to light,” Christopher Gramiccioni, an attorney for Komoroski, told CNN in a statement.

Please summarize the article.

Here are the software details:

    LlamaGPTJ-chat (v. 0.1.8)
    LlamaGPTJ-chat: parsing options from json: chat.json
    LlamaGPTJ-chat: loading ggml-mpt-7b-chat.bin
    mpt_model_load: loading model from 'ggml-mpt-7b-chat.bin' - please wait ...
    mpt_model_load: n_vocab        = 50432
    mpt_model_load: n_ctx          = 2048
    mpt_model_load: n_embd         = 4096
    mpt_model_load: n_head         = 32
    mpt_model_load: n_layer        = 32
    mpt_model_load: alibi_bias_max = 8.000000
    mpt_model_load: clip_qkv       = 0.000000
    mpt_model_load: ftype          = 2
    mpt_model_load: ggml ctx size = 5653.09 MB
    mpt_model_load: kv self size  = 1024.00 MB
    mpt_model_load: ................................ done
    mpt_model_load: model size = 4629.02 MB / num tensors = 194
    LlamaGPTJ-chat: done loading!

chat.json:

    {
      "top_p": 0.9,
      "top_k": 50432,
      "temp": 0.3,
      "n_batch": 56,
      "model": "ggml-mpt-7b-chat.bin",
      "threads": 4,
      "n_predict": 64,
      "n_ctx": 512
    }

The hardware specifications (Windows 11):

    Processor: AMD Ryzen 7 1700X Eight-Core Processor, 3.40 GHz
    Installed RAM: 64.0 GB
    System type: 64-bit operating system, x64-based processor

Is there something I don't have configured correctly?

kuvaus commented 1 year ago

This is a good catch!

It seems to be a bug. It looks to me like you have everything configured correctly.

So I tried the same and also got a memory pool size error. If I reduce the text by around 50%, it works. It also works with GPT-J groovy. But MPT should be able to handle texts of that size, so there's clearly something wrong with the program...

This is a pretty important bug to fix, so I'll try to get it working. I just need to figure out what is causing the error.
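
My current guess at the failure mode, as a rough sketch in C (the names buf_size and mem_per_token follow the ggml example code, but the size and logic here are only illustrative, not the actual source):

    #include <stdlib.h>

    // Sketch of the suspected failure mode: the eval scratch buffer is
    // allocated once with a fixed size and never grown.
    static size_t buf_size = 256u * 1024u * 1024u; // illustrative size
    static void  *buf      = NULL;

    // mem_per_token is estimated from a short warm-up eval; a prompt of
    // N tokens then needs roughly mem_per_token * N bytes in the pool.
    int eval_fits_in_pool(size_t mem_per_token, size_t N) {
        if (buf == NULL) buf = malloc(buf_size);
        if (mem_per_token * N > buf_size) {
            // this is where ggml would report:
            // "not enough space in the context's memory pool"
            return 0;
        }
        return 1; // the graph fits in the fixed-size buffer
    }

If that's what is going on, the fix is to grow the buffer before evaluating a long prompt instead of keeping it fixed.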

Thanks for finding this bug.

kuvaus commented 1 year ago

Small update:

There's a chance it could be related to this: https://github.com/ggerganov/ggml/pull/145. The ggml folks are usually very fast with their progress, so I think the best course of action is to wait for the backend update. If it's still happening after that, then I need to figure out whether I'm allocating the context wrong somewhere else. That is also very possible.

I'll update this issue if I get this fixed.

chuckbeasley commented 1 year ago

Thank you for the quick follow-up! I wasn't expecting that.

I've noticed that, while the ggml-gpt4all-l13b-snoozy model works, it is very slow, almost to the point of being unusable for anything other than testing, and there's not yet an option to use GPUs. That's why I switched over to testing the ggml-mpt-7b-* models. They are much faster and can be used commercially. I'm hoping someone will release the story-teller model. That one is a beast.

kuvaus commented 1 year ago

The ggml developers were indeed fast. There is already a temporary solution:

    if (mem_per_token > 0 && mem_per_token * N > 0.9*buf_size) {
        const size_t buf_size_new = 1.1 * (mem_per_token * N); // add 10% to account for ggml object overhead
        // ... the buffer is then reallocated to buf_size_new ...
    }

This only works when --batch_size is 1.

v0.1.9 now includes that fix, so it should work for MPT when --batch_size is 1.
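
If you want to test with the chat.json from your report, setting n_batch to 1 there should have the same effect as --batch_size 1 on the command line (assuming the json key maps to the same option, which is the intent):

    {
      "top_p": 0.9,
      "top_k": 50432,
      "temp": 0.3,
      "n_batch": 1,
      "model": "ggml-mpt-7b-chat.bin",
      "threads": 4,
      "n_predict": 64,
      "n_ctx": 512
    }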

I'll update this issue again when there's a permanent solution for all batch sizes, but this was so important to get working that I wanted to release v0.1.9 as soon as it no longer always crashed.
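
For the permanent fix, the rough idea would be to grow the buffer based on the actual batch rather than only a per-token estimate. Something along these lines (a sketch only, reusing the names from the snippet above; mem_per_batch is a hypothetical overhead term, not something ggml currently measures):

    // Hypothetical batch-aware version of the check above:
    const size_t needed = mem_per_token * N + mem_per_batch * n_batch; // mem_per_batch is made up
    if (needed > 0.9 * buf_size) {
        buf_size = (size_t)(1.1 * needed); // keep the 10% margin for ggml object overhead
        buf = realloc(buf, buf_size);
    }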

kuvaus commented 1 year ago

I heard that the snoozy name was because it was so slow. :) And yeah, the MPT models are indeed awesome! They're really fast because of only 7B parameters, and they seem to be really accurate too.

kuvaus commented 1 year ago

v0.2.0 comes with big changes:

There have also been updates over the past few versions:

Big thanks to everyone so far! You have been hugely helpful. :)

chuckbeasley commented 1 year ago

Awesome! Thank you for your hard work.

chuckbeasley commented 1 year ago

With v0.2, does batch_size still need to be set to 1?

kuvaus commented 1 year ago

Nope. It should now work with any batch_size. :)

chuckbeasley commented 1 year ago

Cool. I'm going to go ahead and close this issue.