nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License

llama.cpp assertion fails: "non-causal attention requires n_ubatch >= n_tokens" #2375

Closed · wazoox closed this 4 months ago

wazoox commented 4 months ago

the bug:

Going to Settings, Local Docs, and pointing to a folder containing a few PDFs: when clicking "Add", GPT4All crashes. It then crashes at startup until I delete the localdocs_v1.db file.

"Local Docs" used to work on this machine with GPT4All 2.7.3.

GPT4All works fine if I reset all settings but don't set up any Local Docs.

If I set up an empty folder as a Local Docs collection, it works; however, as soon as I drop a PDF into this folder, GPT4All crashes.

After a restart, GPT4All crashes once if there's any new PDF in the Local Docs folder, then runs the second time. However, if I ask specific questions related to the Local Docs, it doesn't seem to use them.

configuration:

Running GPT4All 2.8.0 on macOS Monterey 12.7.5 (Mac Pro Intel, 32 GB RAM).

installed local models

Meta-Llama-3-8B-Instruct.Q4_0.gguf
Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf
all-MiniLM-L6-v2-f16.gguf
all-MiniLM-L6-v2.gguf2.f16.gguf
mistral-7b-instruct-v0.1.Q4_0.gguf
mistral-7b-openorca.Q4_0.gguf
mistral-7b-openorca.gguf2.Q4_0.gguf

debugging

Unfortunately, the first few crashes opened a debugging window with traces, but for some reason that no longer happens.

chrisbarrera commented 4 months ago

I can replicate this problem (as I was testing for someone else's different LocalDocs crashing problem).

Crashed Thread: 6, embedding

```
Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Termination Reason:    Namespace SIGNAL, Code 6 Abort trap: 6
```

Some stack trace context; let me know if you want the complete one.

```
Thread 6 Crashed:: embedding
0   libsystem_kernel.dylib               0x19d966a60 __pthread_kill + 8
1   libsystem_pthread.dylib              0x19d99ec20 pthread_kill + 288
2   libsystem_c.dylib                    0x19d8aba20 abort + 180
3   libllamamodel-mainline-metal.dylib   0x118d8280c llama_decode.cold.3 + 88
4   libllamamodel-mainline-metal.dylib   0x118cd9e28 llama_decode + 8480
5   libllamamodel-mainline-metal.dylib   0x118c32f50 LLamaModel::embedInternal(std::__1::vector<
```

wazoox commented 4 months ago

> I can replicate this problem (as I was testing for someone else's different LocalDocs crashing problem). Crashed Thread: 6, embedding

Can I try something to help? I don't know why the crash report window doesn't open anymore...

chrisbarrera commented 4 months ago

(wazoox) Sorry, I was directing my comment to the devs. However, any way you can help would be appreciated. (for the devs) I continued to look at this; it appears to be calling abort from a GGML_ASSERT in llama_decode_internal.

Right before it crashed, it logged this to file:

```
[Warning] (Fri May 24 14:06:04 2024): Populating font family aliases took 45 ms. Replace uses of missing font family "MyCustomFont, Sans-serif" with one that exists to avoid this cost.
[Warning] (Fri May 24 14:06:04 2024): ERROR: could not load hnswlib index: Index seems to be corrupted or unsupported
[Warning] (Fri May 24 14:06:04 2024): ERROR: Could not load embeddings
[Debug] (Fri May 24 14:06:04 2024): deserializing chat "/Users/cb/Library/Application Support/nomic.ai/GPT4All//gpt4all-09da435f-1b9d-46b5-8a80-0a3eaa5b8c14.chat"
[Debug] (Fri May 24 14:06:04 2024): deserializing chat "/Users/cb/Library/Application Support/nomic.ai/GPT4All//gpt4all-1bf5c61c-f823-4c9a-86c2-8aba984de1c2.chat"
[Warning] (Fri May 24 14:06:04 2024): ERROR: Couldn't deserialize chat from file: "/Users/cb/Library/Application Support/nomic.ai/GPT4All//gpt4all-1bf5c61c-f823-4c9a-86c2-8aba984de1c2.chat"
[Debug] (Fri May 24 14:06:04 2024): deserializing chats took: 0 ms
```

Not sure of the relationship of the above, but it may be helpful.

dianamJLAB commented 4 months ago

If this adds any context: also on macOS, Monterey 12.6.9, M1 Max, running on CPU, GPT4All v2.8.0, model Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf.

When I use LocalDocs:

If I use a prompt that deliberately has no similarity to any of the content in the PDFs in LocalDocs, all works as expected.

If I use a prompt that matches content in LocalDocs, GPT4All crashes.

Crashed Thread: 8, Exception Type: EXC_BAD_ACCESS

```
...
Thread 8 Crashed:: e0cd5225-60e7-462c-b112-eabcf338d216
0   ???                                0x0         ???
1   libllamamodel-mainline-cpu.dylib   0x1156b6728 ggml_graph_compute_thread + 896
2   libllamamodel-mainline-cpu.dylib   0x1156b6314 ggml_graph_compute + 248
3   libllamamodel-mainline-cpu.dylib   0x1156e5924 ggml_backend_cpu_graph_compute + 112
4   libllamamodel-mainline-cpu.dylib   0x1156e4a6c ggml_backend_sched_graph_compute_async + 788
5   libllamamodel-mainline-cpu.dylib   0x11572d9d4 llama_decode + 5676
6   libllamamodel-mainline-cpu.dylib   0x11568d7d8 LLamaModel::evalTokens(LLModel::PromptContext&, std::__1::vector<int, std::__1::allocator<int>> const&) const + 264
7   libllamamodel-mainline-cpu.dylib   0x115696bec LLModel::decodePrompt(std::__1::function<bool (int)>, std::__1::function<bool (int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)>, std::__1::function<bool (bool)>, LLModel::PromptContext&, std::__1::vector<int, std::__1::allocator<int>>) + 876
8   libllamamodel-mainline-cpu.dylib   0x1156956f8 LLModel::prompt(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::function<bool (int)>, std::__1::function<bool (int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)>, std::__1::function<bool (bool)>, LLModel::PromptContext&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>) + 3920
9   gpt4all                            0x102957b70 ChatLLM::promptInternal(QList const&, QString const&, QString const&, int, int, float, float, float, int, float, int) + 1032
10  gpt4all                            0x102957010 ChatLLM::prompt(QList const&, QString const&) + 276
11  QtCore                             0x10634a9e0 QObject::event(QEvent) + 612
12  QtCore                             0x106309298 QCoreApplicationPrivate::notify_helper(QObject, QEvent) + 384
13  QtCore                             0x106308e18 QCoreApplication::notifyInternal2(QObject, QEvent) + 292
14  QtCore                             0x10630a0c8 QCoreApplicationPrivate::sendPostedEvents(QObject, int, QThreadData) + 1428
15  QtCore                             0x106474158 QEventDispatcherUNIX::processEvents(QFlags) + 84
16  QtCore                             0x10631241c QEventLoop::exec(QFlags) + 532
17  QtCore                             0x1063fbb60 QThread::exec() + 280
18  QtCore                             0x106478334 0x1062a0000 + 1934132
19  libsystem_pthread.dylib            0x1aace826c _pthread_start + 148
20  libsystem_pthread.dylib            0x1aace308c thread_start + 8
```

dianamJLAB commented 4 months ago

UPDATE: I installed version 2.7.3 on macOS and repeated the steps: same model (Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf), same localdocs directory, same SBERT model for localdocs. GPT4All does NOT crash; it seems to be working as expected. This does appear to be a v2.8.0 issue (although I have not yet tried v2.7.4).

cebtenzzre commented 4 months ago

I found an issue with embedInternal that could be related, but since I haven't seen this particular crash I'm not sure that it's the same.

@chrisbarrera Could you run GPT4All from a terminal (/Applications/gpt4all/bin/gpt4all.app/Contents/MacOS/gpt4all) and post the output before the crash? GGML_ASSERT prints an error to stderr that should help narrow down what is going on.
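If the terminal output is hard to capture by hand, here is a minimal sketch that redirects it to a file using Python's subprocess module (the binary path is the one given above; the log filename is arbitrary):

```python
import subprocess

# Launch the GPT4All binary directly so that GGML_ASSERT messages (which go
# to stderr) end up in a log file that can be attached to the bug report.
app = "/Applications/gpt4all/bin/gpt4all.app/Contents/MacOS/gpt4all"
with open("gpt4all-stderr.log", "wb") as log:
    subprocess.run([app], stderr=log)
```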


Here are the steps I followed to try and replicate the issue on macOS Sonoma 14.4.1:

After this I also tried:

For me, GPT4All does not crash. What are you doing differently?

chrisbarrera commented 4 months ago

The problem went away when I removed the localdocs* DB files under 2.8.0 and let them be recreated, but I could replicate it by removing them again, rerunning under 2.7.5 to recreate the DBs, and then switching back to 2.8.0.
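For anyone comparing the 2.7.x-created and 2.8.0-created databases, a minimal inspection sketch (assuming only that localdocs_v1.db is SQLite, as the extension suggests; the path is hypothetical and depends on your install):

```python
import sqlite3

# Hypothetical path: on macOS the app data lives under
# ~/Library/Application Support/nomic.ai/GPT4All/ per the log above.
db_path = "localdocs_v1.db"

con = sqlite3.connect(db_path)
# List the tables so the two schema versions can be diffed side by side.
for (name,) in con.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(name)
con.close()
```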

Here is the output generated as I recreate the crash by attempting to add a new collection to LocalDocs under 2.8.0:

```
embedInternal: warning: chunking tokenized text at index 0 into zero tokens
GGML_ASSERT: /Users/atreat/dev/gpt4all/gpt4all-backend/llama.cpp-mainline/llama.cpp:11355: (cparams.causal_attn || cparams.n_ubatch >= n_tokens_all) && "non-causal attention requires n_ubatch >= n_tokens"
zsh: abort      ./gpt4all
```

cebtenzzre commented 4 months ago

I can reproduce the assertion failure from the python bindings:

```python
>>> from gpt4all import Embed4All
>>> x = Embed4All('nomic-embed-text-v1.f16.gguf')
>>> x.embed('a ' * 513)
GGML_ASSERT: /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/llama.cpp:11355: (cparams.causal_attn || cparams.n_ubatch >= n_tokens_all) && "non-causal attention requires n_ubatch >= n_tokens"
```

Looks like we are not correctly setting n_ubatch after the llama.cpp update from the CUDA PR.
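To make the failure mode concrete, here is a minimal sketch of the invariant the assertion enforces (the parameter names mirror the assertion text; the default n_ubatch of 512 is an assumption inferred from the 513-token repro above):

```python
def check_ubatch(causal_attn: bool, n_ubatch: int, n_tokens_all: int) -> None:
    # Mirror of the llama.cpp assertion quoted above: a non-causal (embedding)
    # model must fit the whole token batch into one micro-batch, since every
    # token attends to every other token.
    assert causal_attn or n_ubatch >= n_tokens_all, \
        "non-causal attention requires n_ubatch >= n_tokens"

# Embedding models are non-causal, so with an n_ubatch of 512 (assumed
# default), any input that tokenizes to more than 512 tokens, like the
# 'a ' * 513 repro above, trips the assertion.
try:
    check_ubatch(causal_attn=False, n_ubatch=512, n_tokens_all=513)
except AssertionError as e:
    print(f"assertion failed: {e}")
```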