Closed: YungBricoCoop closed this issue 1 year ago
Seeing the same with M1 Max, Ventura 13.5
Same, segmentation fault with the offline chat functionality:
/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
1. Open chat
2. Send any message
3. The bot replies with just "🤔"
4. Nothing else happens; you can wait indefinitely
5. Send a second message
6. Segmentation fault
It seems like it hadn't finished processing the first request, and upon the second, some semaphores weren't released. I use Khoj with offline markdown files (~3k entries).
Apple M1 Pro, Ventura 13.4.1
P.S. Okay, now I've got it: the reply takes longer than 44s, so I just had to wait longer.
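For what it's worth, the leaked-semaphore warning above looks like a symptom of the crash rather than its cause: Python's resource_tracker complains whenever a process dies while a multiprocessing semaphore it registered is still alive. A minimal sketch (nothing Khoj-specific, just the standard library) that reproduces the same warning:

```python
import multiprocessing as mp
import os
import signal

if __name__ == "__main__":
    # A named semaphore, registered with Python's resource_tracker process.
    sem = mp.Semaphore(1)
    # Simulate a hard crash (like the segfault in the native llama library):
    # the process dies before the semaphore is cleaned up, so at shutdown the
    # tracker prints "There appear to be 1 leaked semaphore objects to clean
    # up at shutdown" and reclaims it.
    os.kill(os.getpid(), signal.SIGSEGV)
```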
Not sure if it's the same issue, but I see a crash too on an M1 Mac; the crashing thread looks like this:
Thread 17 Crashed:
0 libllamamodel-mainline-default.dylib 0x4d40388d0 ggml_compute_forward + 13820
1 libllamamodel-mainline-default.dylib 0x4d4034dfc ggml_graph_compute + 2036
2 libllamamodel-mainline-default.dylib 0x4d401e450 llama_eval_internal(llama_context&, int const*, int, int, int, char const*) + 2444
3 libllamamodel-mainline-default.dylib 0x4d401da3c llama_eval + 28
4 libllamamodel-mainline-default.dylib 0x4d400f860 LLamaModel::evalTokens(LLModel::PromptContext&, std::__1::vector<int, std::__1::allocator
Ah! I finally have a reproduction of this error. I really appreciate the detailed responses in this thread that helped me root-cause it. It's exactly that: the crash happens when you send a second query to the LLM while the first one is still being processed.
I'm releasing a bunch of performance improvements to offline chat in #393 that should make responses faster and more reliable, I hope. They will also make it clearer when Llama/Khoj is still processing a request.
Ideally, there should be some way to determine whether the model is occupied. I'll investigate that.
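One crude but workable approach is to serialize access to the model behind a lock and use a non-blocking acquire to answer the "is it busy?" question. A rough sketch, assuming a single shared model object with a hypothetical prompt() method (not Khoj's actual API):

```python
import threading

model_lock = threading.Lock()

def safe_chat(model, message: str) -> str:
    # A non-blocking acquire doubles as an "is the model occupied?" check.
    if not model_lock.acquire(blocking=False):
        return "🤔 Still processing the previous message, please wait."
    try:
        # Only one thread at a time reaches the native llama code, so a
        # second query can no longer race the first into a segfault.
        return model.prompt(message)  # hypothetical generate call
    finally:
        model_lock.release()
```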
Issue Description:
When using the chat functionality in offline mode on a MacBook Air with an M2 chip, the backend crashes entirely due to a segmentation fault. This problem appears to be specific to the offline chat feature, as the search functionality remains unaffected and operates as expected.
Additional Details:
System: MacBook Air (M2 chip), Ventura 13.4.1 (22F82)
Software: Python 3.11.4
Terminal output: