Interesting, per the related issue I tried a fresh conda env, clone, and install. Things seem to be back to normal :crossed_fingers:
Nope, it's back:
Llama.generate: prefix-match hit
GGML_ASSERT: /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:7863: ptr == (void *) (g_cuda_pool_addr[device] + g_cuda_pool_used[device])
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
Is this just telling me it's an OOM? Is there a good way to troubleshoot? Running the latest from main:
CUDA error: invalid argument
current device: 1, in function ggml_backend_cuda_buffer_get_tensor at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:10759
cudaMemcpy(data, (const char *)tensor->data + offset, size, cudaMemcpyDeviceToHost)
GGML_ASSERT: /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:241: !"CUDA error"
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
Current model settings:
nvidia-smi output before running:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:01:00.0 Off | Off |
| 0% 32C P8 20W / 450W | 11MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 Off | 00000000:02:00.0 Off | Off |
| 0% 26C P8 22W / 450W | 11MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1655 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 1655 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+
nvidia-smi output while running:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:01:00.0 Off | Off |
| 0% 31C P8 19W / 450W | 19237MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 Off | 00000000:02:00.0 Off | Off |
| 0% 25C P8 23W / 450W | 19649MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1655 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 323236 C python 19204MiB |
| 1 N/A N/A 1655 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 323236 C python 19616MiB |
+---------------------------------------------------------------------------------------+
Using watch -n 5 nvidia-smi, I see the memory usage jump up slightly, then clear when it hits the abort.
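For a finer-grained view than watch, here is a minimal sketch of a monitor I could run alongside the server to log per-GPU memory once a second (assuming the pynvml / nvidia-ml-py bindings are installed; the 1-second interval is just a guess at what it takes to catch the spike before the abort):

import time
import pynvml  # provided by the nvidia-ml-py package

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    while True:
        readings = []
        for i, handle in enumerate(handles):
            # nvmlDeviceGetMemoryInfo reports total/free/used in bytes
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            readings.append(f"GPU{i}: {mem.used / 1024**2:.0f}/{mem.total / 1024**2:.0f} MiB")
        print(" | ".join(readings), flush=True)
        time.sleep(1)
except KeyboardInterrupt:
    pass
finally:
    pynvml.nvmlShutdown()

If the used figure were pinned near 24564 MiB right before the assert, that would at least point toward OOM rather than something else.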
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
I'm running into an issue after pulling the latest 0f134bf7 and running:
pip install -U -r requirements.txt
I'm running the server with this command:
python server.py --model-dir /media/ash/AI-Vault-1/ai-models --api --verbose --listen
I then load the model with the following settings:
It loads successfully across my two GPUs. Once loaded, I'm accessing the OpenAI API through a CrewAI Python script, which iterates over an objective using LangChain on the backend. It makes lots of calls and seemed to start out just fine, but after a while I get a cryptic error message:
I'm not sure how to troubleshoot from here but can share more info if needed!
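For reference, the script talks to the server's OpenAI-compatible endpoint roughly like this (a minimal sketch, assuming the default API port 5000 and the openai Python client; the base URL, API key, and model name are placeholders rather than my exact CrewAI/LangChain setup):

from openai import OpenAI

# Point the standard OpenAI client at the local text-generation-webui API
# (assumption: server started with --api --listen and the default port 5000).
client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="sk-not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder; the server answers with whichever model is loaded
    messages=[{"role": "user", "content": "Summarize the current objective."}],
    max_tokens=256,
)
print(response.choices[0].message.content)

CrewAI/LangChain just issue many calls like this back to back; the crash shows up after a while, not on the first request.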
Is there an existing issue for this?
possibly #4987
[X] I have searched the existing issues
Reproduction
See description.
Screenshot
Output included in description.
Logs
System Info