oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.

Program aborts from invalid pointer #5494

Closed: whoabuddy closed this issue 6 months ago

whoabuddy commented 8 months ago

Describe the bug

I'm running into an issue after pulling the latest 0f134bf7 and running pip install -U -r requirements.txt.

I'm running the server with this command: python server.py --model-dir /media/ash/AI-Vault-1/ai-models --api --verbose --listen

I then load the model with the following settings:

[screenshot: model loader settings]

It loads successfully across my two GPUs. Once loaded, I'm accessing the OpenAI API through a CrewAI Python script, which iterates over an objective using Langchain on the backend. It makes a lot of calls and seemed to start out fine, but after a while I get a cryptic error message:

Llama.generate: prefix-match hit
GGML_ASSERT: /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:7863: ptr == (void *) (g_cuda_pool_addr[device] + g_cuda_pool_used[device])
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)

I'm not sure how to troubleshoot from here but can share more info if needed!
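For context, the calling side is nothing exotic: CrewAI/Langchain just send repeated chat-completion requests to the webui's OpenAI-compatible API. A minimal sketch of that kind of traffic is below; the host, port, and model name are placeholders for my setup, not the exact CrewAI script.

# Rough sketch of the client traffic that eventually triggers the abort.
# Host/port assume the default --api settings; model name is a placeholder.
import requests

API_URL = "http://127.0.0.1:5000/v1/chat/completions"

def ask(prompt: str) -> str:
    payload = {
        "model": "local-model",  # placeholder; the webui serves whatever model is loaded
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    resp = requests.post(API_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# CrewAI/Langchain effectively issue many calls like this back to back;
# the abort happens partway through such a run, not on the first request.
for step in range(50):
    print(ask(f"Step {step}: continue working on the objective."))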

Is there an existing issue for this?

Reproduction

See description.

Screenshot

Output included in description.

Logs

Llama.generate: prefix-match hit
GGML_ASSERT: /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:7863: ptr == (void *) (g_cuda_pool_addr[device] + g_cuda_pool_used[device])
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)

System Info

OS: Linux Mint (Ubuntu-based)
GPU: 2x RTX 4090
whoabuddy commented 8 months ago

Interesting: per the related issue, I tried a fresh conda env, clone, and install. Things seem to be back to normal :crossed_fingers:

Nope, it's back:

Llama.generate: prefix-match hit
GGML_ASSERT: /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:7863: ptr == (void *) (g_cuda_pool_addr[device] + g_cuda_pool_used[device])
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
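
One thing I'm double-checking is whether the fresh environment actually picked up a new llama-cpp-python build rather than a cached wheel. A quick, generic sanity check (the CUDA module name here is a guess, since the webui ships a separate CUDA wheel):

# Quick check of which llama-cpp-python build(s) actually got installed.
# Module names are assumptions; adjust to whatever your wheel provides.
import importlib

for name in ("llama_cpp", "llama_cpp_cuda"):
    try:
        mod = importlib.import_module(name)
        print(name, getattr(mod, "__version__", "unknown"), "->", mod.__file__)
    except ImportError:
        print(name, "not installed")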
whoabuddy commented 8 months ago

Is this just telling me I'm out of memory (OOM)? Is there a good way to troubleshoot? Running the latest from main:

CUDA error: invalid argument
  current device: 1, in function ggml_backend_cuda_buffer_get_tensor at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:10759
  cudaMemcpy(data, (const char *)tensor->data + offset, size, cudaMemcpyDeviceToHost)
GGML_ASSERT: /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:241: !"CUDA error"
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)

Current model settings: [screenshot of model loader settings]

nvidia-smi output before running:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0 Off |                  Off |
|  0%   32C    P8              20W / 450W |     11MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off | 00000000:02:00.0 Off |                  Off |
|  0%   26C    P8              22W / 450W |     11MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1655      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A      1655      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

nvidia-smi output while running:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0 Off |                  Off |
|  0%   31C    P8              19W / 450W |  19237MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off | 00000000:02:00.0 Off |                  Off |
|  0%   25C    P8              23W / 450W |  19649MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1655      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A    323236      C   python                                    19204MiB |
|    1   N/A  N/A      1655      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A    323236      C   python                                    19616MiB |
+---------------------------------------------------------------------------------------+

Using watch -n 5 nvidia-smi, I see the memory usage jump up slightly, then clear when it hits the abort.
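
To catch that spike with more resolution than watch -n 5 gives, I could poll NVML directly. A rough sketch using the nvidia-ml-py (pynvml) bindings; the one-second interval and output format are arbitrary choices, not anything the webui provides:

# Poll per-GPU memory usage once a second and log it, to see how close the
# process gets to the 24 GiB limit right before the abort.
import time
import pynvml  # from the nvidia-ml-py package

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    while True:
        parts = []
        for i, h in enumerate(handles):
            mem = pynvml.nvmlDeviceGetMemoryInfo(h)
            parts.append(f"GPU{i} {mem.used // 2**20}/{mem.total // 2**20} MiB")
        print(time.strftime("%H:%M:%S"), " | ".join(parts), flush=True)
        time.sleep(1)  # finer-grained than watch -n 5, to catch the jump before the crash
finally:
    pynvml.nvmlShutdown()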

github-actions[bot] commented 6 months ago

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.