ultralytics / ultralytics

Ultralytics YOLO11 🚀
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Memory issue with NCNN model #16093

Open warhammercasey opened 1 month ago

warhammercasey commented 1 month ago


Ultralytics YOLO Component

Predict

Bug

I'm trying to run inference on a Le Potato (a Raspberry Pi clone) using the yolov8s model exported to NCNN, but I seem to be running into some kind of memory issue. Running predict with the model causes the entire system to reboot immediately roughly 70% of the time. The other ~30% of the time is a toss-up between working properly and getting a segfault.

The model was exported using model = YOLO("yolov8s.pt") and model.export(format='ncnn'), so it's nothing custom.

To rule out platform/environment issues, I tried running the same thing on my desktop machine in WSL2. It appeared to run fine, except that after running it roughly 5 times, something caused explorer.exe and a few other processes (visible in Event Viewer) to crash. That's possibly a coincidence, so after I post this I'm going to see if it's repeatable, but I want to post this bug before I potentially lose my work if my browser also crashes.

If I load the PyTorch model instead (i.e. model = YOLO("yolov8s.pt") rather than model = YOLO("yolov8s_ncnn_model")), it works perfectly fine, which indicates the problem is likely specific to the NCNN model.

Both tests were run in a Python 3.11 venv with only ultralytics (and its dependencies) installed via pip install ultralytics.

I'm trying to see if I can get any more information on what's causing this, and I'll update this thread if I find anything else. But given that the issue is intermittent crashing on WSL2 and a full system reset on the Le Potato, it's a little hard to debug anything specific.
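
Since a crash takes down the Python process running it, one way to get repeatable numbers for an intermittent failure is to run each predict attempt in a fresh child process and tally how each one ended. This is a minimal sketch, assuming a POSIX system where a child killed by a signal reports a negative return code (for example -11 for SIGSEGV). `run_trials`, `describe`, and `PREDICT_SCRIPT` are hypothetical names, not part of Ultralytics. The harness cannot record a full system reboot like the one on the Le Potato, because the parent process dies with the machine.

```python
import signal
import subprocess
import sys
from collections import Counter

# Hypothetical reproduction script run in each child; adjust the model path and image.
PREDICT_SCRIPT = """
from ultralytics import YOLO

model = YOLO("yolov8s_ncnn_model", task="detect")
model("img5.jpg")
"""


def describe(returncode: int) -> str:
    """Turn a child's return code into a readable label (POSIX semantics)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # A negative return code means the child was killed by signal number -returncode
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def run_trials(script: str, trials: int = 10, timeout: float = 300.0) -> Counter:
    """Run `script` in a fresh interpreter `trials` times and count how each run ended."""
    outcomes: Counter = Counter()
    for i in range(trials):
        try:
            proc = subprocess.run(
                [sys.executable, "-c", script], capture_output=True, timeout=timeout
            )
            label = describe(proc.returncode)
        except subprocess.TimeoutExpired:
            label = "timeout"
        outcomes[label] += 1
        # Flush so each result is visible immediately, before the next trial starts
        print(f"trial {i + 1}/{trials}: {label}", flush=True)
    return outcomes
```

Calling print(run_trials(PREDICT_SCRIPT, trials=20)) once with the NCNN model path and once with "yolov8s.pt" would give a side-by-side count of clean exits versus segfaults for the two formats.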

Environment

Output of yolo checks on Le Potato:

Ultralytics YOLOv8.2.90 🚀 Python-3.11.2 torch-2.4.1 CPU (Cortex-A53)
Setup complete ✅ (4 CPUs, 1.9 GB RAM, 16.0/29.2 GB disk)

OS                  Linux-6.1.74-12781-g74961fb0a5d2-aarch64-with-glibc2.36
Environment         Linux
Python              3.11.2
Install             pip
RAM                 1.90 GB
CPU                 Cortex-A53
CUDA                None

numpy               ✅ 1.23.5<2.0.0,>=1.23.0
matplotlib          ✅ 3.9.2>=3.3.0
opencv-python       ✅ 4.10.0.84>=4.6.0
pillow              ✅ 10.4.0>=7.1.2
pyyaml              ✅ 6.0.2>=5.3.1
requests            ✅ 2.32.3>=2.23.0
scipy               ✅ 1.14.1>=1.4.1
torch               ✅ 2.4.1>=1.8.0
torchvision         ✅ 0.19.1>=0.9.0
tqdm                ✅ 4.66.5>=4.64.0
psutil              ✅ 6.0.0
py-cpuinfo          ✅ 9.0.0
pandas              ✅ 2.2.2>=1.1.4
seaborn             ✅ 0.13.2>=0.11.0
ultralytics-thop    ✅ 2.0.6>=2.0.0
torch               ✅ 2.4.1!=2.4.0,>=1.8.0; sys_platform == "win32"

Output of yolo checks on desktop:

Ultralytics YOLOv8.2.90 🚀 Python-3.11.0rc1 torch-2.4.1+cu118 CUDA:0 (NVIDIA GeForce RTX 3080, 12288MiB)
Setup complete ✅ (16 CPUs, 29.4 GB RAM, 193.3/368.0 GB disk)

OS                  Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Environment         Linux
Python              3.11.0rc1
Install             pip
RAM                 29.38 GB
CPU                 13th Gen Intel Core(TM) i5-13600KF
CUDA                11.8

numpy               ✅ 1.26.3<2.0.0,>=1.23.0
matplotlib          ✅ 3.9.2>=3.3.0
opencv-python       ✅ 4.10.0.84>=4.6.0
pillow              ✅ 10.2.0>=7.1.2
pyyaml              ✅ 6.0.2>=5.3.1
requests            ✅ 2.32.3>=2.23.0
scipy               ✅ 1.14.1>=1.4.1
torch               ✅ 2.4.1+cu118>=1.8.0
torchvision         ✅ 0.19.1+cu118>=0.9.0
tqdm                ✅ 4.66.5>=4.64.0
psutil              ✅ 6.0.0
py-cpuinfo          ✅ 9.0.0
pandas              ✅ 2.2.2>=1.1.4
seaborn             ✅ 0.13.2>=0.11.0
ultralytics-thop    ✅ 2.0.6>=2.0.0
torch               ✅ 2.4.1+cu118!=2.4.0,>=1.8.0; sys_platform == "win32"

Minimal Reproducible Example

Code on Le Potato:

import sys

import cv2
from ultralytics import YOLO

VIDEO_RES = (1280, 720)

# Load the exported NCNN model directory produced by model.export(format='ncnn')
model = YOLO("yolov8s_ncnn_model", task='detect')
#model = YOLO("yolov8n.pt", task='detect')

# Open the USB camera through V4L2 and request 1280x720 MJPG frames
cap = cv2.VideoCapture(1, cv2.CAP_V4L2)
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'))
cap.set(cv2.CAP_PROP_FRAME_WIDTH, VIDEO_RES[0])
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, VIDEO_RES[1])
cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)
if not cap.isOpened():
    print("Cannot open camera!")
    sys.exit(1)

ret, frame = cap.read()
if not ret:
    print("Cannot read frame!")
    sys.exit(1)

# This single predict call is where the reboot or segfault happens
results = model(frame)

cap.release()

Code on WSL2:

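import cv2
from ultralytics import YOLO

model = YOLO("yolov8s_ncnn_model")

img = cv2.imread("img5.jpg")

# Run inference with the NCNN model and draw the detections on a copy of the image
results = model(img)
annotated = results[0].plot()

cv2.imshow("Img", annotated)

cv2.waitKey(0)
cv2.destroyAllWindows()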


Y-T-G commented 1 month ago

Does yolov8n.pt work?

warhammercasey commented 1 month ago

Yes, yolov8n.pt works fine, and so do the ONNX and TFLite formats. This is only an issue with the NCNN format.

Y-T-G commented 1 month ago

Does NCNN work with yolov8n.pt?

warhammercasey commented 1 month ago

Do you mean something like:

model = YOLO('yolov8n.pt')
model.export(format='ncnn')
model = YOLO('yolov8n_ncnn_model')
# inference

If that's what you mean, then no: that's exactly what I have been trying to do, and that's what isn't working.

glenn-jocher commented 1 month ago

It seems like the NCNN export might be causing issues on your device. Please ensure your environment meets all NCNN requirements and consider testing with a smaller model like yolov8n to see if the problem persists.

Y-T-G commented 1 month ago

In your question, you're using the yolov8s model, so I was wondering whether yolov8n works with NCNN since it is smaller and should consume less memory.

warhammercasey commented 1 month ago

Oh, my bad, I should have clarified. I've tried the yolov8n and yolov8s models, as well as the yolov8m model.

What are the NCNN requirements? I don't believe it's running out of memory, considering it runs the PyTorch and ONNX models without issue.
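
One way to test the "not running out of memory" assumption on the 1.9 GB board is to compare peak resident memory for the PyTorch, ONNX, and NCNN runs that do complete. This is a minimal sketch using only the standard-library resource module (Unix only); `peak_rss_mib` is a hypothetical helper name. Note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, and with RUSAGE_CHILDREN it is the peak of the largest waited-for child, not the sum over all children.

```python
import resource
import sys


def peak_rss_mib(children: bool = False) -> float:
    """Peak resident set size in MiB for this process, or for its largest finished child."""
    who = resource.RUSAGE_CHILDREN if children else resource.RUSAGE_SELF
    peak = resource.getrusage(who).ru_maxrss
    # Linux reports ru_maxrss in KiB; macOS reports it in bytes
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024
```

Printing peak_rss_mib() right after the model(frame) call, once per export format, would show whether the NCNN run's peak is anywhere near the board's RAM on the runs that survive.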

Y-T-G commented 1 month ago

It could be an issue with ncnn itself. You can try running this and see if it causes a reboot.

import ncnn as pyncnn
import numpy as np
from pathlib import Path

w = "yolov8n_ncnn_model"

net = pyncnn.Net()
net.opt.use_vulkan_compute = False
w = Path(w)
if not w.is_file():  # if not *.param
    w = next(w.glob("*.param"))
net.load_param(str(w))
net.load_model(str(w.with_suffix(".bin")))

im = np.random.rand(1, 3, 640, 640)

mat_in = pyncnn.Mat(im[0])
for i in range(30):
    with net.create_extractor() as ex:
        ex.input(net.input_names()[0], mat_in)
        y = [np.array(ex.extract(x)[1])[None] for x in sorted(net.output_names())]
warhammercasey commented 1 month ago

That errors on the y = [np.array(ex.extract(x)[1])[None] for x in sorted(net.output_names())] line.

On my x86 machine it runs into this error when doing np.array(ex.extract('out0')[1]):

terminate called after throwing an instance of 'std::runtime_error'
  what():  Convert ncnn.Mat to numpy.ndarray. Support only elemsize 1, 2, 4; but given 8
Aborted

On the pi clone it segfaults when running ex.extract('out0').

So both error but for different reasons.

glenn-jocher commented 1 month ago

It seems like the issue might be related to the data type conversion in NCNN. You could try checking the model's output layer configurations or consider using a different model format that works on your devices.

Y-T-G commented 1 month ago

Try using im = np.random.rand(1, 3, 640, 640).astype(np.float32)
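
The likely reason this change matters: the x86 error says the ncnn.Mat-to-numpy conversion supports only elemsize 1, 2, or 4 but was given 8, and np.random.rand returns float64, whose itemsize is 8 bytes. Casting to float32 gives 4-byte elements. The sketch below checks that and wraps it in a hypothetical helper, `make_ncnn_input`, that builds a C-contiguous float32 CHW array for the 1x3x640x640 input used in the snippet above. The thread does not establish whether this also cures the segfault on the Le Potato.

```python
import numpy as np


def make_ncnn_input(height: int = 640, width: int = 640, seed: int = 0) -> np.ndarray:
    """Random CHW image tensor as a C-contiguous float32 array (4-byte elements)."""
    rng = np.random.default_rng(seed)
    im = rng.random((1, 3, height, width))  # float64 by default, like np.random.rand
    # Drop the batch axis and cast to float32 before wrapping it in pyncnn.Mat
    return np.ascontiguousarray(im[0], dtype=np.float32)


print(np.random.rand(1, 3, 640, 640).dtype.itemsize)  # 8 (float64)
print(make_ncnn_input().dtype.itemsize)                # 4 (float32)
```

In the test script, mat_in = pyncnn.Mat(make_ncnn_input()) would then replace mat_in = pyncnn.Mat(im[0]).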