sravansenthiln1 / rknn_tflite

RKNN TFLite implementations based on https://github.com/sravansenthiln1/armnn_tflite

Yolov8 tflite #1

Open StuartIanNaylor opened 1 year ago

StuartIanNaylor commented 1 year ago

https://github.com/ultralytics/ultralytics gives some super easy ways to convert the original PyTorch weights to a number of model formats.

yolo export model=yolov8n.pt format=tflite int8=true imgsz=320 should give you a folder with the completed tflite. That model runs fine with yolo predict model=./yolov8n_saved_model/yolov8n_int8.tflite imgsz=320

If you use your convert.py to create the .rknn, you can run it with the following script:

"""
Usage example 
python yolov8.py --model ./yolov8m.rknn --img bus.jpg
"""
import cv2
import numpy as np
from rknnlite.api import RKNNLite
import time
import argparse

RKNN_MODEL = 'yolov8m_RK3588_i8.rknn'
IMGSZ = (320, 320)

CLASSES = ("person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat", "traffic light",
           "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
           "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
           "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife",
           "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa",
           "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave",
           "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush")

def preprocess(img_path):
    img = cv2.imread(img_path)
    # img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, IMGSZ)

    return img

def postprocess(output, confidence_thres=0.5, iou_thres=0.5):
    outputs = np.transpose(np.squeeze(output[0]))

    # Get the number of rows in the outputs array
    rows = outputs.shape[0]

    # Lists to store the bounding boxes, scores, and class IDs of the detections
    boxes = []
    scores = []
    class_ids = []

    # Calculate the scaling factors for the bounding box coordinates
    x_factor = 1
    y_factor = 1

    # Iterate over each row in the outputs array
    for i in range(rows):
        # Extract the class scores from the current row
        classes_scores = outputs[i][4:]

        # Find the maximum score among the class scores
        max_score = np.amax(classes_scores)

        # If the maximum score is above the confidence threshold
        if max_score >= confidence_thres:
            # Get the class ID with the highest score
            class_id = np.argmax(classes_scores)

            # Extract the bounding box coordinates from the current row
            x, y, w, h = outputs[i][0], outputs[i][1], outputs[i][2], outputs[i][3]

            # Convert the centre-format box to [left, top, width, height],
            # the layout cv2.dnn.NMSBoxes expects
            left = int((x - w / 2) * x_factor)
            top = int((y - h / 2) * y_factor)
            width = int(w * x_factor)
            height = int(h * y_factor)

            # Add the class ID, score, and box to the respective lists
            class_ids.append(int(class_id))
            scores.append(float(max_score))
            boxes.append([left, top, width, height])

    # Apply non-maximum suppression to filter out overlapping bounding boxes
    indices = cv2.dnn.NMSBoxes(boxes, scores, confidence_thres, iou_thres)

    detections = []

    # Iterate over the indices kept by non-maximum suppression; flatten because
    # some OpenCV versions return an N x 1 array (or an empty tuple)
    for i in np.array(indices).flatten():
        left, top, width, height = boxes[i]
        detections.append([
            [left, top, left + width, top + height],  # corners: x1, y1, x2, y2
            scores[i],
            class_ids[i]
        ])

    # Return the detections as [box corners, score, class ID]
    return detections

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--img', type=str, default='bus.jpg')
    parser.add_argument('--model', type=str, default='yolov8m_RK3588_i8.rknn')
    opt = parser.parse_args()
    args = vars(opt)
    rknn_lite = RKNNLite()
    print(args['model'])
    ret = rknn_lite.load_rknn(args['model'])
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)

    ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)

    start = time.time()
    img_data = preprocess(args['img'])
    outputs = rknn_lite.inference(inputs=[img_data])
    print(f"inference time: {(time.time() - start) * 1000} ms")

    detections = postprocess(outputs[0])
    print(f"detection time: {(time.time() - start) * 1000} ms")

    img_orig = cv2.imread(args['img'])
    img_orig = cv2.resize(img_orig, IMGSZ)

    for d in detections:
        score, class_id = d[1], d[2]
        x1, y1, x2, y2 = d[0][0], d[0][1], d[0][2], d[0][3]
        # The fourth argument is the colour and the fifth is the line thickness
        cv2.rectangle(img_orig, (x1, y1), (x2, y2), (0, 255, 0), 2)
        label = f'{CLASSES[class_id]}: {score:.2f}'
        label_height = 10
        label_x = x1
        label_y = y1 - 10 if y1 - 10 > label_height else y1 + 10
        cv2.putText(img_orig, label, (label_x, label_y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
    cv2.imwrite('yolov8_result.jpg', img_orig)

    print(f"{(time.time() - start) * 1000} ms")
(venv) (base) stuart@stuart-desktop:~/rknn_tflite/yolov8$ python detect.py --model yolov8n_int8.rknn --img bus.jpg
yolov8n_int8.rknn
I RKNN: [07:09:28.876] RKNN Runtime Information: librknnrt version: 1.5.2 (c6b7b351a@2023-08-23T15:28:22)
I RKNN: [07:09:28.876] RKNN Driver Information: version: 0.9.2
I RKNN: [07:09:28.877] RKNN Model Information: version: 6, toolkit version: 1.5.2+b642f30c(compiler version: 1.5.2 (c6b7b351a@2023-08-23T07:39:01)), target: RKNPU v2, target platform: rk3588, framework name: TFLite, framework layout: NHWC, model inference type: static_shape
W RKNN: [07:09:28.972] Output(PartitionedCall:0): size_with_stride larger than model origin size, if need run OutputOperator in NPU, please call rknn_create_memory using size_with_stride.
inference time: 69.1690444946289 ms
detection time: 109.00640487670898 ms
127.61306762695312 ms

It's actually slower than the original tflite and gives a warning about output strides. I will leave you to have a look at the output.
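Note that the "inference time" printed by the script above starts its timer before preprocess(), so it also includes reading and resizing the image. A minimal sketch of timing each stage on its own follows; the helper name time_stages and the three stand-in functions are hypothetical, and in the real script the stages would be preprocess, rknn_lite.inference and postprocess.

```python
import time


def time_stages(stages, payload):
    """Run (name, function) stages in order, feeding each result to the next.

    Returns the final result and a dict of wall-clock milliseconds per stage.
    """
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        payload = fn(payload)
        timings[name] = (time.perf_counter() - start) * 1000.0
    return payload, timings


# Hypothetical stand-ins so the sketch runs without an NPU or OpenCV.
def fake_preprocess(pixels):
    return [p / 255.0 for p in pixels]


def fake_inference(tensor):
    return [v * 2 for v in tensor]


def fake_postprocess(outputs):
    return max(outputs)


result, timings = time_stages(
    [("preprocess", fake_preprocess),
     ("inference", fake_inference),
     ("postprocess", fake_postprocess)],
    [0, 128, 255],
)
for name, ms in timings.items():
    print(f"{name}: {ms:.3f} ms")
```

Timing the stages separately makes it easier to compare the NPU number with the per-stage figures that yolo predict reports.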

It would be really great if you could get the yolov8 tflite to rknn export working, as a few people have tried and failed. Hopefully you can make it work?
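For anyone checking the output of the script, the suppression step is the part most sensitive to box layout. The following is a dependency-free sketch of greedy non-maximum suppression on boxes given as [left, top, width, height]. It is an illustration of the technique, not the OpenCV implementation, and the function names are made up for this example.

```python
def iou_xywh(a, b):
    """Intersection over union of two [left, top, width, height] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_thres, iou_thres):
    """Return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest in descending score order
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou_xywh(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep


boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, 0.5, 0.5))
```

The second box overlaps the first almost completely and is suppressed, while the third box is disjoint and survives.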

sravansenthiln1 commented 12 months ago

Hi @StuartIanNaylor, I have created yolov8n examples on my side and will upload them soon. However, for accuracy reasons I used a non-quantized fp32 model (rknn turns this into fp16) for my example, and it achieves inference in ~95ms. Please try comparing your model with this model and see how it fares.

Regarding the warning, it doesn't seem to affect the runtime or output. I'm not sure what causes it, but I will look into it.
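As a rough illustration of the accuracy trade-off mentioned above, the sketch below rounds a value to IEEE half precision (via the standard library's struct 'e' format) and to an 8-bit affine grid, then compares the rounding error. The 8-bit scheme is a generic textbook one chosen for illustration and is an assumption; it is not taken from the RKNN toolkit, and the function names are hypothetical.

```python
import struct


def round_to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_uint8_grid(x, lo, hi):
    """Quantize x onto 256 evenly spaced levels between lo and hi, then dequantize."""
    scale = (hi - lo) / 255.0
    q = round((x - lo) / scale)
    q = max(0, min(255, q))  # clamp to the representable range
    return lo + q * scale


for value in (0.5031, 0.2517, 0.9003):
    fp16_err = abs(round_to_fp16(value) - value)
    int8_err = abs(round_to_uint8_grid(value, 0.0, 1.0) - value)
    print(f"{value}: fp16 error {fp16_err:.6f}, 8-bit error {int8_err:.6f}")
```

For values in [0, 1], half precision has a worst-case rounding error of about 0.00024, against about 0.002 for the 8-bit grid; whether that matters for detection accuracy depends on the model.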

StuartIanNaylor commented 12 months ago

I guess neither of us has made the dtb change to enable the reserved memory area https://github.com/rockchip-linux/rknpu2/blob/master/doc/RK3588_NPU_SRAM_usage.md as it loads weights files and I guess it can DMA that area. I should do it, as I have done it before, but maybe the speeds we are getting are all it is capable of.

See https://forum.radxa.com/t/use-yolov8-in-rk3588-npu/15838/132

I have just written a hack script to launch 4 processes:

(venv) stuart@stuart-desktop:~/ultralytics$ ./test
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify', or 'pose'.
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify', or 'pose'.
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify', or 'pose'.
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify', or 'pose'.
Ultralytics YOLOv8.0.217 🚀 Python-3.10.12 torch-2.1.1 CPU (Cortex-A55)
Ultralytics YOLOv8.0.217 🚀 Python-3.10.12 torch-2.1.1 CPU (Cortex-A55)
Ultralytics YOLOv8.0.217 🚀 Python-3.10.12 torch-2.1.1 CPU (Cortex-A55)
Ultralytics YOLOv8.0.217 🚀 Python-3.10.12 torch-2.1.1 CPU (Cortex-A55)
Loading yolov8n_saved_model/yolov8n_full_integer_quant.tflite for TensorFlow Lite inference...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading yolov8n_saved_model/yolov8n_full_integer_quant.tflite for TensorFlow Lite inference...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Loading yolov8n_saved_model/yolov8n_full_integer_quant.tflite for TensorFlow Lite inference...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading yolov8n_saved_model/yolov8n_full_integer_quant.tflite for TensorFlow Lite inference...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
image 1/28 /tmp/images/0x0.webp: 320x320 1 car, 38.5ms

image 1/28 /tmp/images/0x0.webp: 320x320 1 car, 37.9ms
image 1/28 /tmp/images/0x0.webp: 320x320 1 car, 42.3ms
image 2/28 /tmp/images/1_JS286428040.webp: 320x320 3 persons, 30.9ms
image 2/28 /tmp/images/1_JS286428040.webp: 320x320 3 persons, 30.2ms
image 2/28 /tmp/images/1_JS286428040.webp: 320x320 3 persons, 95.0ms
image 1/28 /tmp/images/0x0.webp: 320x320 1 car, 44.6ms
image 3/28 /tmp/images/1d6830cc-9382-11ed-ad8c-0210609a3fe2.jpg: 320x320 5 persons, 36.6ms
image 2/28 /tmp/images/1_JS286428040.webp: 320x320 3 persons, 32.7ms
image 3/28 /tmp/images/1d6830cc-9382-11ed-ad8c-0210609a3fe2.jpg: 320x320 5 persons, 45.8ms
image 3/28 /tmp/images/1d6830cc-9382-11ed-ad8c-0210609a3fe2.jpg: 320x320 5 persons, 33.4ms
image 4/28 /tmp/images/220805-border-collie-play-mn-1100-82d2f1.webp: 320x320 1 dog, 31.2ms
image 5/28 /tmp/images/90-vauxhall-corsa-electric-best-small-cars.jpg: 320x320 1 car, 29.4ms
image 4/28 /tmp/images/220805-border-collie-play-mn-1100-82d2f1.webp: 320x320 1 dog, 52.2ms
image 3/28 /tmp/images/1d6830cc-9382-11ed-ad8c-0210609a3fe2.jpg: 320x320 5 persons, 42.7ms
image 4/28 /tmp/images/220805-border-collie-play-mn-1100-82d2f1.webp: 320x320 1 dog, 30.6ms
image 6/28 /tmp/images/Can-a-single-person-own-a-firm-in-India.jpg: 320x320 1 person, 30.4ms
image 5/28 /tmp/images/90-vauxhall-corsa-electric-best-small-cars.jpg: 320x320 1 car, 32.9ms
image 4/28 /tmp/images/220805-border-collie-play-mn-1100-82d2f1.webp: 320x320 1 dog, 33.0ms
image 5/28 /tmp/images/90-vauxhall-corsa-electric-best-small-cars.jpg: 320x320 1 car, 30.3ms
image 7/28 /tmp/images/DCTM_Penguin_UK_DK_AL697473_RGB_PNG_namnse.webp: 320x320 1 cat, 29.7ms
image 6/28 /tmp/images/Can-a-single-person-own-a-firm-in-India.jpg: 320x320 1 person, 31.5ms
image 5/28 /tmp/images/90-vauxhall-corsa-electric-best-small-cars.jpg: 320x320 1 car, 34.9ms
image 6/28 /tmp/images/Can-a-single-person-own-a-firm-in-India.jpg: 320x320 1 person, 44.4ms
image 8/28 /tmp/images/Planning-a-teen-party-narrow.jpg: 320x320 9 persons, 30.1ms
image 6/28 /tmp/images/Can-a-single-person-own-a-firm-in-India.jpg: 320x320 1 person, 31.8ms
image 7/28 /tmp/images/DCTM_Penguin_UK_DK_AL697473_RGB_PNG_namnse.webp: 320x320 1 cat, 42.6ms
image 7/28 /tmp/images/DCTM_Penguin_UK_DK_AL697473_RGB_PNG_namnse.webp: 320x320 1 cat, 29.7ms
image 9/28 /tmp/images/RollingStoneAwardsAaronParsonsPhotography23.11_highres_30-1024x683.jpg: 320x320 4 persons, 30.6ms
image 7/28 /tmp/images/DCTM_Penguin_UK_DK_AL697473_RGB_PNG_namnse.webp: 320x320 1 cat, 30.3ms
image 8/28 /tmp/images/Planning-a-teen-party-narrow.jpg: 320x320 9 persons, 29.7ms
image 8/28 /tmp/images/Planning-a-teen-party-narrow.jpg: 320x320 9 persons, 38.5ms
image 8/28 /tmp/images/Planning-a-teen-party-narrow.jpg: 320x320 9 persons, 29.5ms
image 10/28 /tmp/images/VIER PFOTEN_2016-07-08_011-5184x2712-1200x628.jpg: 320x320 1 cat, 33.2ms
image 9/28 /tmp/images/RollingStoneAwardsAaronParsonsPhotography23.11_highres_30-1024x683.jpg: 320x320 4 persons, 31.0ms
image 9/28 /tmp/images/RollingStoneAwardsAaronParsonsPhotography23.11_highres_30-1024x683.jpg: 320x320 4 persons, 33.0ms
image 9/28 /tmp/images/RollingStoneAwardsAaronParsonsPhotography23.11_highres_30-1024x683.jpg: 320x320 4 persons, 29.8ms
image 11/28 /tmp/images/XLB_4.png: 320x320 1 dog, 30.1ms
image 10/28 /tmp/images/VIER PFOTEN_2016-07-08_011-5184x2712-1200x628.jpg: 320x320 1 cat, 30.8ms
image 10/28 /tmp/images/VIER PFOTEN_2016-07-08_011-5184x2712-1200x628.jpg: 320x320 1 cat, 31.0ms
image 10/28 /tmp/images/VIER PFOTEN_2016-07-08_011-5184x2712-1200x628.jpg: 320x320 1 cat, 29.5ms
image 12/28 /tmp/images/_119254021_lotusemira.jpg: 320x320 1 car, 30.7ms
image 11/28 /tmp/images/XLB_4.png: 320x320 1 dog, 30.1ms
image 11/28 /tmp/images/XLB_4.png: 320x320 1 dog, 31.1ms
image 13/28 /tmp/images/athletic-women-walking-together-on-remote-trail-royalty-free-image-1626378592.jpg: 320x320 5 persons, 30.1ms
image 11/28 /tmp/images/XLB_4.png: 320x320 1 dog, 32.8ms
image 12/28 /tmp/images/_119254021_lotusemira.jpg: 320x320 1 car, 29.8ms
image 12/28 /tmp/images/_119254021_lotusemira.jpg: 320x320 1 car, 30.5ms
image 14/28 /tmp/images/birthdayparty.jpg: 320x320 5 persons, 30.3ms
image 12/28 /tmp/images/_119254021_lotusemira.jpg: 320x320 1 car, 29.7ms
image 13/28 /tmp/images/athletic-women-walking-together-on-remote-trail-royalty-free-image-1626378592.jpg: 320x320 5 persons, 31.1ms
image 13/28 /tmp/images/athletic-women-walking-together-on-remote-trail-royalty-free-image-1626378592.jpg: 320x320 5 persons, 31.4ms
image 15/28 /tmp/images/bloat_md.jpg: 320x320 3 persons, 3 bicycles, 1 bench, 30.1ms
image 13/28 /tmp/images/athletic-women-walking-together-on-remote-trail-royalty-free-image-1626378592.jpg: 320x320 5 persons, 31.9ms
image 14/28 /tmp/images/birthdayparty.jpg: 320x320 5 persons, 40.1ms
image 14/28 /tmp/images/birthdayparty.jpg: 320x320 5 persons, 32.6ms
image 16/28 /tmp/images/bus.jpg: 320x320 5 persons, 1 bus, 29.5ms
image 14/28 /tmp/images/birthdayparty.jpg: 320x320 5 persons, 39.0ms
image 15/28 /tmp/images/bloat_md.jpg: 320x320 3 persons, 3 bicycles, 1 bench, 30.4ms
image 15/28 /tmp/images/bloat_md.jpg: 320x320 3 persons, 3 bicycles, 1 bench, 31.8ms
image 15/28 /tmp/images/bloat_md.jpg: 320x320 3 persons, 3 bicycles, 1 bench, 30.5ms
image 17/28 /tmp/images/gettyimages-1094874726.png: 320x320 1 dog, 1 sheep, 30.3ms
image 16/28 /tmp/images/bus.jpg: 320x320 5 persons, 1 bus, 29.7ms
image 16/28 /tmp/images/bus.jpg: 320x320 5 persons, 1 bus, 32.5ms
image 16/28 /tmp/images/bus.jpg: 320x320 5 persons, 1 bus, 29.5ms
image 18/28 /tmp/images/image (1).jpg: 320x320 5 persons, 38.7ms
image 17/28 /tmp/images/gettyimages-1094874726.png: 320x320 1 dog, 1 sheep, 30.6ms
image 17/28 /tmp/images/gettyimages-1094874726.png: 320x320 1 dog, 1 sheep, 30.7ms
image 19/28 /tmp/images/image.jpg: 320x320 1 car, 5 trucks, 30.0ms
image 17/28 /tmp/images/gettyimages-1094874726.png: 320x320 1 dog, 1 sheep, 29.7ms
image 18/28 /tmp/images/image (1).jpg: 320x320 5 persons, 35.0ms
image 18/28 /tmp/images/image (1).jpg: 320x320 5 persons, 31.9ms
image 20/28 /tmp/images/images.jpg: 320x320 2 persons, 30.2ms
image 18/28 /tmp/images/image (1).jpg: 320x320 5 persons, 30.2ms
image 19/28 /tmp/images/image.jpg: 320x320 1 car, 5 trucks, 29.7ms
image 21/28 /tmp/images/kiss-haunted-house-party-watch-back-performances.jpg: 320x320 4 persons, 29.6ms
image 19/28 /tmp/images/image.jpg: 320x320 1 car, 5 trucks, 32.4ms
image 19/28 /tmp/images/image.jpg: 320x320 1 car, 5 trucks, 31.7ms
image 20/28 /tmp/images/images.jpg: 320x320 2 persons, 32.7ms
image 22/28 /tmp/images/man-walking-1024x651.jpg: 320x320 2 persons, 29.6ms
image 20/28 /tmp/images/images.jpg: 320x320 2 persons, 30.5ms
image 20/28 /tmp/images/images.jpg: 320x320 2 persons, 30.8ms
image 21/28 /tmp/images/kiss-haunted-house-party-watch-back-performances.jpg: 320x320 4 persons, 31.3ms
image 23/28 /tmp/images/party-games.png: 320x320 6 persons, 29.7ms
image 21/28 /tmp/images/kiss-haunted-house-party-watch-back-performances.jpg: 320x320 4 persons, 29.0ms
image 21/28 /tmp/images/kiss-haunted-house-party-watch-back-performances.jpg: 320x320 4 persons, 31.4ms
image 22/28 /tmp/images/man-walking-1024x651.jpg: 320x320 2 persons, 30.1ms
image 24/28 /tmp/images/truck.webp: 320x320 1 truck, 29.6ms
image 22/28 /tmp/images/man-walking-1024x651.jpg: 320x320 2 persons, 30.9ms
image 22/28 /tmp/images/man-walking-1024x651.jpg: 320x320 2 persons, 30.0ms
image 23/28 /tmp/images/party-games.png: 320x320 6 persons, 30.2ms
image 25/28 /tmp/images/walking.jpg: 320x320 1 person, 29.6ms
image 23/28 /tmp/images/party-games.png: 320x320 6 persons, 29.9ms
image 23/28 /tmp/images/party-games.png: 320x320 6 persons, 30.4ms
image 24/28 /tmp/images/truck.webp: 320x320 1 truck, 30.0ms
image 24/28 /tmp/images/truck.webp: 320x320 1 truck, 36.3ms
image 24/28 /tmp/images/truck.webp: 320x320 1 truck, 39.0ms
image 26/28 /tmp/images/why-do-cats-have-whiskers-1.jpg: 320x320 (no detections), 31.1ms
image 25/28 /tmp/images/walking.jpg: 320x320 1 person, 32.9ms
image 25/28 /tmp/images/walking.jpg: 320x320 1 person, 29.7ms
image 25/28 /tmp/images/walking.jpg: 320x320 1 person, 29.8ms
image 27/28 /tmp/images/why-is-it-called-a-semi-truck.jpg: 320x320 1 truck, 31.9ms
image 26/28 /tmp/images/why-do-cats-have-whiskers-1.jpg: 320x320 (no detections), 31.1ms
image 26/28 /tmp/images/why-do-cats-have-whiskers-1.jpg: 320x320 (no detections), 31.5ms
image 26/28 /tmp/images/why-do-cats-have-whiskers-1.jpg: 320x320 (no detections), 29.8ms
image 28/28 /tmp/images/wild-dog.jpg: 320x320 1 sheep, 30.2ms
Speed: 7.2ms preprocess, 31.1ms inference, 6.3ms postprocess per image at shape (1, 3, 320, 320)
💡 Learn more at https://docs.ultralytics.com/modes/predict
image 27/28 /tmp/images/why-is-it-called-a-semi-truck.jpg: 320x320 1 truck, 30.5ms
image 27/28 /tmp/images/why-is-it-called-a-semi-truck.jpg: 320x320 1 truck, 29.6ms
image 27/28 /tmp/images/why-is-it-called-a-semi-truck.jpg: 320x320 1 truck, 29.8ms
image 28/28 /tmp/images/wild-dog.jpg: 320x320 1 sheep, 30.0ms
Speed: 6.7ms preprocess, 33.4ms inference, 6.7ms postprocess per image at shape (1, 3, 320, 320)
💡 Learn more at https://docs.ultralytics.com/modes/predict
image 28/28 /tmp/images/wild-dog.jpg: 320x320 1 sheep, 46.2ms
Speed: 6.7ms preprocess, 32.8ms inference, 6.4ms postprocess per image at shape (1, 3, 320, 320)
💡 Learn more at https://docs.ultralytics.com/modes/predict
image 28/28 /tmp/images/wild-dog.jpg: 320x320 1 sheep, 30.1ms
Speed: 6.9ms preprocess, 34.6ms inference, 7.4ms postprocess per image at shape (1, 3, 320, 320)
💡 Learn more at https://docs.ultralytics.com/modes/predict
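The launcher script behind the log above is not shown in the thread. A minimal sketch of the idea, starting several copies of one command and waiting for all of them, could look like this; launch_parallel is a hypothetical helper, and the yolo command in the comment is an assumption pieced together from the paths in the log.

```python
import subprocess
import sys


def launch_parallel(cmd, n):
    """Start n copies of cmd at once and return their exit codes."""
    procs = [subprocess.Popen(cmd) for _ in range(n)]
    return [p.wait() for p in procs]


# Harmless demo command so the sketch runs anywhere.
# On the board, cmd might instead be something like (assumption):
#   ["yolo", "predict",
#    "model=yolov8n_saved_model/yolov8n_full_integer_quant.tflite",
#    "source=/tmp/images", "imgsz=320"]
codes = launch_parallel([sys.executable, "-c", "print('worker done')"], 4)
print(codes)
```

Because the processes share one terminal, their output interleaves, which is why each image appears up to four times in the log.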

So each core is managing approx 6.9ms preprocess, 34.6ms inference, 7.4ms postprocess per image, so in total approx 133 FPS. I guess that is what is making me think there is something wrong with the NPU or the conversion framework, as it really should be int8. But I did make a mistake: at first look ./yolov8n_saved_model/yolov8n_full_integer_quant.tflite says full integer, but that is actually not the model from the last step of the export process, and yet it runs twice as fast?!
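The 133 FPS estimate is consistent with roughly 30 ms of inference per image on each of four processes. Using the 34.6 ms from the summary line gives a somewhat lower figure, and counting preprocess and postprocess lowers it further. A small sketch of the arithmetic, with aggregate_fps as a hypothetical helper name:

```python
def aggregate_fps(ms_per_image, workers):
    """Combined frames per second for workers running in parallel."""
    return workers * 1000.0 / ms_per_image


# Inference only, using the ~30 ms seen on most log lines
print(round(aggregate_fps(30.0, 4), 1))

# Inference only, using the 34.6 ms from the last summary line
print(round(aggregate_fps(34.6, 4), 1))

# End to end: preprocess + inference + postprocess from the same summary line
print(round(aggregate_fps(6.9 + 34.6 + 7.4, 4), 1))
```

This assumes the four processes do not slow each other down, which the log suggests is roughly true but does not prove.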

Maybe it is ~95ms and the NPU relatively sucks in comparison to the 4 big cores running tflite...

sravansenthiln1 commented 12 months ago

@StuartIanNaylor the lower-than-expected performance observed on the NPU could be tied to memory limitations from copying tensor data to and from memory.