kssion commented 2 months ago

OpenVINO Version

2024.1.0

Operating System

macOS Systems for Apple Silicon

Device used for inference

CPU

Framework

None

Model used

https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar

Issue description

Predictions from the PaddleOCR text detection model (ch_PP-OCRv4_det) became incorrect starting with version 2024.1.0: almost no text is detected. Earlier versions produce correct results with the same code and the same input image; only the OpenVINO version differs.

Predicted results in version 2024.0.0

[Image: WX20240805-183322@2x]

Predicted results in version 2024.1.0

[Image: WX20240805-183259@2x]

In version 2024.1.0, most of the output values are 0.

Step-by-step reproduction

My code

import cv2
import numpy as np
import openvino as ov

def DetResizeForTest(img, limit_side_len=960):
    """Resize so the longer side is at most limit_side_len and both sides are multiples of 32."""
    h, w = img.shape[:2]

    # limit the max side
    if max(h, w) > limit_side_len:
        if h > w:
            ratio = float(limit_side_len) / h
        else:
            ratio = float(limit_side_len) / w
    else:
        ratio = 1.

    resize_h = int(h * ratio)
    resize_w = int(w * ratio)

    # round each side to the nearest multiple of 32 (minimum 32)
    resize_h = max(int(round(resize_h / 32) * 32), 32)
    resize_w = max(int(round(resize_w / 32) * 32), 32)

    return cv2.resize(img, (resize_w, resize_h))

def NormalizeImage(img):
    """Normalize the image: scale to [0, 1], subtract the mean, divide by the std."""

    scale = 1.0 / 255.0
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]

    shape = (1, 1, 3)
    mean = np.array(mean).reshape(shape).astype('float32')
    std = np.array(std).reshape(shape).astype('float32')

    assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
    img = (img.astype('float32') * scale - mean) / std
    return img

class TextDetection(object):

    def __init__(self, model_path, device='AUTO'):
        core = ov.Core()
        model = core.read_model(model=model_path)
        self.model = core.compile_model(model=model, device_name=device)
        self.thresh = 0.3

    def __call__(self, im0):
        img = self.pre_processing_detection(im0)
        img = np.expand_dims(img, axis=0)

        det_results = self.model([img])[0]

        return det_results

    def pre_processing_detection(self, im0):
        """
        Preprocess input image for text detection

        Parameters:
            im0: input image
        """
        im = DetResizeForTest(im0)
        im = NormalizeImage(im)
        im = im.transpose((2, 0, 1))
        return im

img_path = '1722848502210592000_0.jpg'

img = cv2.imread(img_path)

# https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar
det_model = TextDetection('ch_PP-OCRv4_det_infer/inference.pdmodel')

dt_boxes = det_model(img)
print(dt_boxes)
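
To check the preprocessing independently of OpenVINO, the target-size arithmetic in DetResizeForTest can be isolated in a pure-Python helper. This is only a minimal sketch: `det_target_size` is a hypothetical name, not part of the reproducer or of PaddleOCR, and it assumes the same limit of 960 and the same rounding (Python's built-in `round`, which rounds halves to even).

```python
def det_target_size(h, w, limit_side_len=960):
    """Return (resize_h, resize_w) using the same arithmetic as DetResizeForTest.

    Hypothetical helper for illustration only; it performs no image I/O.
    """
    # Scale down only when the longer side exceeds the limit.
    if max(h, w) > limit_side_len:
        ratio = float(limit_side_len) / max(h, w)
    else:
        ratio = 1.0

    resize_h = int(h * ratio)
    resize_w = int(w * ratio)

    # Snap each side to the nearest multiple of 32, with a floor of 32.
    resize_h = max(int(round(resize_h / 32) * 32), 32)
    resize_w = max(int(round(resize_w / 32) * 32), 32)
    return resize_h, resize_w


if __name__ == "__main__":
    print(det_target_size(1080, 1920))  # (544, 960)
```

With this preprocessing, the tensor passed to the compiled model should have shape (1, 3, resize_h, resize_w), so comparing that shape across the two OpenVINO versions is a quick way to rule out an input difference.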

Input image

[Image: general_ocr_003]

Relevant log output

# 2024.0.0
[[[[1.8215377e-13 2.1279651e-13 1.1031871e-12 ... 7.2270118e-14
    1.9756947e-14 1.5429758e-13]
   [2.9629731e-15 3.5439558e-15 9.1118187e-14 ... 8.8349780e-15
    2.0234412e-15 1.7647633e-14]
   [7.7499916e-14 3.2404651e-14 5.1260705e-13 ... 1.9325702e-13
    2.8923587e-12 1.0666041e-11]
   ...
   [1.6165526e-14 1.9703500e-15 4.7535997e-16 ... 1.7223305e-15
    1.2211417e-17 1.2192896e-16]
   [1.0836028e-12 5.9138024e-14 6.2131757e-14 ... 5.5607260e-15
    3.5328755e-14 5.5411957e-14]
   [1.3802006e-12 7.9045196e-14 1.4889082e-14 ... 1.5708528e-14
    1.9864335e-14 6.7840116e-14]]]]

# 2024.1.0
[[[[0. 0. 0. ... 0. 0. 0.]
   [0. 0. 0. ... 0. 0. 0.]
   [0. 0. 0. ... 0. 0. 0.]
   ...
   [0. 0. 0. ... 0. 0. 0.]
   [0. 0. 0. ... 0. 0. 0.]
   [0. 0. 0. ... 0. 0. 0.]]]]
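
The printed arrays are truncated by NumPy, so they show only a few corner values of each row. A compact numeric summary may make the two runs easier to compare. The sketch below is a hypothetical helper (`summarize_prob_map` is not part of the reproducer); it assumes the model output is a per-pixel probability map and reuses the 0.3 value of `self.thresh` from the reproducer as the cut-off.

```python
import numpy as np


def summarize_prob_map(pred, thresh=0.3):
    """Summarize a detection output array so runs can be compared at a glance.

    Hypothetical helper; `thresh` mirrors self.thresh in the reproducer.
    """
    pred = np.asarray(pred, dtype=np.float32)
    return {
        "shape": pred.shape,
        "min": float(pred.min()),
        "max": float(pred.max()),
        # Fraction of elements that are not exactly zero.
        "nonzero_fraction": float(np.count_nonzero(pred)) / pred.size,
        # Number of pixels that would count as text at this threshold.
        "above_thresh": int((pred > thresh).sum()),
    }


if __name__ == "__main__":
    demo = np.zeros((1, 1, 4, 4), dtype=np.float32)
    demo[0, 0, 0, 0] = 0.9
    print(summarize_prob_map(demo))
```

In the reproducer this could be called as `print(summarize_prob_map(dt_boxes))` under each OpenVINO version. A maximum of exactly 0 would indicate an all-zero output, whereas a non-zero `above_thresh` count would indicate that some text regions were found.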

Issue submission checklist

[X] I'm reporting an issue. It's not a question.
[X] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
[X] There is reproducer code and related data files such as images, videos, models, etc.

wenjiew commented 1 month ago

@meiyang-intel Please help take a look. Thanks!

meiyang-intel commented 1 month ago

This may be an issue in the AUTO plugin. @kssion, could you try the CPU device as shown below and check whether the output is correct?

det_model = TextDetection('ch_PP-OCRv4_det_infer/inference.pdmodel', 'CPU')

kssion commented 1 month ago

The result of using CPU is the same.

meiyang-intel commented 1 month ago

@kssion, I found that the openvino master branch does not have this issue. Could you install https://pypi.org/project/openvino-nightly/ and check whether the problem still occurs?

kssion commented 1 month ago

@meiyang-intel, it still fails with the nightly build. Could it be related to my hardware?

Device info:

Apple M1
macOS Sonoma 14.2.1 (23C71)

Commands I ran:

# Python 3.9.18
python3 -m venv openvino_env
source openvino_env/bin/activate
python -m pip install --upgrade pip
pip install openvino-nightly opencv-python
# pip install openvino==2024.0.0

cd ocr-test/
python ocr_test.py

Out-1:

# openvino (2024.0.0-14509-34caeefd078-releases/2024/0)
[[[[2.90401573e-13 1.02048896e-13 1.59580923e-14 ... 1.92933167e-14
    2.84697633e-14 6.55838231e-13]
   [5.95289970e-15 2.10930982e-15 6.58704277e-16 ... 7.15044144e-16
    8.53998893e-16 2.04200196e-14]
   [9.44296860e-14 7.98974924e-15 2.85937859e-14 ... 9.61401199e-15
    4.56825658e-13 4.76283943e-12]
   ...
   [2.87040560e-14 7.17875924e-15 1.30948033e-15 ... 9.74731952e-16
    2.07271910e-14 1.13725273e-13]
   [1.18844214e-12 1.30420997e-13 2.31416611e-13 ... 9.45223107e-14
    2.93433962e-12 1.02352900e-11]
   [2.67273486e-12 5.86507514e-13 2.34141320e-13 ... 4.37234003e-14
    4.81435855e-13 4.03402832e-12]]]]

Out-2:

# openvino-nightly (2024.5.0-16678-090da7b5376)
[[[[0. 0. 0. ... 0. 0. 0.]
   [0. 0. 0. ... 0. 0. 0.]
   [0. 0. 0. ... 0. 0. 0.]
   ...
   [0. 0. 0. ... 0. 0. 0.]
   [0. 0. 0. ... 0. 0. 0.]
   [0. 0. 0. ... 0. 0. 0.]]]]
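
Since the commands above switch between the `openvino` and `openvino-nightly` distributions, it may be worth confirming which of them is present in the active virtual environment before each run. The following is a minimal sketch using only the standard library (Python 3.8 or later); `installed_version` and `report` are hypothetical helper names, and the two distribution names are the ones used in the pip commands in this thread.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_names=("openvino", "openvino-nightly")):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

If both distributions are reported as installed in the same environment, a fresh virtual environment for each version would make the comparison less ambiguous.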

openvinotoolkit / openvino

[Bug]:The prediction results using PaddleOCR model are inconsistent between versions before 2024.0.0 and after 2024.1.0 #25906