openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug]: Shape inference of Reshape node with name Reshape_315 failed | Corrupted tensor data #25764

Closed PrefectSol closed 3 months ago

PrefectSol commented 3 months ago

OpenVINO Version

2023.3.0

Operating System

Windows System

Device used for inference

CPU

Framework

ONNX

Model used

YOLOv8

Issue description

The problem is that when submitting a batch with batch-size=1 to the model, the data in the output tensor is corrupted. The expected output of yolov8-obb is a tensor in the following format (verified with libtorch after exporting the same model to .torchscript):

12.854 9.57648 30.954 36.429 7.94892e-11 5.40184e-08 9.43046e-07 5.11332e-05 0.888742
26.5488 17.2847 41.4365 56.8054 1.4466e-09 4.58704e-08 1.98198e-05 0.00091497 0.867876
31.3084 24.8243 45.8364 85.9709 1.30787e-12 2.63177e-11 1.18003e-06 1.83736e-06 0.700613
...

All values of the tensor should be filled this way. However, with OpenVINO 2023.3.0 (batch-size=1) the tensor is corrupted (the model output data was read with strides taken into account, fully analogous to a TensorAccessor). For a tensor of shape [batch, 9, 4116], the first 4116 * 9 / 4 values (presumably the OBB features) look like this:

1: 7.9261 2: 37.1117 3: 68.4677 4: 100.144 5: 132.13 6: 164.865 7: 196.01 8: 227.623 9: 261.698
…

4114: 93.5156 4115: 55.7887 4116: 109.412 4117: 8.15633e-05 4118: 1.57335e-05 4119: 1.94462e-05 4120: 1.90671e-05 4121: 1.4062e-05 4122: 1.2665e-05

9235: 0.18122 9236: -0.0926018 9237: 0.382143 9238: 0.457001 9239: -0.0243431 9240: 0.391658 9241: 0.176661 9242: 0.247208 9243: -0.00380951 9244: 0.336006 9245: 0.456312 9246: 0.00866055 9247: 0.400055 9248: 0.145791 9249: 0.205104 9250: 0.00778693 9251: 0.302329 9252: 0.414665 9253: 0.0393117 9254: 0.367651 9255: 0.05902 9256: 0.496945 9257: -0.0857184 9258: 0.259553 9259: 0.120602 9260: 0.0778422 9261: 0.115007 9262: -nan 9263: 6.25275e-12 9264: -nan 9265: -nan 9266: -nan 9267: -nan 9268: -nan 9269: -nan 9270: -nan 9271: -nan 9272: -nan 9273: -nan 9274: -nan 9275: -nan 9276: -nan 9277: -4.2203e+37 9278: -1.69474e+38 9279: -nan 9280: -nan 9281: -nan 9282: -nan 9283: -nan 9284: -nan 9285: -nan 9286: -nan 9287: -nan 9288: -nan 9289: -nan 9290: -nan 9291: -nan 9292: -nan 9293: -nan 9294: -nan 9295: -nan 9296: -nan 9297: -nan 9298: -nan 9299: -nan 9300: -nan 9301: -nan 9302: -nan 9303: -nan 9304: -nan 9305: -nan 9306: -nan 9307: -nan 9308: -nan 9309: -nan 9310: -nan 9311: -nan 9312: -nan 9313: -nan 9314: -nan 9315: -nan 9316: -nan 9317: -nan 9318: -nan 9319: -nan 9320: -nan 9321: -nan 9322: -nan 9323: -nan 9324: -nan 9325: -nan 9326: -nan 9327: -nan 9328: -nan 9329: -nan 9330: -nan 9331: -nan 9332: -nan 9333: -nan 9334: -nan 9335: -nan 9336: -nan 9337: -nan 9338: -nan 9339: -nan 9340: -nan 9341: -nan 9342: -nan 9343: -nan 9344: -nan 9345: -nan 9346: -nan 9347: -nan 9348: -nan 9349: -nan 9350: -nan 9351: -nan 9352: -nan 9353: -nan 9354: -nan 9355: -nan 9356: -nan 9357: -nan 9358: -nan 9359: -nan 9360: -nan 9361: -nan 9362: -nan 9363: -nan 9364: -nan 9365: -nan 9366: -nan 9367: -nan 9368: -nan 9369: -nan 9370: -nan 9371: -nan 9372: -nan 9373: -nan 9374: -nan 9375: -nan 9376: -nan 9377: -nan 9378: -nan 9379: -nan 9380: -nan 9381: -nan 9382: -nan 9383: -nan 9384: -nan 9385: -nan 9386: -nan 9387: -nan 9388: -nan 9389: -nan 9390: -nan 9391: -nan 9392: -nan 9393: -nan 9394: -nan 9395: -nan 9396: -nan 9397: -nan 9398: -nan 9399: -nan 9400: -nan 9401: -nan 9402: -nan 9403: 
-nan 9404: -nan 9405: -nan 9406: -nan 9407: -nan 9408: -nan 9409: -nan 9410: -nan 9411: -nan 9412: -nan 9413: -nan 9414: -nan 9415: -nan 9416: -nan 9417: -1.70141e+38 9418: -1.69477e+38 9419: -1.69474e+38 9420: -nan 9421: -nan 9422: -nan 9423: -nan 9424: -nan 9425: -nan 9426: -nan 9427: -nan 9428: -nan 9429: -nan 9430: -nan 9431: -nan 9432: -nan 9433: -nan 9434: -nan 9435: -nan 9436: -nan 9437: -nan 9438: -nan 9439: -nan 9440: -nan 9441: -nan


If you increase the batch size to `batch-size=2`, there is no output at all, and `inferRequest.infer();` raises an error:

```
Exception from src\inference\src\infer_request.cpp:231:
Exception from src\plugins\intel_cpu\src\node.cpp:1660:
Shape inference of Reshape node with name Reshape_315 failed: Exception from src\plugins\intel_cpu\src\shape_inference\custom\reshape.cpp:61:
[cpu]reshape: the shape of input data (1.64.8232) conflicts with the reshape pattern (1.4.16.4116)
```

You can see that the second dimension `(64)` somehow failed to split into `(4, 16)`, and `8232` should have been divided by the batch size of 2.

The creation, initialization, and preparation of the model fully match the examples from the OpenVINO documentation. Moreover, the same initialization previously worked on version `2022.3` (with the corresponding model version). Nothing has changed on my side in terms of model creation, yet the problem appears out of nowhere.

Corrupted model files can also be ruled out, since `predict.py` from yolov8 runs predictions on this same model without issues.

I tried both downloading the ready-made .zip build and compiling OpenVINO from source myself; neither helps.

C++ standard: 17

Using OpenVINO 2024 or 2022 is not an option for me; I am specifically interested in 2023.

Code example:

```cpp
void initPrePostProcessor(const std::shared_ptr<ov::Model> &model,
                          ov::preprocess::PrePostProcessor *prePostProcessor,
                          float meanR, float meanG, float meanB, float scale)
{
    const ov::Layout tensorLayout { "NHWC" };

    ov::Shape tensorShape = { 1, 448, 448, 3 };

    prePostProcessor->input().tensor()
        .set_element_type(ov::element::f32)
        .set_layout(tensorLayout)
        .set_spatial_static_shape(tensorShape[ov::layout::height_idx(tensorLayout)],
                                  tensorShape[ov::layout::width_idx(tensorLayout)]);

    prePostProcessor->input().preprocess()
        .mean({ meanR, meanG, meanB })
        .scale(scale);

    prePostProcessor->input().model()
        .set_layout("NCHW");

    const uint32_t outputSize = model->get_output_size();
    for (uint32_t i = 0; i < outputSize; ++i)
    {
        prePostProcessor->output(i).tensor()
            .set_element_type(ov::element::f32);
    }
}
```
```cpp
ov::CompiledModel m_compiledModel;
std::shared_ptr<ov::Model> model = core.read_model(std::string(modelXml, modelXml + modelXmlSize),
                                                   ov::Tensor(ov::element::u8, { modelBinSize }, modelBin));

ov::preprocess::PrePostProcessor prePostProcessor(model);
initPrePostProcessor(model, &prePostProcessor, meanR, meanG, meanB, scale);

model = prePostProcessor.build();

core.set_property(m_deviceName, ov::inference_num_threads(threadsNumber));
core.set_property(m_deviceName, std::pair<std::string, std::string>("CPU_RUNTIME_CACHE_CAPACITY", std::to_string(0)));
m_compiledModel = core.compile_model(model, m_deviceName);
```

Infer:

```cpp
const ov::Shape inputShape = { batchSize, m_inputSize, m_inputSize, m_channels };
const ov::Tensor inputTensor = ov::Tensor(ov::element::f32, inputShape, preparedImages);

ov::InferRequest inferRequest;
inferRequest = m_compiledModel.create_infer_request();
inferRequest.set_input_tensor(inputTensor);
inferRequest.infer();
```

Step-by-step reproduction

Relevant log output

```
Exception from src\inference\src\infer_request.cpp:231:
Exception from src\plugins\intel_cpu\src\node.cpp:1660:
Shape inference of Reshape node with name Reshape_315 failed: Exception from src\plugins\intel_cpu\src\shape_inference\custom\reshape.cpp:61:
[cpu]reshape: the shape of input data (1.64.8232) conflicts with the reshape pattern (1.4.16.4116)
```

Issue submission checklist

wenjiew commented 3 months ago

@yuxu42 to assign someone to analyze. Thanks!

yuxu42 commented 3 months ago

Hi @PrefectSol, as OpenVINO 2024.3 has been published recently, could you please try it to see whether it can solve the problem?

PrefectSol commented 3 months ago

Other versions of OpenVINO work fine (2022.3 and 2024.3). I found a local workaround: exporting the model to OpenVINO format with:

```
yolo export model=yolov8n.pt format=openvino imgsz=448
```

This avoids the reshape error and handles different batch sizes by setting a dynamic shape, but I still don't understand why the initial export didn't work. Moreover, this workaround changes the layout of the output tensor. Expected (torch example):

```
batch 1:
    x y w h class_conf_0 class_conf_1 class_conf_2 class_conf_3 theta
    ...
    (_4116)

batch 2:
    x y w h class_conf_0 class_conf_1 class_conf_2 class_conf_3 theta
    ...
    (_4116)
```
OpenVINO output:
```
batch_1:
    x(_1), x(_2), ..., x(_4116)
    y(_1), y(_2), ..., y(_4116)
    w(_1), w(_2), ..., w(_4116)
    class_conf_0(_1), class_conf_0(_2), ..., class_conf_0(_4116)
    class_conf_1(_1), class_conf_1(_2), ..., class_conf_1(_4116)
    class_conf_2(_1), class_conf_2(_2), ..., class_conf_2(_4116)
    class_conf_3(_1), class_conf_3(_2), ..., class_conf_3(_4116)
    theta(_1), theta(_2), ..., theta(_4116)

batch_2:
    x(_1), x(_2), ..., x(_4116)
    y(_1), y(_2), ..., y(_4116)
    w(_1), w(_2), ..., w(_4116)
    class_conf_0(_1), class_conf_0(_2), ..., class_conf_0(_4116)
    class_conf_1(_1), class_conf_1(_2), ..., class_conf_1(_4116)
    class_conf_2(_1), class_conf_2(_2), ..., class_conf_2(_4116)
    class_conf_3(_1), class_conf_3(_2), ..., class_conf_3(_4116)
    theta(_1), theta(_2), ..., theta(_4116)
```
yuxu42 commented 3 months ago

Hi @PrefectSol Would you mind trying the latest OV (2024.3)? Or do you prefer sticking to OpenVINO 2023.3?