openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug] The inference results of Python API are not consistent with that of CPP API #14082

Closed Raise-me-up closed 1 year ago

Raise-me-up commented 1 year ago
System information (version)

Python API info: openvino 2022.2.0
CPP API info: openvino 2022.2.0-7713-af16ea1d79a
MO info: 2022.3.0-8539-c953186ff0a
Operating System / Platform => Windows 64 Bit
Compiler => Visual Studio 2019
Problem classification: Result Inconsistency
Framework: PyTorch
Model name: DPCRN

Detailed description

I find that the OpenVINO inference results are inconsistent between the Python API and the C++ API. For convenience of comparison, I flatten the input and output data into 1D. The code is as follows:

Steps to reproduce
Python inference code

from openvino.runtime import Core
import numpy as np
import torch

ie = Core()

dpcrn_model_xml = "models\DPCRN_cpxo_1d_ckpt58_dynamic.xml"
model = ie.read_model(dpcrn_model_xml)

input_file = "np_data/high_snr_spect_arr_1d.npy"
output_arr = "result/output_complex_1d_high_snr_py.npy"

input_shape = (1, 257, 5368, 2)
input_array = torch.tensor(np.load(input_file))
input_array = input_array.reshape(input_shape)

model.reshape(input_shape)
compiled_model = ie.compile_model(model=model, device_name="CPU")

input_ir = compiled_model.input("input_data")
output_ir = compiled_model.output("out_complex")

request = compiled_model.create_infer_request()
request.infer(inputs=[input_array])
out_complex = request.get_tensor(output_ir).data

np.save(output_arr, np.float32(out_complex))

CPP inference code

Tip: I used the libnpy library (https://github.com/llohse/libnpy) to load the NumPy data.

include "iostream"

// clang-format off

include "openvino/openvino.hpp"

include "utils/slog.hpp"

include "utils/common.hpp"

include "npy.hpp"

// clang-format on

using namespace ov; using namespace std;

int main(int argc, char* argv[]) { try { string FLAGS_d = "CPU";

    // 1D tensor
    string FLAGS_i = "data/high_snr_spect_arr_1d.npy";
    string FLAGS_m = "models/DPCRN_cpxo_1d_ckpt58_dynamic.xml";
    string FLAGS_o = "results/output_complex_1d_high_snr_cpp.npy";

// -------- Step 1. Initialize OpenVINO Runtime Core --------
ov::Core core;

// ------------------ Step 2. Read a model ------------------
std::shared_ptr<ov::Model> model = core.read_model(FLAGS_m);
logBasicModelInfo(model);

    ov::OutputVector inputs = model->inputs();
    ov::OutputVector outputs = model->outputs();

    ov::CompiledModel compiled_model = core.compile_model(model, FLAGS_d, {});
    logCompiledModelInfo(compiled_model, FLAGS_m, FLAGS_d);

    ov::InferRequest infer_request = compiled_model.create_infer_request();

    // Prepare input
    // get size of network input (patch_size)
    std::string input_name("input_data");

    // read input data
    vector<unsigned long> shape_tmp;
    vector<unsigned long> shape = { 1, 257, 5368, 2 };
    vector<float> inp_data_fp32;
    vector<float> out_cpx_fp32;

    bool is_fortran;
    shape_tmp.clear();
    inp_data_fp32.clear();
    out_cpx_fp32.clear();
    npy::LoadArrayFromNumpy(FLAGS_i, shape_tmp, is_fortran, inp_data_fp32);

    //reshape the model shape
    model->reshape({shape[0], shape[1], shape[2], shape[3]});
    slog::info << "----------After model reshape----------" << slog::endl;
    logBasicModelInfo(model);

    ov::Shape data_shape = {shape[0], shape[1], shape[2], shape[3]};
    size_t inp_size = inp_data_fp32.size();

    auto start_time = std::chrono::steady_clock::now();
    ov::Tensor input_tensor(ov::element::f32, data_shape, &inp_data_fp32[0]);
    float* tmp = input_tensor.data<float>();

    infer_request.set_tensor(input_name, input_tensor);

    infer_request.infer();

    // process output
    ov::Tensor out_cpx = infer_request.get_tensor("out_complex");

    int o_size = out_cpx.get_size();
    out_cpx_fp32.resize(o_size, 0);

    float* dst_o = &out_cpx_fp32[0];

    float* tmp_out = out_cpx.data<float>();

    std::memcpy(dst_o, out_cpx.data<float>(), o_size * sizeof(float));

    using ms = std::chrono::duration<double, std::ratio<1, 1000>>;
    double total_latency = std::chrono::duration_cast<ms>(std::chrono::steady_clock::now() - start_time).count();
    slog::info << "Metrics report:" << slog::endl;
    slog::info << "\tLatency: " << std::fixed << std::setprecision(1) << total_latency << " ms" << slog::endl;

    vector<long unsigned> v_shape{ (long unsigned)o_size };

    npy::SaveArrayAsNumpy(FLAGS_o, true, v_shape.size(), v_shape.data(), out_cpx_fp32);
}
catch (const std::exception& error) {
    slog::err << error.what() << slog::endl;
    return 1;
}
catch (...) {
    slog::err << "Unknown/internal exception happened" << slog::endl;
    return 1;
}
slog::info << "Execution successful" << slog::endl;
return 0;

}

Results comparison

import numpy as np
import torch

py_res = "result/output_complex_1d_high_snr_py.npy"
cpp_res = "result/output_complex_1d_high_snr_cpp.npy"

py_arr = torch.tensor(np.load(py_res))
cpp_arr = torch.tensor(np.load(cpp_res))

flag = torch.gt(torch.abs(py_arr - cpp_arr), 1e-6).numpy()
index = np.argwhere(flag == True)
index = torch.tensor(index)

torch.set_printoptions(edgeitems=10)
print(index)
print(py_arr[index])
print(cpp_arr[index])


And the attachments are as follows:

model.zip

input.zip

output.zip

Please check and help, thanks!

mlukasze commented 1 year ago

Hey @Raise-me-up, thanks for notifying us about this issue. Let us check it; we will get back to you soon.

Raise-me-up commented 1 year ago

@mlukasze OK! Thank you very much for your immediate response!

Raise-me-up commented 1 year ago

Hey @mlukasze

Is there any progress?

mlukasze commented 1 year ago

The bad news: not yet. The better news: we will take a look at it today.

Sorry you have to wait - long queue.

Raise-me-up commented 1 year ago

@mlukasze

OK! At least you have started looking into it. (^ω^)

Raise-me-up commented 1 year ago

Hey @mlukasze Do you need more information to locate the problem?

jiwaszki commented 1 year ago

Hi @Raise-me-up, I am currently looking at your case, and some topics need to be clarified.

Python API info: openvino 2022.2.0
CPP API info: openvino 2022.2.0-7713-af16ea1d79a
MO info: 2022.3.0-8539-c953186ff0a

Are you able to run your snippet under the Python/C++ API 2022.3? Or is it simply a typo, with the MO version being pinned to 2022.3? It is best to use both MO and the Runtime from the same release. BTW, are you able to attach the original model and the MO command that was used to create the xml file?

Second thing to consider is memory-layout of saved data.

How was the input data generated? What is its format? If you flatten it in C layout (row-major), then consecutive rows are saved; if you flatten it in Fortran layout (column-major), then columns are saved in sequence. This is important for understanding how to transform the data back into the correct input for inference.
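For example, here is a minimal NumPy sketch of the difference (the array and shape are just placeholders, not your real data):

    import numpy as np

    a = np.arange(12, dtype=np.float32).reshape(2, 3, 2)   # placeholder 3-D data
    c_flat = a.ravel(order="C")   # row-major: last axis varies fastest
    f_flat = a.ravel(order="F")   # column-major: first axis varies fastest

    # Reshaping back only recovers the original array when the same order is used
    print(np.array_equal(c_flat.reshape(2, 3, 2), a))                # True
    print(np.array_equal(f_flat.reshape(2, 3, 2), a))                # False
    print(np.array_equal(f_flat.reshape((2, 3, 2), order="F"), a))   # True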

I see that you used two ways of saving the results: np.save on the Python side and npy::SaveArrayAsNumpy on the C++ side.

Looking forward to your answers!

Raise-me-up commented 1 year ago

Hi @jiwaszki

Thank you for your detailed reply!

I can run my program under the C++ API 2022.3 now, but the problem still occurs. The ONNX model is as follows:

model_onnx.zip

The MO command is simply: MO --input_model DPCRN.onnx --output_dir ./

As to the memory layout of the saved data, I doubted it as well at first. That is why I flattened the data into 1D.

The raw data is a wav file. I use librosa to load it and then do an STFT on it. Finally, I reshape it into a 1D array and save it.

import librosa
import numpy as np
import torch

# filter_length, hop_length, win_length, WINDOW and the file paths are defined elsewhere in my script
audio = librosa.load(input_file, sr=16000)[0]
# on this torch version, stft returns real/imag stacked in the last dimension
torch_stft = torch.stft(torch.tensor(audio), n_fft=filter_length, hop_length=hop_length, win_length=win_length, window=WINDOW.to("cpu"), center=True)
torch_stft = torch.unsqueeze(torch_stft, dim=0)   # add the batch dimension
torch_stft_np = torch_stft.numpy()
torch_stft_np = torch_stft_np.reshape(-1)         # flatten to 1D (C order)
np.save(output_file, np.float32(torch_stft_np))
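A quick round-trip check along these lines (just a sketch; it reuses torch_stft and output_file from the snippet above and the (1, 257, 5368, 2) shape reported earlier) can confirm that the flattened array reshapes back to the original 4-D layout:

    import numpy as np

    # torch_stft is the 4-D tensor from the snippet above (before flattening),
    # output_file is the .npy path it was saved to
    flat = np.load(output_file)
    restored = flat.reshape(1, 257, 5368, 2)   # default C order, matching reshape(-1) above

    print(np.array_equal(restored, np.float32(torch_stft.numpy())))   # expected: True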

There is nothing special, and I am still confused.

Looking forward to your answers soon as well!

Raise-me-up commented 1 year ago

@jiwaszki BTW, the variable is_fortran is always set to false.

Raise-me-up commented 1 year ago

I have compared the 1D tensor in C++ with the 1D array in Python, and the data layout is consistent. However, I am confused about OpenVINO's data layout handling. Can it correctly reshape the multidimensional tensor from the 1D tensor?

Raise-me-up commented 1 year ago

BTW, can you debug into the function infer_request.infer() to compare the difference between the results of the C++ API and those of the Python API?

Raise-me-up commented 1 year ago

Hey @jiwaszki

I checked it again and found that the results of the OpenVINO Python API are consistent with the PyTorch ones, while those of the C++ API are wrong. Maybe the reason lies in the input data, but I can't figure it out. Could you provide a correct C++ test code so that I can continue to debug, please?

Raise-me-up commented 1 year ago

Hey @jiwaszki

I found the reason at last! The problem results from the mismatch between the MO version and the OpenVINO C++ API version. After I unified them, the bug disappeared!

jiwaszki commented 1 year ago

@Raise-me-up that is some great news! As I mentioned, "It is best to use both MO and Runtime from the same release" -- there are a lot of changes between versions, and they might affect OpenVINO's runtime.
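For future readers, a minimal way to double-check this is to compare the Runtime build string with the output of mo --version in the same environment (a sketch, not a full diagnostic):

    # print the installed OpenVINO Runtime build string,
    # e.g. 2022.3.0-<build>-<hash>; it should come from the same release as `mo --version`
    from openvino.runtime import get_version

    print(get_version())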

I am closing the issue as it is resolved (please re-open if additional assistance is needed).