openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

batch inference #12923

Closed · largestcabbage closed this issue 9 months ago

largestcabbage commented 1 year ago

Will there be a performance boost for batch inference? Have you ever done such an experiment?

ilya-lavrenov commented 1 year ago

Hi @largestcabbage,

Yes, it will, but the gain depends on the device, because the optimal batch size may vary. Please read about Automatic Batching, introduced in the OpenVINO 2.0 release, which lets OpenVINO batch requests automatically for several devices: https://docs.openvino.ai/latest/openvino_docs_OV_UG_Automatic_Batching.html. With it, users don't have to collect a batch on the application side; OpenVINO can do this internally if ov::hint::performance_mode is set to THROUGHPUT (see also https://docs.openvino.ai/latest/openvino_docs_OV_UG_Performance_Hints.html).
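
For illustration, a minimal C++ sketch of compiling with the throughput hint (the model path "model.xml" and the device name "GPU" are placeholders, not taken from this thread):

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");  // placeholder path

    // Request throughput-oriented execution; on supported devices OpenVINO may apply
    // Automatic Batching internally, so the application keeps sending single-image requests.
    ov::CompiledModel compiled = core.compile_model(
        model, "GPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    ov::InferRequest request = compiled.create_infer_request();
    // ... fill the request's input tensor(s), then run inference:
    request.infer();
    return 0;
}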

largestcabbage commented 1 year ago

@ilya-lavrenov Will there be a performance boost for batch inference on CPU? Will the FPS differ for different batch sizes? Does a bigger batch always mean higher FPS?

andrei-kochin commented 1 year ago

Hello @largestcabbage,

In general, yes, you can expect better performance with a larger batch. For example, in our internal validation we observed a performance boost for densenet-121 after increasing the batch size from 1 to 8.

largestcabbage commented 1 year ago

@andrei-kochin Can I ask how batch inference is implemented in C++? My tests showed that batch inference did not improve FPS. My code is as follows:

#include <openvino/openvino.hpp>
using namespace ov;

Core core;
const Layout model_layout{ "NCHW" };

// model_object is a std::shared_ptr<ov::Model> returned by core.read_model(...)
Shape tensor_shape = model_object->input().get_shape();
tensor_shape[layout::batch_idx(model_layout)] = input_batch_size;      // 1, 2, 4, 8, 16
tensor_shape[layout::channels_idx(model_layout)] = input_channel_size;
tensor_shape[layout::height_idx(model_layout)] = input_height_size;
tensor_shape[layout::width_idx(model_layout)] = input_width_size;

// reshape the model to the requested static batch size, then compile it
model_object->reshape({ {model_object->input().get_any_name(), tensor_shape} });
CompiledModel compiled_model_object = core.compile_model(model_object, device_name);
InferRequest infer_request_object = compiled_model_object.create_infer_request();

// fill the input tensor (NCHW layout). Note: b only runs over [0, 1),
// so only the first image of the batch is actually copied.
Tensor input_tensor = infer_request_object.get_input_tensor();
float* input_tensor_data = input_tensor.data<float>();
for (size_t b = 0; b < 1; b++) {
    for (size_t c = 0; c < num_channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                // blob_image_list is a std::vector<cv::Mat> of preprocessed CV_32FC3 images
                input_tensor_data[b * num_channels * width * height + c * width * height + h * width + w] =
                    blob_image_list[b].at<cv::Vec<float, 3>>(h, w)[c];
            }
        }
    }
}

infer_request_object.infer();   // synchronous inference on the whole batch

Is this how batch inference should be implemented? If not, could you tell me how to implement it in C++ and share detailed example code?

Looking forward to your reply.
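
For comparison with the snippet above, a minimal sketch of a fill that copies every image of the batch into the input tensor, assuming blob_image_list holds input_batch_size preprocessed CV_32FC3 cv::Mat images whose height and width match the reshaped model input (variable names reused from the snippet above):

// Sketch: copy all input_batch_size images into the batched NCHW input tensor.
// Assumes infer_request_object, input_batch_size, num_channels, height, width,
// and blob_image_list are defined as in the snippet above.
ov::Tensor input_tensor = infer_request_object.get_input_tensor();
float* input_tensor_data = input_tensor.data<float>();
const size_t image_size = num_channels * height * width;
for (size_t b = 0; b < input_batch_size; b++) {        // every image in the batch
    for (size_t c = 0; c < num_channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                input_tensor_data[b * image_size + c * height * width + h * width + w] =
                    blob_image_list[b].at<cv::Vec<float, 3>>(h, w)[c];
            }
        }
    }
}
infer_request_object.infer();   // one inference call processes the whole batch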

akladiev commented 1 year ago

This issue will be closed in 2 weeks in case of no activity.