openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

batch inference #12923

Closed · largestcabbage closed this issue 9 months ago

largestcabbage commented 1 year ago

Will there be a performance boost for batch inference? Have you ever done such an experiment?

ilya-lavrenov commented 1 year ago

Hi @largestcabbage,

Yes, it will, but the gain depends on the device, because the optimal batch size may vary. Please read about Automatic Batching, introduced in the OpenVINO 2.0 release, which lets OpenVINO batch requests automatically for several devices: https://docs.openvino.ai/latest/openvino_docs_OV_UG_Automatic_Batching.html. With it, users don't have to collect a batch on the application side; OpenVINO can do this internally if ov::hint::performance_mode is set to THROUGHPUT (see also https://docs.openvino.ai/latest/openvino_docs_OV_UG_Performance_Hints.html).
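
For illustration, a minimal C++ sketch of compiling with the throughput hint (the model path "model.xml" and the device name "GPU" are placeholders, not taken from this thread):

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");  // placeholder path

    // Request throughput-oriented execution; on supported devices OpenVINO may apply
    // Automatic Batching internally, so the application keeps sending single-image requests.
    ov::CompiledModel compiled = core.compile_model(
        model, "GPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    ov::InferRequest request = compiled.create_infer_request();
    // ... fill the request's input tensor(s), then run inference:
    request.infer();
    return 0;
}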

largestcabbage commented 1 year ago

@ilya-lavrenov Will there be a performance boost for batch inference on CPU? Will the FPS differ for different batch sizes? Does a bigger batch always mean higher FPS?

andrei-kochin commented 1 year ago

Hello @largestcabbage,

In general, yes, you can expect better performance with a larger batch. For example, in our internal validation we observed a performance boost for densenet-121 after increasing the batch size from 1 to 8.

largestcabbage commented 1 year ago

@andrei-kochin Can I ask how batch inference is implemented in C++? My tests showed that batch inference did not improve FPS. My code is as follows:

#include <openvino/openvino.hpp>
using namespace ov;

Core core;
const Layout model_layout{ "NCHW" };

// model_object is a std::shared_ptr<ov::Model> returned by core.read_model(...)
Shape tensor_shape = model_object->input().get_shape();
tensor_shape[layout::batch_idx(model_layout)] = input_batch_size;      // 1, 2, 4, 8, 16
tensor_shape[layout::channels_idx(model_layout)] = input_channel_size;
tensor_shape[layout::height_idx(model_layout)] = input_height_size;
tensor_shape[layout::width_idx(model_layout)] = input_width_size;

// reshape the model to the requested static batch size, then compile it
model_object->reshape({ {model_object->input().get_any_name(), tensor_shape} });
CompiledModel compiled_model_object = core.compile_model(model_object, device_name);
InferRequest infer_request_object = compiled_model_object.create_infer_request();

// fill the input tensor (NCHW layout). Note: b only runs over [0, 1),
// so only the first image of the batch is actually copied.
Tensor input_tensor = infer_request_object.get_input_tensor();
float* input_tensor_data = input_tensor.data<float>();
for (size_t b = 0; b < 1; b++) {
    for (size_t c = 0; c < num_channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                // blob_image_list is a std::vector<cv::Mat> of preprocessed CV_32FC3 images
                input_tensor_data[b * num_channels * width * height + c * width * height + h * width + w] =
                    blob_image_list[b].at<cv::Vec<float, 3>>(h, w)[c];
            }
        }
    }
}

infer_request_object.infer();   // synchronous inference on the whole batch

Is this how batch inference should be implemented? If not, could you tell me how to implement it in C++ and share detailed example code?

Looking forward to your reply.
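
For comparison with the snippet above, a minimal sketch of a fill that copies every image of the batch into the input tensor, assuming blob_image_list holds input_batch_size preprocessed CV_32FC3 cv::Mat images whose height and width match the reshaped model input (variable names reused from the snippet above):

// Sketch: copy all input_batch_size images into the batched NCHW input tensor.
// Assumes infer_request_object, input_batch_size, num_channels, height, width,
// and blob_image_list are defined as in the snippet above.
ov::Tensor input_tensor = infer_request_object.get_input_tensor();
float* input_tensor_data = input_tensor.data<float>();
const size_t image_size = num_channels * height * width;
for (size_t b = 0; b < input_batch_size; b++) {        // every image in the batch
    for (size_t c = 0; c < num_channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                input_tensor_data[b * image_size + c * height * width + h * width + w] =
                    blob_image_list[b].at<cv::Vec<float, 3>>(h, w)[c];
            }
        }
    }
}
infer_request_object.infer();   // one inference call processes the whole batch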

akladiev commented 1 year ago

This issue will be closed in 2 weeks in case of no activity.