Closed: largestcabbage closed this issue 9 months ago.
Hi @largestcabbage,
Yes, it can be, but it depends on the device, because the optimal batch size may vary. Please read about the feature introduced in the OpenVINO 2.0 release, where OpenVINO is able to perform automatic batching for several devices: https://docs.openvino.ai/latest/openvino_docs_OV_UG_Automatic_Batching.html. With this, users don't have to collect batches on the application side; OpenVINO can do it internally if ov::hint::performance_mode is set to throughput (see https://docs.openvino.ai/latest/openvino_docs_OV_UG_Performance_Hints.html as well).
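For reference, a minimal sketch of enabling the hint in the 2.0 C++ API; the model path and the GPU device name are illustrative placeholders, not taken from this thread:

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");
    // With the THROUGHPUT hint, OpenVINO may apply automatic batching
    // internally, so the application keeps submitting single requests.
    ov::CompiledModel compiled = core.compile_model(
        model, "GPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
    ov::InferRequest request = compiled.create_infer_request();
    request.infer();
    return 0;
}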
@ilya-lavrenov Will there be a performance boost for batch inference? On CPU, will the FPS differ for different batch sizes? Is it the case that the bigger the batch, the higher the FPS?
Hello @largestcabbage,
In general, yes, you can expect better performance with a higher batch size. For example, in our internal validation we observed a performance boost for densenet-121 after increasing the batch size from 1 to 8.
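One way to check this on your own hardware is to time the same model at several batch sizes. A minimal sketch, where the model path, device name, batch sizes, and iteration count are illustrative, and ov::set_batch assumes the model's batch dimension can be deduced:

#include <chrono>
#include <iostream>
#include <memory>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    for (int batch : {1, 2, 4, 8}) {
        std::shared_ptr<ov::Model> model = core.read_model("model.xml");
        ov::set_batch(model, batch);  // rewrite the batch dimension of the inputs
        ov::CompiledModel compiled = core.compile_model(model, "CPU");
        ov::InferRequest request = compiled.create_infer_request();
        request.infer();  // warm-up run, excluded from the timing

        const int iters = 50;
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iters; ++i) {
            request.infer();
        }
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        // FPS counts images, so it is (iterations * batch) / time
        std::cout << "batch " << batch << ": "
                  << (iters * batch) / elapsed.count() << " FPS" << std::endl;
    }
    return 0;
}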
@andrei-kochin May I ask how batch inference is implemented in C++? My tests showed that batch inference did not improve FPS. My code is as follows:
#include <openvino/openvino.hpp>
#include <opencv2/opencv.hpp>

using namespace ov;

Core core;
// The model must be read first; the path is a placeholder.
std::shared_ptr<Model> model_object = core.read_model("model.xml");
const Layout model_layout{ "NCHW" };
// input_batch_size (1, 2, 4, 8, 16), input_channel_size, input_height_size,
// input_width_size, device_name, num_channels, height, width, and
// blob_image_list are supplied by the application.
Shape tensor_shape = model_object->input().get_shape();
tensor_shape[layout::batch_idx(model_layout)] = input_batch_size;
tensor_shape[layout::channels_idx(model_layout)] = input_channel_size;
tensor_shape[layout::height_idx(model_layout)] = input_height_size;
tensor_shape[layout::width_idx(model_layout)] = input_width_size;
model_object->reshape({ {model_object->input().get_any_name(), tensor_shape} });
CompiledModel compiled_model_object = core.compile_model(model_object, device_name);
InferRequest infer_request_object = compiled_model_object.create_infer_request();
// Fill the input tensor with every image in the batch, converting from
// OpenCV's HWC layout to the model's NCHW layout.
Tensor input_tensor = infer_request_object.get_input_tensor();
float* input_tensor_data = input_tensor.data<float>();
for (size_t b = 0; b < input_batch_size; b++) {
    for (size_t c = 0; c < num_channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                // blob_image_list[b] is expected to be a CV_32FC3 cv::Mat.
                input_tensor_data[b * num_channels * height * width + c * height * width + h * width + w] =
                    blob_image_list[b].at<cv::Vec<float, 3>>(h, w)[c];
            }
        }
    }
}
infer_request_object.infer();
Is this the right way to implement batch inference? If not, could you show how to implement it in C++, ideally with detailed example code?
Looking forward to your reply.
This issue will be closed in 2 weeks if there is no further activity.
Will there be a performance boost for batch inference? Have you ever done such an experiment?