Closed: jstumpin closed this issue 1 year ago
The increment value was wrong:

```cpp
std::vector<std::vector<BoundingBox>> SampleYolo::get_bboxes(int batch_size, int output_size,
    int *num_detections, float *nmsed_boxes, float *nmsed_scores, int *nmsed_classes)
{
    int detect_pos = 0;
    int box_pos = 0;
    std::vector<std::vector<BoundingBox>> bboxes {static_cast<size_t>(batch_size)};
    for (int b = 0; b < batch_size; ++b)
    {
        for (int t = 0; t < num_detections[b]; ++t)
        {
            int box_coord_pos = box_pos + 4 * t;
            float x1 = nmsed_boxes[box_coord_pos];
            float y1 = nmsed_boxes[box_coord_pos + 1];
            float x2 = nmsed_boxes[box_coord_pos + 2];
            float y2 = nmsed_boxes[box_coord_pos + 3];
            bboxes[b].push_back(BoundingBox {
                std::min(x1, x2),
                std::min(y1, y2),
                std::max(x1, x2),
                std::max(y1, y2),
                nmsed_scores[detect_pos + t],
                static_cast<int>(nmsed_classes[detect_pos + t]) });
        }
        detect_pos += output_size;      // rectified increment
        box_pos += output_size * 4;     // rectified increment
    }
    return bboxes;
}
```
where `output_size` is computed as follows (reference: https://github.com/NVIDIA/TensorRT/blob/release/8.6/samples/common/buffers.h#L313-L319):

```cpp
int index = mEngine->getBindingIndex("detection_classes");
output_size = mManagedBuffers[index]->hostBuffer.size() / batch_size;
```
Indeed, the new optimized NMS is able to shave a good chunk of milliseconds off my batch inference (batch_size = 8, 512x512, YOLOv4x). Keep up the good work, thanks! @marcoslucianops
The output of the YOLO model on this repo is adjusted to get more performance on DeepStream. It's not equal to other implementations.
Single inference is working but batch inference is failing (only the first instance is successful) when using this commit: New optimized NMS. I've been using this code snippet to perform decoding and it's been working prior to the said commit: https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/main/tensorrt_yolov4/source/SampleYolo.cpp#L804-L845:
Adapting to the said commit brings me up to this:
Size of `num_detections` is correct, i.e. = `batch_size`, so only `num_detections` is holding the right values. `nmsed_boxes`, `nmsed_scores` and `nmsed_classes` do not hold any value beyond `num_detections[0]`, e.g. `nmsed_classes[num_detections[0] - 1]` is correct but `nmsed_classes[num_detections[1] - 1]` is `NULL`. Not using DeepStream, hence why I'm using NVIDIA's standalone version on the inference front. Merging with your repo for wider YOLO variant support. Anything else that I missed, @marcoslucianops?