yasenh / libtorch-yolov5

A LibTorch inference implementation of the yolov5
MIT License

batch inference #23

Closed winterxx closed 3 years ago

winterxx commented 3 years ago

Is batch inference not supported?

yasenh commented 3 years ago

Hi @winterxx, the purpose of this repo is to replicate yolov5's Python detect.py. The overall flow does support batch inference, but you would need to build the batch yourself and modify the code a little bit so that post-processing handles batch inputs.

winterxx commented 3 years ago

Your work is great. I modified the pre-processing to build batch inputs, but now I get an error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/models/yolo.py", line 45, in forward
    _35 = (_4).forward(_34, )
    _36 = (_2).forward((_3).forward(_35, ), _29, )
    _37 = (_0).forward(_33, _35, (_1).forward(_36, ), )
    _38, _39, _40, _41, = _37
    return (_41, [_38, _39, _40])
  File "code/__torch__/models/yolo.py", line 77, in forward
    _54 = torch.mul(torch.add(_52, _53, alpha=1), torch.select(CONSTANTS.c6, 0, 0))
    _55 = torch.slice(y, 4, 0, 2, 1)
    _56 = torch.expand(torch.view(_54, [3, 80, 80, 2]), [1, 3, 80, 80, 2], implicit=True)
                       ~~~~~~~~~~ <--- HERE
    _57 = torch.copy_(_55, _56, False)
    _58 = torch.mul(torch.slice(y, 4, 2, 4, 1), CONSTANTS.c7)

Traceback of TorchScript, original code (most recent call last):
/media/data1/project/project2020/Detection/yolov5/yolov5/models/yolo.py(64): forward
/media/data1/3rdtools/mimiconda3/envs/yolov5/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/media/data1/3rdtools/mimiconda3/envs/yolov5/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
/media/data1/project/project2020/Detection/yolov5/yolov5/models/yolo.py(152): forward_once
/media/data1/project/project2020/Detection/yolov5/yolov5/models/yolo.py(132): forward
/media/data1/3rdtools/mimiconda3/envs/yolov5/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/media/data1/3rdtools/mimiconda3/envs/yolov5/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
/media/data1/3rdtools/mimiconda3/envs/yolov5/lib/python3.8/site-packages/torch/jit/__init__.py(1109): trace_module
/media/data1/3rdtools/mimiconda3/envs/yolov5/lib/python3.8/site-packages/torch/jit/__init__.py(953): trace
/media/data1/project/project2020/Detection/yolov5/yolov5/models/export.py(58): <module>
RuntimeError: shape '[3, 80, 80, 2]' is invalid for input of size 76800

yasenh commented 3 years ago

@winterxx Thanks for the feedback, this is useful info! I ran some tests locally and can reproduce the error as well.
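
As a quick sanity check on the numbers in the error message (a small arithmetic sketch, not part of the repo; the variable names are made up for illustration): the view shape [3, 80, 80, 2] holds 38400 elements, while the tensor that reached it has 76800, exactly twice as many. That is consistent with a batch of 2 being fed to a graph whose shapes were fixed for a batch of 1 at export time.

```python
from math import prod

# Numbers copied from the RuntimeError in the traceback.
traced_view_shape = [3, 80, 80, 2]   # shape hard-coded in the serialized graph
received_elements = 76800            # size of the tensor that reached the view

expected_elements = prod(traced_view_shape)
implied_batch, remainder = divmod(received_elements, expected_elements)

print(f"expected {expected_elements} elements, got {received_elements}")
print(f"implied batch size: {implied_batch} (remainder {remainder})")
```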

The solution is to go back and export the TorchScript model again, but with some additional parameters: https://github.com/ultralytics/yolov5/blob/master/models/export.py#L24-L25

For instance: python models/export.py --weights yolov5s.pt --img 640 --batch 2

Then run inference with the new model. Note that all of your input tensors need to have the same batch size as the exported model (e.g. 2). So you need to either comment out the warm-up part, because it feeds a single image, or append another empty image to it so that it becomes a batch of 2 as well.
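
One way to keep every input at the exported batch size is to group images into fixed-size chunks and fill the last chunk with blank images. A minimal sketch, assuming a hypothetical helper `fixed_size_batches` and a `make_blank` callable that are not part of this repo; strings stand in for image tensors here, and in real code the blank would be something like an all-zero tensor of the same shape:

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")

def fixed_size_batches(
    items: List[T], batch_size: int, make_blank: Callable[[], T]
) -> Iterator[Tuple[List[T], int]]:
    """Yield (batch, n_real) pairs; every batch has exactly batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        chunk = list(items[start:start + batch_size])
        n_real = len(chunk)  # how many entries are actual inputs
        # Pad the final chunk so its size matches the exported batch size.
        chunk.extend(make_blank() for _ in range(batch_size - n_real))
        yield chunk, n_real

# Three inputs, model exported with --batch 2 (hypothetical placeholders).
batches = list(fixed_size_batches(["img0", "img1", "img2"], 2, lambda: "blank"))
for batch, n_real in batches:
    print(batch, "real inputs:", n_real)
```

The second value in each pair says how many entries are real, so detections for the padded slots can be dropped after inference.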

And you might also need to modify the post-processing part to deal with batch inputs. Hope it helps!
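
For batch post-processing, each image's boxes have to be mapped back using that image's own original size. A rough sketch of undoing a centered letterbox in plain Python; `unletterbox_box` and `postprocess_batch` are hypothetical names, and the code follows the usual gain-and-padding arithmetic rather than quoting this repo's or yolov5's exact implementation:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def unletterbox_box(box: Box, net_hw: Tuple[int, int], orig_hw: Tuple[int, int]) -> Box:
    """Map a box from letterboxed network-input coordinates to the original image."""
    net_h, net_w = net_hw
    orig_h, orig_w = orig_hw
    gain = min(net_h / orig_h, net_w / orig_w)  # resize ratio used by the letterbox
    pad_x = (net_w - orig_w * gain) / 2         # horizontal padding on each side
    pad_y = (net_h - orig_h * gain) / 2         # vertical padding on each side

    def clip(value: float, upper: float) -> float:
        return max(0.0, min(value, upper))

    x1, y1, x2, y2 = box
    return (
        clip((x1 - pad_x) / gain, orig_w),
        clip((y1 - pad_y) / gain, orig_h),
        clip((x2 - pad_x) / gain, orig_w),
        clip((y2 - pad_y) / gain, orig_h),
    )

def postprocess_batch(
    batch_boxes: List[List[Box]],
    net_hw: Tuple[int, int],
    orig_sizes: List[Tuple[int, int]],
) -> List[List[Box]]:
    """Rescale each image's boxes with that image's own original size."""
    return [
        [unletterbox_box(b, net_hw, orig_hw) for b in boxes]
        for boxes, orig_hw in zip(batch_boxes, orig_sizes)
    ]

# Two images of different sizes, both letterboxed to 640x640 (made-up boxes).
batch_boxes = [[(80.0, 0.0, 560.0, 640.0)], [(0.0, 140.0, 640.0, 500.0)]]
orig_sizes = [(1080, 810), (720, 1280)]  # (height, width) per image
restored = postprocess_batch(batch_boxes, (640, 640), orig_sizes)
print(restored)
```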

zhiqwang commented 3 years ago

There may be another way to realize batch inference: torchvision provides a GeneralizedRCNNTransform class, and it can be compiled with torch.jit.script (I didn't test the torch.jit.trace mechanism).

In the new release of ultralytics' yolov5, they implement a similar autoShape class (here and here). It uses OpenCV functions in letterbox, so I guess autoShape cannot be traced by torch.jit.trace or compiled by torch.jit.script.

BTW, GeneralizedRCNNTransform and autoShape (specifically letterbox) use different interpolation modes, so they will produce slightly different results.

yasenh commented 3 years ago

@zhiqwang Great catch! Thanks for sharing these updates. I will try to figure out whether a similar implementation works here.

zhiqwang commented 3 years ago

Hi @yasenh, just as a reference, here are my code snippets:

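from typing import List, Tuple

import torch
from torch import nn

# Assumption: run from the yolov5 repo root, where these helpers live in utils/general.py
from utils.general import non_max_suppression, scale_coords

class YOLOWrapped(nn.Module):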
    img_size = 640  # inference size (pixels)
    conf = 0.25  # NMS confidence threshold
    iou = 0.45  # NMS IoU threshold
    classes = None  # (optional list) filter by class

    def __init__(self, model, transform):
        super().__init__()
        self.model = model
        self.transform = transform

    def forward(self, x):
        batch = range(len(x))  # batch size
        original_image_sizes = torch.jit.annotate(List[Tuple[int, int]], [])
        for img in x:
            val = img.shape[-2:]
            assert len(val) == 2
            original_image_sizes.append((val[0], val[1]))

        images, targets = self.transform(x, None)
        # Inference
        x = self.model(images.tensors)  # forward
        x = non_max_suppression(x[0], conf_thres=self.conf, iou_thres=self.iou, classes=self.classes)  # NMS

        # Post-process
        for i in batch:
            if x[i] is not None:
                x[i][:, :4] = scale_coords(images.image_sizes[i], x[i][:, :4], original_image_sizes[i])
        return x

The wrapper can then be constructed as follows:

from hubconf import yolov5s
from torchvision.models.detection.transform import GeneralizedRCNNTransform

min_size, max_size, image_mean, image_std = 320, 416, [0, 0, 0], [1, 1, 1]
transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)

model = yolov5s(pretrained=True).fuse().eval()  # yolov5s.pt

yolo_wrapped = YOLOWrapped(model, transform)

yolo_wrapped can then be used for single-image or batch inference as below:

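import cv2
import numpy as np
import torch

def image_process(img, is_half=False):
    # Note: cv2.imread returns BGR channel order; no BGR-to-RGB conversion is done here
    img = np.ascontiguousarray(img, dtype=np.float32)  # uint8 to float32
    img /= 255.0  # 0 - 255 to 0.0 - 1.0
    img = img.transpose([2, 0, 1])  # HWC to CHW
    img = torch.from_numpy(img)
    img = img.half() if is_half else img.float()  # cast to fp16 if requested, else keep fp32

    return img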

img = cv2.imread('./bus.jpg')
img_test = image_process(img, is_half=False)

# Inference
prediction = yolo_wrapped([img_test])  # includes NMS

# Batch inference (img_test0, img_test1, img_test2 are tensors prepared with image_process, like img_test)
prediction = yolo_wrapped([img_test0, img_test1, img_test2])  # includes NMS

I think the YOLOWrapped module here could be traced using torch.jit.trace in export.py as supplied by ultralytics (but I didn't test it).

zhiqwang commented 3 years ago

I also tested a similar mechanism in my own repo; its inference time seems to be close to that of your implementation here.