open-mmlab / mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark
https://mmpretrain.readthedocs.io/en/latest/
Apache License 2.0

Inference in a batch of images. #855

Closed gabrielpeixoto-cvai closed 2 years ago

gabrielpeixoto-cvai commented 2 years ago


Describe the question you meet

Well, this is my first issue, so apologies for any mistakes. Basically, I am trying to run inference on a list of images. My code extracts several regions from a frame, and I want to run inference on all of them at once. Currently I loop over the regions and call the inference API on each one, but this is slow: a single inference takes 40 ms, while running the inferences one by one takes 170 ms. So I was wondering how to speed this up. One idea is to copy the images to the GPU and run inference as a single batch. I do not have much experience with mmcls or PyTorch, but I do with TensorFlow, where, as far as I know, passing a [batch_size, width, height, channels] tensor to the model yields an array of [batch_size] outputs. In my tests, doing it this way is much faster because there is only one copy and one inference operation. I used the `inference_model` function in `inference.py` as a base and modified it myself, but I could not make it faster. If anyone has any documents or materials for me to refer to, or any suggestions on how to improve it, I would be very grateful.
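To illustrate, here is a rough sketch in plain PyTorch of the kind of batched forward I mean (the torchvision backbone and the shapes are placeholders for illustration, not the mmcls API):

import torch
import torchvision

# Placeholder backbone purely for illustration; in practice this would be
# the mmcls classifier.
model = torchvision.models.resnet18().eval().to('cuda:0')

# 7 preprocessed crops stacked into one (N, C, H, W) tensor: a single
# host-to-GPU copy and a single forward pass for the whole batch.
crops = [torch.randn(3, 224, 224) for _ in range(7)]
batch = torch.stack(crops).to('cuda:0')

with torch.no_grad():
    scores = model(batch)           # (N, num_classes)
pred_labels = scores.argmax(dim=1)  # (N,)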

Post related information

  1. The output of `pip list | grep "mmcv\|mmcls\|^torch"`:
    mmcls 0.23.0 (installed from source)
    mmcv-full 1.5.0
    torch 1.10.1
    torchvision 0.11.2
  2. Your config file if you modified it or created a new one.
    No configuration file changes
  3. Your train log file if you meet the problem during training.
    No train log
  4. Other code you modified in the mmcls folder.

    
import numpy as np
import torch
from mmcv.parallel import collate, scatter

from mmcls.datasets.pipelines import Compose


def inference_model(model, img):
    """Inference image(s) with the classifier.

    Args:
        model (nn.Module): The loaded classifier.
        img (str/ndarray): The image filename or loaded image.

    Returns:
        result (dict): The classification results that contains
            `pred_class`, `pred_label` and `pred_score`.
    """
    cfg = model.cfg
    device = next(model.parameters()).device  # model device
    # build the data pipeline
    if isinstance(img, str):
        if cfg.data.test.pipeline[0]['type'] != 'LoadImageFromFile':
            cfg.data.test.pipeline.insert(0, dict(type='LoadImageFromFile'))
        data = dict(img_info=dict(filename=img), img_prefix=None)
    else:
        if cfg.data.test.pipeline[0]['type'] == 'LoadImageFromFile':
            cfg.data.test.pipeline.pop(0)
        data = dict(img=img)
    test_pipeline = Compose(cfg.data.test.pipeline)
    data = test_pipeline(data)
    data = collate([data], samples_per_gpu=1)
    if next(model.parameters()).is_cuda:
        # scatter to specified GPU
        data = scatter(data, [device])[0]

    # forward the model
    with torch.no_grad():
        scores = model(return_loss=False, **data)
        pred_score = np.max(scores, axis=1)[0]
        pred_label = np.argmax(scores, axis=1)[0]
        result = {'pred_label': pred_label, 'pred_score': float(pred_score)}
    result['pred_class'] = model.CLASSES[result['pred_label']]
    return result

def inference_model_array(model, img_array):
    """Inference a batch of images with the classifier.

    Args:
        model (nn.Module): The loaded classifier.
        img_array (list[ndarray]): The loaded images.

    Returns:
        results (list[dict]): The classification results, one dict per
            image, containing `pred_class`, `pred_label` and `pred_score`.
    """
    cfg = model.cfg
    device = next(model.parameters()).device  # model device
    # build the data pipeline
    if cfg.data.test.pipeline[0]['type'] == 'LoadImageFromFile':
        cfg.data.test.pipeline.pop(0)

    test_pipeline = Compose(cfg.data.test.pipeline)
    data_batch = []
    for i in range(len(img_array)):
        data = dict(img=img_array[i])
        data = test_pipeline(data)
        data = collate([data], samples_per_gpu=len(img_array))
        data_batch.append(data)

    if next(model.parameters()).is_cuda:
        # scatter to specified GPU
        data = scatter([data_batch], [device])[0][0]

    results = []

    # forward the model, one sample at a time
    for i in range(len(data)):
        with torch.no_grad():
            scores = model(return_loss=False, **data[i])
            pred_score = np.max(scores, axis=1)[0]
            pred_label = np.argmax(scores, axis=1)[0]
            result = {'pred_label': pred_label, 'pred_score': float(pred_score)}
            results.append(result)

    for i in range(len(results)):
        results[i]['pred_class'] = model.CLASSES[results[i]['pred_label']]

    return results
mzr1996 commented 2 years ago

Let me confirm a few things: you want to run inference on a series of images, right? What is the shape of your input image_array? Is it a list of arrays with shape (H, W, C), or a single array with shape (N, C, H, W), etc.? Can you show me your entire process for using inference_model_array?

gabrielpeixoto-cvai commented 2 years ago

@mzr1996

Thanks for the quick answer. For example, the shape of image_array is [7, 300, 300, 3] => [N, H, W, C]. I also experimented with a list of [H, W, C] arrays where each image has a different shape, so I assume the pipeline will do the transformation before feeding the network, right? If that is not possible, I can reshape the images before appending them to the list. This is the code I am experimenting with:

# Copyright (c) OpenMMLab. All rights reserved.
import time

import cv2 as cv
import numpy as np

from mmcls.apis import inference_model, init_model, inference_model_array
from mmdet.apis import inference_detector, init_detector

from camera_drivers import *

detection_config = "/path/to/detection/config"
detection_checkpoint = "/path/to/detection/checkpoint"

inference_config = "/path/to/classification/config"
inference_checkpoint = "/path/to/classification/checkpoint"
device = "cuda:0"

def make_result_image_cv2(img, result, labels):
    result_bboxes = result[0]
    result_image = img.copy()

    # label 0 gets a green box, anything else a red one
    for i, box in enumerate(result_bboxes):
        x_min, y_min, x_max, y_max = (int(v) for v in box[:4])
        if labels[i] == 0:
            cv.rectangle(result_image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 3)
        else:
            cv.rectangle(result_image, (x_min, y_min), (x_max, y_max), (0, 0, 255), 3)

    return result_image

def crop_images_cv2(img, result):
    result_bboxes = result[0]
    images = []

    for box in result_bboxes:
        x_min, y_min, x_max, y_max = (int(v) for v in box[:4])
        cropped_image = img[y_min:y_max, x_min:x_max]
        # cropped_image = cv.resize(cropped_image, (300, 300))
        images.append(cropped_image)
    return images

def main():
    # build the model from a config file and a checkpoint file
    detection_model = init_detector(detection_config, detection_checkpoint, device=device)
    classification_model = init_model(inference_config, inference_checkpoint, device=device)
    serial1 = "22929840"
    cam = basler_camera.basler_camera(serial_number=serial1)

    while True:
        start = time.time()
        frame = cam.get_image()
        # test a single image
        start_detection = time.time()
        result_detection = inference_detector(detection_model, frame)
        isolated_objects = crop_images_cv2(frame, result_detection)
        end_detection = time.time()

        inference_labels = []
        #for object_single in isolated_objects:
        #    """
        #    prediction format
        #    result = {
        #        'pred_class': results['pred_class'][0],
        #        'pred_label': results['pred_label'][0],
        #        'pred_score': results['pred_score'][0],
        #    }
        #    """
        #    start_inference = time.time()
        #    classification_result = inference_model(classification_model, object_single)
        #    inference_labels.append(classification_result['pred_label'])
        #    end_inference = time.time()
        start_inference = time.time()
        classification_result = inference_model_array(classification_model, isolated_objects)
        for result in classification_result:
            inference_labels.append(result['pred_label'])
        end_inference = time.time()
        print(inference_labels)
        result_img_time_start = time.time()
        result_img = make_result_image_cv2(frame, result_detection, inference_labels)
        result_img_time_end = time.time()
        end=time.time()
        print(f"Image processing time: {end-start} seconds")
        print(f"single image classification time {end_inference-start_inference} seconds.")
        print(f"single image detection time {end_detection-start_detection} seconds.")
        print(f"create_debug_img {result_img_time_end-result_img_time_start} seconds.")
        # show the results
        #cv.imshow("Result", result_img)

        # wait for the user to press a key
        # (necessary to keep the Python kernel from crashing)
        #k = cv.waitKey(10) & 0xFF
        #if k == 27:
        #    break 

    # closing all open windows 
    cv.destroyAllWindows() 

if __name__ == '__main__':
    main()

As you can see in the commented-out part of main(), I was previously running inference once for each detected region.

mzr1996 commented 2 years ago

Ok, I get it. The pipeline does process the input images and crops them to the same shape (depending on your config file).

The problem is that you should collate all the data of a batch into one dict and scatter that to your device. The data will then be a dict whose `img` key holds a tensor of shape (N, C, H, W). Pass it to the model, and the model can run inference on the whole batch in one forward pass.
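In isolation, the key change looks like this (a minimal sketch; `data_batch` and `device` are the variables from your function):

from mmcv.parallel import collate, scatter

# Wrong: collating each sample separately yields N dicts, each with batch
# size 1, so the model still runs N forward passes.
#   data = [collate([d], samples_per_gpu=N) for d in data_batch]

# Right: one collate call over the whole list yields a single dict whose
# 'img' entry is an (N, C, H, W) tensor, ready for one forward pass.
data = collate(data_batch, samples_per_gpu=len(data_batch))
data = scatter(data, [device])[0]  # move the whole batch to the GPU at once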

I made some modifications to your function; it works, though it may not be perfect.

def inference_model_array(model, img_array):
    """Inference a batch of images with the classifier.

    Args:
        model (nn.Module): The loaded classifier.
        img_array (list[ndarray]): The loaded images.

    Returns:
        results (list[dict]): The classification results, one dict per
            image, containing `pred_class`, `pred_label` and `pred_score`.
    """
    cfg = model.cfg
    device = next(model.parameters()).device  # model device
    # build the data pipeline
    if cfg.data.test.pipeline[0]['type'] == 'LoadImageFromFile':
        cfg.data.test.pipeline.pop(0)

    test_pipeline = Compose(cfg.data.test.pipeline)
    data_batch = []
    for i in range(len(img_array)):
        data = dict(img=img_array[i])
        data = test_pipeline(data)
        data_batch.append(data)

    # collate the whole batch into a single dict whose `img` entry has
    # shape (N, C, H, W)
    data = collate(data_batch, samples_per_gpu=len(img_array))

    if next(model.parameters()).is_cuda:
        # scatter to specified GPU
        data = scatter(data, [device])[0]

    results = []

    # forward the whole batch in a single pass
    with torch.no_grad():
        scores = model(return_loss=False, **data)
        for i in range(len(scores)):
            pred_score = np.max(scores[i], axis=0)
            pred_label = np.argmax(scores[i], axis=0)
            result = {
                'pred_label': pred_label,
                'pred_score': float(pred_score),
                'pred_class': model.CLASSES[pred_label]
            }
            results.append(result)

    return results
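For reference, a minimal usage sketch (the paths are placeholders, and the random crops stand in for your detected regions):

import numpy as np
from mmcls.apis import init_model

model = init_model('/path/to/classification/config',
                   '/path/to/classification/checkpoint',
                   device='cuda:0')

# A list of HWC uint8 crops, possibly of different sizes; the test
# pipeline resizes them to a common shape before collation.
crops = [np.random.randint(0, 255, (300, 300, 3), dtype=np.uint8)
         for _ in range(7)]

results = inference_model_array(model, crops)
for r in results:
    print(r['pred_class'], r['pred_label'], f"{r['pred_score']:.3f}")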
gabrielpeixoto-cvai commented 2 years ago

@mzr1996 Thank you very much! I understand the modification. I did some experiments collating the array, but it was not working properly; maybe I was adding too many `[]`. This is probably due to my lack of experience with PyTorch. Now it takes 40 to 50 ms per image batch!