open-mmlab / mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark
https://mmpretrain.readthedocs.io/en/latest/
Apache License 2.0

Inference in a batch of images. #855

Closed gabrielpeixoto-cvai closed 2 years ago

gabrielpeixoto-cvai commented 2 years ago


Describe the question you meet

Well, this is my first issue, so apologies for any mistakes. Basically, I am trying to run inference on a list of images. My code extracts several regions from a frame, and I want to run inference on all of them at once. Currently I loop over the regions and call the inference API on each one, but this is slow: a single inference takes 40 ms, while running the inferences one by one takes 170 ms. So I was wondering how to speed this up. One idea is to copy the images to the GPU and run inference as a single batch. I do not have much experience with mmcls or PyTorch, but I do with TensorFlow, where, as far as I know, passing a [batch_size, width, height, channels] tensor to the model yields an array of [batch_size] outputs. In my tests, doing it this way is much faster because there is only one copy and one inference operation. I used the `inference_model` function in `inference.py` as a base and modified it myself, but I could not make it faster. If anyone has any documents or materials for me to refer to, or any suggestions on how to improve it, I would be very grateful.
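To illustrate, here is a rough sketch in plain PyTorch of the kind of batched forward I mean (the torchvision backbone and the shapes are placeholders for illustration, not the mmcls API):

import torch
import torchvision

# Placeholder backbone purely for illustration; in practice this would be
# the mmcls classifier.
model = torchvision.models.resnet18().eval().to('cuda:0')

# 7 preprocessed crops stacked into one (N, C, H, W) tensor: a single
# host-to-GPU copy and a single forward pass for the whole batch.
crops = [torch.randn(3, 224, 224) for _ in range(7)]
batch = torch.stack(crops).to('cuda:0')

with torch.no_grad():
    scores = model(batch)           # (N, num_classes)
pred_labels = scores.argmax(dim=1)  # (N,)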

Post related information

  1. The output of `pip list | grep "mmcv\|mmcls\|^torch"`:
    mmcls 0.23.0 (installed from source)
    mmcv-full 1.5.0
    torch 1.10.1
    torchvision 0.11.2
  2. Your config file if you modified it or created a new one.
    No configuration file changes
  3. Your train log file if you meet the problem during training.
    No train log
  4. Other code you modified in the mmcls folder.

    
import numpy as np
import torch
from mmcv.parallel import collate, scatter

from mmcls.datasets.pipelines import Compose


def inference_model(model, img):
    """Inference image(s) with the classifier.

    Args:
        model (nn.Module): The loaded classifier.
        img (str/ndarray): The image filename or loaded image.

    Returns:
        result (dict): The classification results that contains
            `pred_class`, `pred_label` and `pred_score`.
    """
    cfg = model.cfg
    device = next(model.parameters()).device  # model device
    # build the data pipeline
    if isinstance(img, str):
        if cfg.data.test.pipeline[0]['type'] != 'LoadImageFromFile':
            cfg.data.test.pipeline.insert(0, dict(type='LoadImageFromFile'))
        data = dict(img_info=dict(filename=img), img_prefix=None)
    else:
        if cfg.data.test.pipeline[0]['type'] == 'LoadImageFromFile':
            cfg.data.test.pipeline.pop(0)
        data = dict(img=img)
    test_pipeline = Compose(cfg.data.test.pipeline)
    data = test_pipeline(data)
    data = collate([data], samples_per_gpu=1)
    if next(model.parameters()).is_cuda:
        # scatter to specified GPU
        data = scatter(data, [device])[0]

    # forward the model
    with torch.no_grad():
        scores = model(return_loss=False, **data)
        pred_score = np.max(scores, axis=1)[0]
        pred_label = np.argmax(scores, axis=1)[0]
        result = {'pred_label': pred_label, 'pred_score': float(pred_score)}
    result['pred_class'] = model.CLASSES[result['pred_label']]
    return result

def inference_model_array(model, img_array):
    """Inference a batch of images with the classifier.

    Args:
        model (nn.Module): The loaded classifier.
        img_array (list[ndarray]): The loaded images.

    Returns:
        results (list[dict]): The classification results, one dict per
            image, containing `pred_class`, `pred_label` and `pred_score`.
    """
    cfg = model.cfg
    device = next(model.parameters()).device  # model device
    # build the data pipeline
    if cfg.data.test.pipeline[0]['type'] == 'LoadImageFromFile':
        cfg.data.test.pipeline.pop(0)

    test_pipeline = Compose(cfg.data.test.pipeline)
    data_batch = []
    for i in range(len(img_array)):
        data = dict(img=img_array[i])
        data = test_pipeline(data)
        data = collate([data], samples_per_gpu=len(img_array))
        data_batch.append(data)

    if next(model.parameters()).is_cuda:
        # scatter to specified GPU
        data = scatter([data_batch], [device])[0][0]

    results = []

    # forward the model, one sample at a time
    for i in range(len(data)):
        with torch.no_grad():
            scores = model(return_loss=False, **data[i])
            pred_score = np.max(scores, axis=1)[0]
            pred_label = np.argmax(scores, axis=1)[0]
            result = {'pred_label': pred_label, 'pred_score': float(pred_score)}
            results.append(result)

    for i in range(len(results)):
        results[i]['pred_class'] = model.CLASSES[results[i]['pred_label']]

    return results
mzr1996 commented 2 years ago

Let me confirm a few things: you want to run inference on a series of images, right? What is the shape of your input image_array? Is it a list of arrays with shape (H, W, C), or a single array with shape (N, C, H, W), etc.? Can you show me your entire process for using inference_model_array?

gabrielpeixoto-cvai commented 2 years ago

@mzr1996

Thanks for the quick answer. For example, the shape of image_array is [7, 300, 300, 3] => [N, H, W, C]. I also experimented with a list of [H, W, C] arrays where each image has a different shape, so I assume the pipeline will do the transformation before feeding the network, right? If that is not possible, I can reshape the images before appending them to the list. This is the code I am experimenting with:

# Copyright (c) OpenMMLab. All rights reserved.
import time

import cv2 as cv
import numpy as np

from mmcls.apis import inference_model, init_model, inference_model_array
from mmdet.apis import inference_detector, init_detector

from camera_drivers import *

detection_config = "/path/to/detection/config"
detection_checkpoint = "/path/to/detection/checkpoint"

inference_config = "/path/to/classification/config"
inference_checkpoint = "/path/to/classification/checkpoint"
device = "cuda:0"

def make_result_image_cv2(img, result, labels):
    result_bboxes = result[0]
    result_image = img.copy()

    # label 0 gets a green box, anything else a red one
    for i, box in enumerate(result_bboxes):
        x_min, y_min, x_max, y_max = (int(v) for v in box[:4])
        if labels[i] == 0:
            cv.rectangle(result_image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 3)
        else:
            cv.rectangle(result_image, (x_min, y_min), (x_max, y_max), (0, 0, 255), 3)

    return result_image

def crop_images_cv2(img, result):
    result_bboxes = result[0]
    images = []

    for box in result_bboxes:
        x_min, y_min, x_max, y_max = (int(v) for v in box[:4])
        cropped_image = img[y_min:y_max, x_min:x_max]
        # cropped_image = cv.resize(cropped_image, (300, 300))
        images.append(cropped_image)
    return images

def main():
    # build the model from a config file and a checkpoint file
    detection_model = init_detector(detection_config, detection_checkpoint, device=device)
    classification_model = init_model(inference_config, inference_checkpoint, device=device)
    serial1 = "22929840"
    cam = basler_camera.basler_camera(serial_number=serial1)

    while True:
        start = time.time()
        frame = cam.get_image()
        # test a single image
        start_detection = time.time()
        result_detection = inference_detector(detection_model, frame)
        isolated_objects = crop_images_cv2(frame, result_detection)
        end_detection = time.time()

        inference_labels = []
        #for object_single in isolated_objects:
        #    """
        #    prediction format
        #    result = {
        #        'pred_class': results['pred_class'][0],
        #        'pred_label': results['pred_label'][0],
        #        'pred_score': results['pred_score'][0],
        #    }
        #    """
        #    start_inference = time.time()
        #    classification_result = inference_model(classification_model, object_single)
        #    inference_labels.append(classification_result['pred_label'])
        #    end_inference = time.time()
        start_inference = time.time()
        classification_result = inference_model_array(classification_model, isolated_objects)
        for result in classification_result:
            inference_labels.append(result['pred_label'])
        end_inference = time.time()
        print(inference_labels)
        result_img_time_start = time.time()
        result_img = make_result_image_cv2(frame, result_detection, inference_labels)
        result_img_time_end = time.time()
        end=time.time()
        print(f"Image processing time: {end-start} seconds")
        print(f"single image classification time {end_inference-start_inference} seconds.")
        print(f"single image detection time {end_detection-start_detection} seconds.")
        print(f"create_debug_img {result_img_time_end-result_img_time_start} seconds.")
        # show the results
        #cv.imshow("Result", result_img)

        # wait for the user to press a key
        # (necessary to keep the Python kernel from crashing)
        #k = cv.waitKey(10) & 0xFF
        #if k == 27:
        #    break 

    # closing all open windows 
    cv.destroyAllWindows() 

if __name__ == '__main__':
    main()

As you can see in the commented-out part of main(), I was previously running inference once for each detected region.

mzr1996 commented 2 years ago

Ok, I get it. The pipeline does process the input images and crops them to the same shape (depending on your config file).

The problem is that you should collate all the data of a batch into one dict and scatter that to your device. The data will then be a dict whose `img` key holds a tensor of shape (N, C, H, W). Pass it to the model, and the model can run inference on the whole batch in one forward pass.
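In isolation, the key change looks like this (a minimal sketch; `data_batch` and `device` are the variables from your function):

from mmcv.parallel import collate, scatter

# Wrong: collating each sample separately yields N dicts, each with batch
# size 1, so the model still runs N forward passes.
#   data = [collate([d], samples_per_gpu=N) for d in data_batch]

# Right: one collate call over the whole list yields a single dict whose
# 'img' entry is an (N, C, H, W) tensor, ready for one forward pass.
data = collate(data_batch, samples_per_gpu=len(data_batch))
data = scatter(data, [device])[0]  # move the whole batch to the GPU at once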

I made some modifications to your function; it works, though it may not be perfect.

def inference_model_array(model, img_array):
    """Inference a batch of images with the classifier.

    Args:
        model (nn.Module): The loaded classifier.
        img_array (list[ndarray]): The loaded images.

    Returns:
        results (list[dict]): The classification results, one dict per
            image, containing `pred_class`, `pred_label` and `pred_score`.
    """
    cfg = model.cfg
    device = next(model.parameters()).device  # model device
    # build the data pipeline
    if cfg.data.test.pipeline[0]['type'] == 'LoadImageFromFile':
        cfg.data.test.pipeline.pop(0)

    test_pipeline = Compose(cfg.data.test.pipeline)
    data_batch = []
    for i in range(len(img_array)):
        data = dict(img=img_array[i])
        data = test_pipeline(data)
        data_batch.append(data)

    # collate the whole batch into a single dict whose `img` entry has
    # shape (N, C, H, W)
    data = collate(data_batch, samples_per_gpu=len(img_array))

    if next(model.parameters()).is_cuda:
        # scatter to specified GPU
        data = scatter(data, [device])[0]

    results = []

    # forward the whole batch in a single pass
    with torch.no_grad():
        scores = model(return_loss=False, **data)
        for i in range(len(scores)):
            pred_score = np.max(scores[i], axis=0)
            pred_label = np.argmax(scores[i], axis=0)
            result = {
                'pred_label': pred_label,
                'pred_score': float(pred_score),
                'pred_class': model.CLASSES[pred_label]
            }
            results.append(result)

    return results
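For reference, a minimal usage sketch (the paths are placeholders, and the random crops stand in for your detected regions):

import numpy as np
from mmcls.apis import init_model

model = init_model('/path/to/classification/config',
                   '/path/to/classification/checkpoint',
                   device='cuda:0')

# A list of HWC uint8 crops, possibly of different sizes; the test
# pipeline resizes them to a common shape before collation.
crops = [np.random.randint(0, 255, (300, 300, 3), dtype=np.uint8)
         for _ in range(7)]

results = inference_model_array(model, crops)
for r in results:
    print(r['pred_class'], r['pred_label'], f"{r['pred_score']:.3f}")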
gabrielpeixoto-cvai commented 2 years ago

@mzr1996 Thank you very much! I understand the modification. I did some experiments collating the array, but it was not working properly; maybe I was adding too many `[]`. This is probably due to my lack of experience with PyTorch. Now it takes 40 to 50 ms per image batch!