mahyarnajibi / SNIPER

SNIPER / AutoFocus is an efficient multi-scale object detection training / inference algorithm

Inference speed #104

ashnair1 opened this issue 5 years ago

ashnair1 commented 5 years ago

I'm trying to optimise the inference time when dealing with a set of images. Using main_test.py to process 5 images takes about 15 seconds.

Performing inference for scale: (1400, 2000)
Tester: 1/5, Detection: 0.4927s, Post Processing: 0.001314s
Tester: 2/5, Detection: 0.4695s, Post Processing: 0.001933s
Tester: 3/5, Detection: 0.4628s, Post Processing: 0.001779s
Tester: 4/5, Detection: 0.4596s, Post Processing: 0.001778s
Tester: 5/5, Detection: 0.4572s, Post Processing: 0.001732s
Performing inference for scale: (800, 1280)
Tester: 1/5, Detection: 0.3368s, Post Processing: 0.001827s
Tester: 2/5, Detection: 0.3423s, Post Processing: 0.001567s
Tester: 3/5, Detection: 0.3433s, Post Processing: 0.001474s
Tester: 4/5, Detection: 0.3426s, Post Processing: 0.00151s
Tester: 5/5, Detection: 0.3426s, Post Processing: 0.001458s
Performing inference for scale: (480, 512)
Tester: 1/5, Detection: 0.3128s, Post Processing: 0.0008581s
Tester: 2/5, Detection: 0.3020s, Post Processing: 0.001107s
Tester: 3/5, Detection: 0.2989s, Post Processing: 0.001178s
Tester: 4/5, Detection: 0.2966s, Post Processing: 0.001154s
Tester: 5/5, Detection: 0.2955s, Post Processing: 0.001106s
Aggregating detections from multiple scales and applying NMS...
/usr/local/lib/python2.7/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  'a.item() instead', DeprecationWarning, stacklevel=1)
Done! Saving detections into: data/demo/batch_results/dets_final/detections.pkl
All done!

real    0m16.401s
user    0m35.516s
sys 0m12.016s

Is this the maximum speed attainable on a single GPU? For reference, I'm currently using a 12 GB Nvidia Titan XP. Do you have any suggestions for improving inference time on a batch of images? Is splitting the images across multiple GPUs the only way to speed things up? If so, how can I go about doing that?

Update: The paper states that on a single Tesla V100 GPU you were able to run inference at 5 images per second. I wanted to try this out for myself, and the following is my result.

Performing inference for scale: (1400, 2000)
Tester: 1/5, Detection: 0.2198s, Post Processing: 0.000953s
Tester: 2/5, Detection: 0.2096s, Post Processing: 0.001023s
Tester: 3/5, Detection: 0.2048s, Post Processing: 0.001074s
Tester: 4/5, Detection: 0.2018s, Post Processing: 0.001131s
Tester: 5/5, Detection: 0.2005s, Post Processing: 0.001157s
Performing inference for scale: (800, 1280)
Tester: 1/5, Detection: 0.1777s, Post Processing: 0.0007889s
Tester: 2/5, Detection: 0.1901s, Post Processing: 0.000767s
Tester: 3/5, Detection: 0.2109s, Post Processing: 0.0007869s
Tester: 4/5, Detection: 0.1966s, Post Processing: 0.0008252s
Tester: 5/5, Detection: 0.1847s, Post Processing: 0.0008336s
Performing inference for scale: (480, 512)
Tester: 1/5, Detection: 0.1269s, Post Processing: 0.000644s
Tester: 2/5, Detection: 0.1210s, Post Processing: 0.0005785s
Tester: 3/5, Detection: 0.1196s, Post Processing: 0.0005847s
Tester: 4/5, Detection: 0.1211s, Post Processing: 0.0007268s
Tester: 5/5, Detection: 0.1220s, Post Processing: 0.0008238s
Aggregating detections from multiple scales and applying NMS...
/usr/local/lib/python2.7/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  'a.item() instead', DeprecationWarning, stacklevel=1)
Done! Saving detections into: data/demo/batch_results/dets_final/detections.pkl
All done!

real    0m17.451s
user    1m12.805s
sys 0m18.422s

As you can see, while there is some per-image speedup, the overall process still takes about the same amount of time. Is this the maximum speed at which it can run inference, or are there additional modifications that could make it faster?

You mentioned in the paper that you ran multiple processes in parallel during inference. Could you tell me how that was done?
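For reference, the kind of multi-process inference I have in mind looks roughly like the sketch below. This is just a generic pattern, not your actual implementation; run_inference is a hypothetical helper that would build the model on mx.gpu(gpu_id) and run detection over its shard of images.

import multiprocessing

def worker(image_paths, gpu_id):
    # Each worker owns one GPU; the model must be created inside the child
    # process so a CUDA context is never shared across forks.
    run_inference(image_paths, gpu_id)  # hypothetical helper, see note above

def detect_parallel(image_paths, num_gpus):
    # Round-robin split of the image list, one shard per GPU
    shards = [image_paths[i::num_gpus] for i in range(num_gpus)]
    procs = [multiprocessing.Process(target=worker, args=(shard, gpu))
             for gpu, shard in enumerate(shards)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

Is this roughly what you did, or did you parallelise over scales instead of images?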

Roujack commented 5 years ago

@ash1995 Can you tell me how to run inference on a set of images using main_test.py (i.e., show me the code)? 3s per image would be acceptable for me. In my tests on an NVIDIA 1080 Ti, inference takes about 10s per image.

ashnair1 commented 5 years ago

Here you go,

# --------------------------------------------------------------
# SNIPER: Efficient Multi-Scale Training
# Licensed under The Apache-2.0 License [see LICENSE for details]
# Inference Module
# by Mahyar Najibi and Bharat Singh
# --------------------------------------------------------------
import init
import matplotlib
matplotlib.use('Agg')
from symbols.faster import *
from configs.faster.default_configs import config, update_config, update_config_from_list
from data_utils.load_data import load_proposal_roidb
import mxnet as mx
import argparse
from train_utils.utils import create_logger, load_param
from inference import imdb_detection_wrapper
import os
from PIL import Image
from easydict import EasyDict

os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'

def parser():
    arg_parser = argparse.ArgumentParser('SNIPER test module')
    arg_parser.add_argument('--cfg', dest='cfg', help='Path to the config file',
                    default='configs/faster/sniper_res101_e2e.yml',type=str)
    arg_parser.add_argument('--save_prefix', dest='save_prefix', help='Prefix used for snapshotting the network',
                            default='SNIPER', type=str)
    arg_parser.add_argument('--img_dir_path', dest='img_dir_path', help='Path to the image directory', type=str,
                            default='data/demo_batch/images')
    arg_parser.add_argument('--dataset', dest='dataset', help='choose categories belong to which dataset', type=str,
                            default='coco')
    arg_parser.add_argument('--vis', dest='vis', help='Whether to visualize the detections',
                            action='store_true')
    arg_parser.add_argument('--set', dest='set_cfg_list', help='Set the configuration fields from command line',
                            default=None, nargs=argparse.REMAINDER)
    return arg_parser.parse_args()

def main():
    args = parser()
    update_config(args.cfg)
    if args.set_cfg_list:
        update_config_from_list(args.set_cfg_list)

    context = [mx.gpu(int(gpu)) for gpu in config.gpus.split(',')]

    if not os.path.isdir(config.output_path):
        os.mkdir(config.output_path)

    coco_names = [u'BG', u'person', u'bicycle', u'car', u'motorcycle', u'airplane',
                       u'bus', u'train', u'truck', u'boat', u'traffic light', u'fire hydrant',
                       u'stop sign', u'parking meter', u'bench', u'bird', u'cat', u'dog', u'horse', u'sheep', u'cow',
                       u'elephant', u'bear', u'zebra', u'giraffe', u'backpack', u'umbrella', u'handbag', u'tie',
                       u'suitcase', u'frisbee', u'skis', u'snowboard', u'sports\nball', u'kite', u'baseball\nbat',
                       u'baseball glove', u'skateboard', u'surfboard', u'tennis racket', u'bottle', u'wine\nglass',
                       u'cup', u'fork', u'knife', u'spoon', u'bowl', u'banana', u'apple', u'sandwich', u'orange',
                       u'broccoli', u'carrot', u'hot dog', u'pizza', u'donut', u'cake', u'chair', u'couch',
                       u'potted plant', u'bed', u'dining table', u'toilet', u'tv', u'laptop', u'mouse', u'remote',
                       u'keyboard', u'cell phone', u'microwave', u'oven', u'toaster', u'sink', u'refrigerator', u'book',
                       u'clock', u'vase', u'scissors', u'teddy bear', u'hair\ndrier', u'toothbrush']

    roidb = []

    for img in os.listdir(args.img_dir_path):
        im_path = os.path.join(args.img_dir_path, img)

        # Get image dimensions
        width, height = Image.open(im_path).size

        # Pack image info
        r = {'image': im_path, 'width': width, 'height': height, 'flipped': False}
        roidb.append(r)

    # Pack db info
    db_info = EasyDict()
    db_info.name = 'coco'
    db_info.result_path = 'data/demo/batch_results'
    db_info.classes = coco_names
    db_info.num_classes = len(db_info.classes)
    imdb = db_info

    # Creating the Logger
    logger, output_path = create_logger(config.output_path, args.cfg, config.dataset.image_set)
    model_prefix = os.path.join(output_path, args.save_prefix)
    arg_params, aux_params = load_param(model_prefix, config.TEST.TEST_EPOCH,
                                        convert=True, process=True)

    sym_inst = eval('{}.{}'.format(config.symbol, config.symbol))
    imdb_detection_wrapper(sym_inst, config, imdb, roidb, context, arg_params, aux_params, args.vis)

if __name__ == '__main__':
    main()
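Assuming you drop this in the repo root (I saved it as main_test.py) with images under data/demo_batch/images, the invocation is the same as the stock demo; the flags match the parser above:

time python main_test.py --cfg configs/faster/sniper_res101_e2e.yml --img_dir_path data/demo_batch/images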

Roujack commented 5 years ago

@ash1995 many thanks!

wxianfeng commented 5 years ago

Me too. On an M40 GPU (picture size: 100K), inference takes about 4s per image, which is very slow. How can I speed it up?

HeyMrYu commented 5 years ago

@ash1995 hello, I have a question when i run the main_test.py.

Tester: 10108/10144, Detection: 0.1189s, Post Processing: 0.008175s
Tester: 10112/10144, Detection: 0.1189s, Post Processing: 0.008174s
Tester: 10116/10144, Detection: 0.1189s, Post Processing: 0.008172s
Tester: 10120/10144, Detection: 0.1189s, Post Processing: 0.008169s
Tester: 10124/10144, Detection: 0.1188s, Post Processing: 0.008167s
Tester: 10128/10144, Detection: 0.1188s, Post Processing: 0.008167s
Tester: 10132/10144, Detection: 0.1188s, Post Processing: 0.008166s
Tester: 10136/10144, Detection: 0.1188s, Post Processing: 0.008163s
Tester: 10140/10144, Detection: 0.1188s, Post Processing: 0.008162s
Tester: 10144/10144, Detection: 0.1188s, Post Processing: 0.00816s

It hangs when it gets to this point and I never get the result. Could you take a look at this? Thanks very much!

saksham-s commented 4 years ago

@ashnair1 @Roujack @HeyMrYu Were you able to use main_test in the way @ashnair1 described? I get the following error traceback:

Traceback (most recent call last):
  File "tester.py", line 115, in <module>
    main()
  File "tester.py", line 112, in main
    imdb_detection_wrapper(sym_inst, config, imdb, roidb, context, arg_params, aux_params, args.vis)
  File "lib/inference.py", line 353, in imdb_detection_wrapper
    roidb, imdb, arg_params, aux_params, vis]))
  File "lib/inference.py", line 328, in detect_scale_worker
    pad_rois_to=400, crop_size=None, test_scale=scale)
  File "lib/iterators/MNIteratorTest.py", line 11, in __init__
    self.num_classes = num_classes if num_classes else roidb[0]['gt_overlaps'].shape[1]
KeyError: 'gt_overlaps'

ten1er commented 4 years ago

(quoting @saksham-s's KeyError: 'gt_overlaps' traceback above)

Hi, have you solved this problem? I'm running into it too.

dinhtungtp commented 3 years ago

@ten1er @saksham-s You have to change

test_iter = MNIteratorTestAutoFocus(roidb=roidb, config=config, batch_size=nGPUs * nbatch, nGPUs=nGPUs, threads=32,
                                    pad_rois_to=400, crop_size=None, test_scale=scale)

to

test_iter = MNIteratorTestAutoFocus(roidb=roidb, config=config, batch_size=nGPUs * nbatch, nGPUs=nGPUs, threads=32,
                                    pad_rois_to=400, crop_size=None, test_scale=scale,
                                    num_classes=config.dataset.NUM_CLASSES)
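For anyone hitting this later: judging from the traceback above (I haven't verified against the source), the hand-built demo roidb entries only carry the image path, size, and flipped flag, so the iterator's fallback

self.num_classes = num_classes if num_classes else roidb[0]['gt_overlaps'].shape[1]

raises KeyError because a roidb built without annotations has no 'gt_overlaps'. Passing num_classes explicitly, as above, skips that fallback.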