zylo117 / Yet-Another-EfficientDet-Pytorch

The pytorch re-implement of the official efficientdet with SOTA performance in real time and pretrained weights.
GNU Lesser General Public License v3.0
5.2k stars 1.27k forks

Specs to speed up FPS #192

Open juanmanuelrq opened 4 years ago

juanmanuelrq commented 4 years ago

Hi,

I am getting 8.6 FPS on 1920x1080 video (width = 1920, height = 1080) with efficientdet-d0.pth and use_float16 = False.

I am getting 9.4 FPS on the same video with efficientdet-d0.pth and use_float16 = True.

I am getting 1.5 FPS on the same video with efficientdet-d0.pth, use_float16 = True and force_input_size = 1920.

My hardware is a GeForce GTX 690M with 4096 MB of VRAM, and 16 GB of RAM.

What hardware or software changes do I need to make to speed up my FPS?

Best regards

[Image: frame_screenshot_23 04 2020]

zylo117 commented 4 years ago

As I mentioned in the README, I tested on an RTX 2080 Ti, which is about 5 to 6 times faster than the 690M. Also, notebook graphics cards tend to overheat and are then forced to reduce power. And 4 GB of VRAM is very limited too. You can possibly run at batchsize 2. It would be a little bit faster though.

So it'd be better if you can run on a faster GPU with larger memory.

juanmanuelrq commented 4 years ago

Hi,

I ran this code on a Tesla K80, and the result is: Fps: 7.716913 img_shape (720, 1280, 3)

Why does the test script efficientdet_test.py, on the same K80 GPU, get 14.66 FPS with a 1920x1080 image? Its results are:

batch size of 1: 0.06818311214447022 seconds, 14.66638832620523 FPS, @batch_size 1

batch size of 32: 1.06377272605896 seconds, 30.08161350268195 FPS, @batch_size 32
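The two FPS figures in that output appear to be batch throughput, that is, frames in the batch divided by seconds per batch, rather than the rate at which a single frame travels through the whole pipeline. A minimal sketch of that arithmetic follows; the function names are illustrative and are not part of the repository.

```python
def throughput_fps(batch_size: int, seconds_per_batch: float) -> float:
    """Frames processed per second when a whole batch takes seconds_per_batch."""
    return batch_size / seconds_per_batch


def seconds_per_frame(batch_size: int, seconds_per_batch: float) -> float:
    """Average time spent on each frame of the batch."""
    return seconds_per_batch / batch_size


# Timings copied from the efficientdet_test.py output quoted in this thread.
fps_b1 = throughput_fps(1, 0.06818311214447022)
fps_b32 = throughput_fps(32, 1.06377272605896)
print(round(fps_b1, 2), round(fps_b32, 2))
```

At batch size 32 each batch still takes about one second to come back, so the higher figure reflects throughput, not lower per-frame delay. The video script's own timer also includes reading the frame, preprocessing and drawing, so its FPS is not directly comparable to either number.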

How can I get a batch size of 32 with the video code?

This is the code:

# Core Author: Zylo117
# Script's Author: winter2897

"""
Simple Inference Script of EfficientDet-Pytorch for detecting objects on webcam
"""
import time
import torch
import cv2
import numpy as np
from torch.backends import cudnn
from backbone import EfficientDetBackbone
from efficientdet.utils import BBoxTransform, ClipBoxes
from utils.utils import preprocess, invert_affine, postprocess, preprocess_video

# Video's path
video_src = '/home/sistemasinteligentesmaps/Downloads/20200306_20200306092821_20200306092901_094440.mp4'

width = 1920
height = 1080

compound_coef = 0
force_input_size = None  # set None to use default size

threshold = 0.2
iou_threshold = 0.2

use_cuda = True
use_float16 = False
cudnn.fastest = True
cudnn.benchmark = True

obj_list = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
            'fire hydrant', '', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep',
            'cow', 'elephant', 'bear', 'zebra', 'giraffe', '', 'backpack', 'umbrella', '', '', 'handbag', 'tie',
            'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
            'skateboard', 'surfboard', 'tennis racket', 'bottle', '', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
            'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut',
            'cake', 'chair', 'couch', 'potted plant', 'bed', '', 'dining table', '', '', 'toilet', '', 'tv',
            'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
            'refrigerator', '', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
            'toothbrush']

# tf bilinear interpolation is different from any other's, just make do
input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
input_size = input_sizes[compound_coef] if force_input_size is None else force_input_size

# load model
model = EfficientDetBackbone(compound_coef=compound_coef, num_classes=len(obj_list))
# model.load_state_dict(torch.load(f'weights/efficientdet-d{compound_coef}.pth'))
model.load_state_dict(torch.load('weights/efficientdet-d' + str(compound_coef) + '.pth'))
model.requires_grad_(False)
model.eval()

if use_cuda:
    model = model.cuda()
if use_float16:
    model = model.half()


# function for display
def display(preds, imgs):
    # Only one frame is processed per iteration, so draw on the first image and return it.
    # A frame without detections is returned unchanged instead of falling through to None.
    for i in range(len(imgs)):
        for j in range(len(preds[i]['rois'])):
            # np.int was removed from recent NumPy releases; the builtin int is equivalent here
            (x1, y1, x2, y2) = preds[i]['rois'][j].astype(int)
            cv2.rectangle(imgs[i], (x1, y1), (x2, y2), (255, 255, 0), 2)
            obj = obj_list[preds[i]['class_ids'][j]]
            score = float(preds[i]['scores'][j])

            cv2.putText(imgs[i], '{}, {:.3f}'.format(obj, score),
                        (x1, y1 + 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                        (255, 255, 0), 1)

        return imgs[i]


# Box
regressBoxes = BBoxTransform()
clipBoxes = ClipBoxes()

# Video capture
cap = cv2.VideoCapture(video_src)

while True:
    start = time.time()
    ret, frame = cap.read()

    dim = (width, height)
    # frame = cv2.resize(frame, dim, interpolation=cv2.INTER_AREA)
    if not ret:
        break

    # frame preprocessing
    ori_imgs, framed_imgs, framed_metas = preprocess_video(frame, max_size=input_size)

    if use_cuda:
        x = torch.stack([torch.from_numpy(fi).cuda() for fi in framed_imgs], 0)
    else:
        x = torch.stack([torch.from_numpy(fi) for fi in framed_imgs], 0)

    x = x.to(torch.float32 if not use_float16 else torch.float16).permute(0, 3, 1, 2)

    # model predict
    with torch.no_grad():
        features, regression, classification, anchors = model(x)

        out = postprocess(x,
                          anchors, regression, classification,
                          regressBoxes, clipBoxes,
                          threshold, iou_threshold)

    # result
    out = invert_affine(framed_metas, out)
    img_show = display(out, ori_imgs)

    end = time.time()
    print("Fps: %f" % (1.0 / (end - start)), "img_shape", img_show.shape)

    # show frame by frame
    cv2.imshow('frame', img_show)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

juanmanuelrq commented 4 years ago

You can possibly run at batchsize 2. It would be a little bit faster though

Hi, how can I do that with video?

zylo117 commented 4 years ago

You can try implementing a buffer mechanism, i.e. a queue: load frames into it until it is full, then pop them all out and run them through frame preprocessing and the model together as one batch.
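The buffering idea could be sketched roughly as below. This is only an illustration under stated assumptions, not code from the repository: `batched_frames` is a hypothetical helper, and plain integers stand in for video frames so that the sketch runs without OpenCV or a GPU.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_frames(frames: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Collect frames into a buffer and emit it each time it fills up."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffer: List[T] = []
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == batch_size:
            yield buffer      # buffer is full: hand the whole batch over
            buffer = []       # start a fresh buffer for the next batch
    if buffer:
        yield buffer          # leftover frames at the end of the video


# Integers stand in for frames here (hypothetical data).
print(list(batched_frames(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

In the video script, the frames would come from repeated `cap.read()` calls, and each emitted batch would be preprocessed and stacked into one tensor before the single `model(x)` call. Whether `preprocess_video` accepts several frames at once should be checked in `utils/utils.py`, and `display` would need to return every image of the batch rather than only the first. The trade-off is added delay, since no frame in a batch is shown until the whole batch has been processed.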

juanmanuelrq commented 4 years ago

Thank you, @zylo117. I am going to try it!

ntdat017 commented 3 years ago

Hi @juanmanuelrq, did you train the model on your custom dataset, or did you just use the pre-trained weights? I have trained on large images like yours, but the results are not good: mAP is only about 0.010.

juanmanuelrq commented 3 years ago

Hi @ntdat017, I have trained on the VisDrone dataset, task 1 ( https://github.com/VisDrone/VisDrone-Dataset#task-1-object-detection-in- ), but for this image I used the pre-trained efficientdet-d0.pth from https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch

zylo117 commented 3 years ago

@ntdat017 try increasing the learning rate (lr) to speed up convergence.