Open juanmanuelrq opened 4 years ago
As I mentioned in the README, I tested on an RTX 2080 Ti, which is about 5 to 6 times faster than a 690M. Also, notebook graphics cards tend to overheat and are then forced to reduce power. And 4 GB of VRAM is very limited too. You can possibly run at batch size 2, which would be a little bit faster.
So it'd be better if you can run on a faster GPU with more memory.
Hi,
I am running this code on a Tesla K80, and the result is: Fps: 7.716913 img_shape (720, 1280, 3)
Why does the test script efficientdet_test.py, on the same K80 GPU, get 14.66 FPS with a 1920x1080 image? Its output is:
batch size of 1: 0.06818311214447022 seconds, 14.66638832620523 FPS, @batch_size 1
batch size of 32: 1.06377272605896 seconds, 30.08161350268195 FPS, @batch_size 32
How can I get a batch size of 32 with the video code?
This is the code:
"""
Simple Inference Script of EfficientDet-Pytorch for detecting objects on webcam
"""
import time
import torch
import cv2
import numpy as np
from torch.backends import cudnn
from backbone import EfficientDetBackbone
from efficientdet.utils import BBoxTransform, ClipBoxes
from utils.utils import preprocess, invert_affine, postprocess, preprocess_video

video_src = '/home/sistemasinteligentesmaps/Downloads/20200306_20200306092821_20200306092901_094440.mp4'
width = 1920
height = 1080

compound_coef = 0
force_input_size = None  # set None to use default size

threshold = 0.2
iou_threshold = 0.2

use_cuda = True
use_float16 = False
cudnn.fastest = True
cudnn.benchmark = True

obj_list = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', '', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', '', 'backpack', 'umbrella', '', '', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', '', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', '', 'dining table', '', '', 'toilet', '', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', '', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
input_size = input_sizes[compound_coef] if force_input_size is None else force_input_size

model = EfficientDetBackbone(compound_coef=compound_coef, num_classes=len(obj_list))
model.load_state_dict(torch.load('weights/efficientdet-d'+str(compound_coef)+'.pth'))
model.requires_grad_(False)
model.eval()

if use_cuda:
    model = model.cuda()
if use_float16:
    model = model.half()

def display(preds, imgs):
    for i in range(len(imgs)):
        if len(preds[i]['rois']) == 0:
            # no detections: return the frame unannotated (a bare `continue`
            # here would make the function return None for a single frame)
            return imgs[i]

        for j in range(len(preds[i]['rois'])):
            # np.int was removed in NumPy 1.24; the builtin int is equivalent here
            (x1, y1, x2, y2) = preds[i]['rois'][j].astype(int)
            cv2.rectangle(imgs[i], (x1, y1), (x2, y2), (255, 255, 0), 2)
            obj = obj_list[preds[i]['class_ids'][j]]
            score = float(preds[i]['scores'][j])
            cv2.putText(imgs[i], '{}, {:.3f}'.format(obj, score),
                        (x1, y1 + 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                        (255, 255, 0), 1)

        return imgs[i]

regressBoxes = BBoxTransform()
clipBoxes = ClipBoxes()

cap = cv2.VideoCapture(video_src)

while True:
    start = time.time()
    ret, frame = cap.read()
    dim = (width, height)
    # frame = cv2.resize(frame, dim, interpolation=cv2.INTER_AREA)
    if not ret:
        break

    # frame preprocessing
    ori_imgs, framed_imgs, framed_metas = preprocess_video(frame, max_size=input_size)

    if use_cuda:
        x = torch.stack([torch.from_numpy(fi).cuda() for fi in framed_imgs], 0)
    else:
        x = torch.stack([torch.from_numpy(fi) for fi in framed_imgs], 0)

    x = x.to(torch.float32 if not use_float16 else torch.float16).permute(0, 3, 1, 2)

    # model predict
    with torch.no_grad():
        features, regression, classification, anchors = model(x)

        out = postprocess(x,
                          anchors, regression, classification,
                          regressBoxes, clipBoxes,
                          threshold, iou_threshold)

    # result
    out = invert_affine(framed_metas, out)
    img_show = display(out, ori_imgs)
    end = time.time()
    print("Fps: %f" % (1.0 / (end - start)), "img_shape", img_show.shape)

    # show frame by frame
    cv2.imshow('frame', img_show)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
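A note on comparing the numbers: the loop above starts its timer before cap.read() and stops it after display(), so the printed FPS includes video decoding, preprocessing, the model, postprocessing and drawing. A benchmark that times fewer stages would report a higher FPS for the same model. One way to see where the time goes is to time each stage separately. The sketch below is only an illustration: StageTimer is a hypothetical helper, not part of the repo, and the optional sync argument assumes you would pass torch.cuda.synchronize when timing GPU work, because CUDA calls return before the GPU has finished.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self, sync=None):
        # sync: optional callable run before the clock is stopped,
        # e.g. torch.cuda.synchronize when the stage launches GPU work.
        self.sync = sync
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return the mean seconds per call for every stage seen so far."""
        return {name: self.totals[name] / self.counts[name] for name in self.totals}


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(3):
        with timer.stage("read"):
            time.sleep(0.001)   # stands in for cap.read()
        with timer.stage("model"):
            time.sleep(0.002)   # stands in for model(x) + postprocess
    for name, seconds in timer.report().items():
        print(f"{name}: {seconds * 1000:.2f} ms per call")
```

In the video loop, each step (cap.read, preprocess_video, model plus postprocess, display) would be wrapped in its own `with timer.stage(...)` block, and the report printed once after the loop.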
"You can possibly run at batchsize 2. It would be a little bit faster though"
Hi, how can I do that in the video script?
You can try to implement a buffer mechanism, i.e. a queue: load frames into it until it's full, and when it's full, pop them all out and run the frame preprocessing on them together.
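A minimal sketch of that buffering idea, under some assumptions: read_frames, batched and FakeCapture are hypothetical names introduced here, and FakeCapture only stands in for cv2.VideoCapture so the example runs without a video file. Whether preprocess_video accepts several frames in one call, and how display should handle more than one image, needs to be checked against utils/utils.py and the script itself.

```python
import numpy as np


def read_frames(cap):
    """Yield frames from any object with a cv2.VideoCapture-style read()."""
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        yield frame


def batched(frames, batch_size):
    """Collect frames into lists of batch_size; the last list may be shorter."""
    buffer = []
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == batch_size:
            yield buffer
            buffer = []
    if buffer:  # flush the leftover frames at the end of the video
        yield buffer


class FakeCapture:
    """Stand-in for cv2.VideoCapture that produces n_frames black frames."""

    def __init__(self, n_frames, shape=(72, 128, 3)):
        self.remaining = n_frames
        self.shape = shape

    def read(self):
        if self.remaining == 0:
            return False, None
        self.remaining -= 1
        return True, np.zeros(self.shape, dtype=np.uint8)


if __name__ == "__main__":
    for batch in batched(read_frames(FakeCapture(5)), batch_size=2):
        # In the video script, the whole batch would go through preprocessing,
        # a single model(x) call, postprocess and display at this point.
        print(len(batch), batch[0].shape)
```

Two things to keep in mind with this approach: throughput is frames divided by seconds for the whole batch, so the FPS print needs adjusting, and for a live display the buffering adds latency because the first frame of a batch waits for the last one.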
Thank you @zylo117 I am going to try to do it!
Hi,
I am getting 8.6 FPS on 2K video (width = 1920, height = 1080) with efficientdet-d0.pth and use_float16 = False
I am getting 9.4 FPS on 2K video (width = 1920, height = 1080) with efficientdet-d0.pth and use_float16 = True
I am getting 1.5 FPS on 2K video (width = 1920, height = 1080) with efficientdet-d0.pth, use_float16 = True and force_input_size = 1920
My hardware specs are: GeForce GTX 690M with 4096 MB of VRAM, and 16 GB of RAM
What hardware or software changes do I need to make to speed up my FPS?
best regards
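For context on the third number, here is a rough back-of-the-envelope comparison using only the figures in this post and the input_sizes list from the script (512 is the default input size for compound_coef = 0). It assumes, as a rule of thumb rather than a measurement, that the cost of a convolutional network grows roughly with the number of input pixels.

```python
# Figures taken from the post above.
default_size = 512    # input_sizes[0], used when force_input_size is None
forced_size = 1920    # force_input_size = 1920

fps_fp32 = 8.6        # use_float16 = False, default input size
fps_fp16 = 9.4        # use_float16 = True, default input size
fps_forced = 1.5      # use_float16 = True, force_input_size = 1920

# Square inputs: the pixel count scales with the square of the side length.
pixel_ratio = (forced_size / default_size) ** 2
slowdown = fps_fp16 / fps_forced
fp16_gain = fps_fp16 / fps_fp32

print(f"pixels per frame: {pixel_ratio:.2f}x more at {forced_size}")
print(f"observed slowdown: {slowdown:.2f}x")
print(f"observed float16 speedup: {fp16_gain:.2f}x")
```

So forcing the input size to 1920 feeds the network about 14 times as many pixels, which is in line with the large drop in FPS, while float16 on this GPU only gives a gain of under 10%.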
Hi @juanmanuelrq, did you train the model on your custom dataset, or did you just use the pre-trained weights? I have trained on large images like yours, but the results are not good: mAP is only ~0.010.
Hi @ntdat017, I have trained on the VisDrone dataset, task 1 (https://github.com/VisDrone/VisDrone-Dataset#task-1-object-detection-in-), but for this image I used the pre-trained efficientdet-d0.pth from https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch
@ntdat017 try increasing the learning rate (lr) to speed up convergence