xingyizhou / CenterNet

Object detection, 3D detection, and pose estimation using center point detection:
MIT License
7.23k stars 1.92k forks

dla34 fps doubt? #210

Open windyrobin opened 5 years ago

windyrobin commented 5 years ago

| Backbone | AP / FPS | Flip AP / FPS | Multi-scale AP / FPS |
|----------|----------|---------------|----------------------|
| DLA-34   | 37.4 / 52 | 39.2 / 28    | 41.7 / 4             |

The paper says that with DLA-34 at 512×512 resolution, CenterNet can achieve 52 FPS (on a Titan Xp), which means less than 20 ms per frame.

But when I test the original dla34up in PyTorch, it costs about 30+ ms (on a GP100), and the paper says CenterNet also adds deformable components to the original network, which should add some extra cost...

I don't think there is a big difference between a Titan Xp and a GP100, so where does the 52 FPS come from?

```python
import time

import torch
import dla_up  # from the original DLA repository, not the CenterNet codebase
# import dla
# import torchvision.models as models

def main():
    # dla34up with 24 output channels and down_ratio=4
    model = dla_up.__dict__.get('dla34up')(24, None, down_ratio=4)
    # alternatives for comparison:
    # model = dla.__dict__.get('dla34')()
    # model = models.resnet34()
    # model = models.densenet201()

    x = torch.ones(1, 3, 512, 512).cuda()
    model.cuda()

    out1, out2 = model(x)  # warm-up call (first call pays one-time setup costs)

    # NOTE: the model is left in train mode, there is no torch.no_grad(),
    # and there is no torch.cuda.synchronize() before/after the loop --
    # CUDA kernels run asynchronously, so this wall-clock measurement
    # can be misleading.
    start = time.time()
    for i in range(100):
        out1, out2 = model(x)
    end = time.time()
    print('total cost:', end - start)
    print('out1: ', out1.size())
    print('out2: ', out2.size())

main()
```

result:

```
total cost: 3.330667734146118
out1:  torch.Size([1, 24, 512, 512])
out2:  torch.Size([1, 24, 128, 128])
```
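For what it's worth, naive wall-clock timing of CUDA code is often misleading: kernels launch asynchronously, so `time.time()` can return before the GPU has actually finished. A minimal benchmarking sketch (a generic illustration with a stand-in convolutional model, not CenterNet's actual test code) that synchronizes properly, disables gradient tracking, and uses eval mode might look like:

```python
import time
import torch
from torch import nn

def benchmark(model, x, iters=100):
    """Return average milliseconds per forward pass, syncing if on CUDA."""
    model.eval()
    with torch.no_grad():
        model(x)  # warm-up: first call pays one-time setup costs
        if x.is_cuda:
            torch.cuda.synchronize()  # drain pending kernels before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # ensure all iterations really finished
        return (time.perf_counter() - start) / iters * 1000.0

# stand-in model; substitute the real detector here
device = 'cuda' if torch.cuda.is_available() else 'cpu'
net = nn.Conv2d(3, 24, kernel_size=3, padding=1).to(device)
x = torch.ones(1, 3, 512, 512, device=device)
ms = benchmark(net, x, iters=10)
print(f'{ms:.2f} ms/frame, {1000.0 / ms:.1f} FPS')
```

Without the synchronize calls, the loop mostly measures kernel *launch* time, which can make a slow model look fast (or vice versa).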
xingyizhou commented 5 years ago

Thanks for your report. However, the dla_up in your script is not from our codebase, and I don't understand your 24 channels. When you run testing with src/test.py following the README, our script prints the detailed time for each stage (e.g., network feed-forward, decoding). You can start debugging from there. I don't have a GP100, but I can basically reproduce the reported testing performance (~3 ms slower) on a GTX 1080Ti.

windyrobin commented 5 years ago

> Thanks for your report. However, the dla_up in your script is not from our codebase, and I don't understand your 24 channels. When you run testing with src/test.py following the README, our script prints the detailed time for each stage (e.g., network feed-forward, decoding). You can start debugging from there. I don't have a GP100, but I can basically reproduce the reported testing performance (~3 ms slower) on a GTX 1080Ti.

Thanks, I'll check it.

By the way, thanks for your great work!

palealice commented 2 years ago

Does the FPS in the paper only count network inference time, or does it include pre-processing and post-processing?
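One way to see the split is to time each stage separately, the way the repo's test script prints a per-stage breakdown. A generic sketch (the stage names and no-op stand-in functions here are illustrative, not the repo's actual code):

```python
import time

def timed_stages(stages, data):
    """Run a pipeline of (name, fn) stages, recording each stage's seconds."""
    times = {}
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        times[name] = time.perf_counter() - start
    return data, times

# dummy stand-ins for the real pipeline stages
stages = [
    ('preprocess', lambda img: img),  # e.g. resize / normalize
    ('forward',    lambda inp: inp),  # network feed-forward
    ('decode',     lambda out: out),  # e.g. heatmap -> boxes + NMS
]
result, times = timed_stages(stages, object())
for name, t in times.items():
    print(f'{name}: {t * 1000:.3f} ms')
total = sum(times.values())
print(f'end-to-end: {total * 1000:.3f} ms')
```

Comparing the `forward` time alone against the end-to-end total shows how much pre- and post-processing contribute to the reported FPS.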

palealice commented 2 years ago

[screenshot] I tested at 512×512 resolution and got this result (2080 Ti).