sacmehta / EdgeNets

This repository contains the source code of our work on designing efficient CNNs for computer vision
MIT License

question about cityscapes performance of ESPNet v2 #8

Closed Wangzhuoying0716 closed 5 years ago

Wangzhuoying0716 commented 5 years ago

Hi, I'd like to ask whether the Cityscapes performance reported in model_zoo/README.md was obtained by training on both the fine and the coarse labeled data. I ran your code using only the fine labeled data at s=1.5, with 100 epochs for stage 1 and 100 epochs for stage 2, but the best mIoU on the val set is 61.1%, which is 2.7% lower than your result. I wonder whether this is because I didn't train long enough or because of a data issue.

sacmehta commented 5 years ago

We didn’t use coarse data.

How are you measuring mIOU for Cityscapes? Are you using official scripts?

Wangzhuoying0716 commented 5 years ago

No, I just used the mIoU measurement code from your training process.

sacmehta commented 5 years ago

You need to use the official scripts for Cityscapes. The official scripts provided by the Cityscapes team use a slightly different mIoU calculation than the standard version. Once you evaluate using the official scripts, you should not see a performance difference.

sacmehta commented 5 years ago

https://github.com/mcordts/cityscapesScripts
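
For reference, a minimal sketch of the pre-processing the official evaluation typically expects, assuming the network predicts the 19 train IDs: predictions are remapped to the full Cityscapes label IDs before running `cityscapesscripts.evaluation.evalPixelLevelSemanticLabeling`. The function name and paths below are placeholders, not part of this repo.

```python
# Minimal sketch (not part of this repo): remap the 19 train IDs predicted by
# the network to the full Cityscapes label IDs that the official evaluation
# script (cityscapesscripts.evaluation.evalPixelLevelSemanticLabeling) expects.
import numpy as np
from PIL import Image
from cityscapesscripts.helpers.labels import trainId2label

def save_prediction_for_eval(pred_train_ids: np.ndarray, out_path: str) -> None:
    """pred_train_ids: HxW array of train IDs in [0, 18]."""
    label_ids = np.zeros(pred_train_ids.shape, dtype=np.uint8)
    for train_id, label in trainId2label.items():
        if 0 <= train_id <= 18:                      # skip ignore IDs (255, -1)
            label_ids[pred_train_ids == train_id] = label.id
    Image.fromarray(label_ids).save(out_path)

# The evaluator is then run with CITYSCAPES_DATASET and CITYSCAPES_RESULTS set, e.g.:
#   python -m cityscapesscripts.evaluation.evalPixelLevelSemanticLabeling
```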

Wangzhuoying0716 commented 5 years ago

OK, I will give it a try. Thanks!

Wangzhuoying0716 commented 5 years ago

I have tried the official script and the result gets better. Thanks a lot! By the way, for the '83 fps' speed mentioned in the ESPNetv2 paper, what batch size did you use for testing? I used one Titan Xp card with batch size 4 at an image size of 1024x512 and only got 46.3 fps, and with batch size 1 the result dropped to 23.9 fps. Here is my testing code.

```python
import time
import torch

num_gpus = torch.cuda.device_count()
device = 'cuda' if num_gpus > 0 else 'cpu'
model = model.to(device=device)   # model is built/loaded earlier
model.eval()
input = torch.rand(1, 3, 1024, 512).to(device)

output = model(input)  # warm-up forward pass (for initialization)

starttime = time.time()
for i in range(100):
    output = model(input)
endtime = time.time()
speed = 100.0 / (endtime - starttime)  # forward passes per second
print("speed:", speed, "fps")
```

sacmehta commented 5 years ago

PyTorch has some initialization overhead, so you should discard the time of the first forward pass while measuring speed.

sacmehta commented 5 years ago

I have edited your code above

Wangzhuoying0716 commented 5 years ago

Thanks for your correction. The result did get better, from 23.9 fps to 25.9 fps, but that is still not good enough. I then tested on a machine with all the other GPU cards idle and the result improved further: I got 30.4 fps with batch size 1 and 49.8 fps with batch size 4, both at s=2.0, with the result averaged over 400 iterations (a little better than over 100 iterations). I wonder if I am still doing something wrong? Also, strangely, the FLOPs for an input of size 224x224 come out to 258.33 million and the parameter count to 0.789242 million, which does not exactly match the paper (322M FLOPs, 725K parameters).

sacmehta commented 5 years ago

You need to call synchronize. See the thread below:

https://github.com/sacmehta/ESPNet/issues/57
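
For context, the point of the linked issue is that CUDA ops launch asynchronously, so wall-clock timestamps must be bracketed by `torch.cuda.synchronize()`. Below is a minimal sketch of that pattern; a toy module stands in for the segmentation model, and the full timing script is posted later in this thread.

```python
import time
import torch

# Toy stand-in for the segmentation network; any nn.Module behaves the same way.
model = torch.nn.Conv2d(3, 19, kernel_size=3, padding=1).cuda().eval()
inputs = torch.rand(1, 3, 512, 1024, device='cuda')

with torch.no_grad():
    model(inputs)                # warm-up forward pass (initialization)
    torch.cuda.synchronize()     # wait for all pending GPU work before timing
    start = time.time()
    model(inputs)
    torch.cuda.synchronize()     # wait for the forward pass to actually finish
    print('forward time (ms): {:.3f}'.format((time.time() - start) * 1000.0))
```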

In the paper, we didn't report FLOPs at 224x224 for segmentation. Also, please see the updated paper on arXiv or the CVPR website:

http://openaccess.thecvf.com/content_CVPR_2019/html/Mehta_ESPNetv2_A_Light-Weight_Power_Efficient_and_General_Purpose_Convolutional_Neural_CVPR_2019_paper.html
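
As a quick sanity check on the parameter count (FLOP numbers depend on which counting tool is used and which layers it instruments, so they can differ between tools), the total can be computed directly in PyTorch. The toy module below is only a placeholder for the instantiated `espnetv2_seg` model.

```python
import torch

def count_parameters(model: torch.nn.Module) -> int:
    # Sum the number of elements over all trainable tensors in the network.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Placeholder module; swap in the espnetv2_seg network to check the ~0.79M figure.
toy = torch.nn.Conv2d(3, 19, kernel_size=3)
print('Parameters (M): {:.6f}'.format(count_parameters(toy) / 1e6))
```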

sacmehta commented 5 years ago

We don't use the older version anymore.

Wangzhuoying0716 commented 5 years ago

Thanks! Even though I used the code referenced in sacmehta/ESPNet#57, I still cannot reproduce the reported speed. Thanks for sharing and replying.

sacmehta commented 5 years ago

Could you please check if cuDNN is enabled?
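
A quick way to check this from Python, using the standard flags PyTorch exposes (nothing project-specific):

```python
import torch

# Report whether cuDNN is available and enabled, and which version is loaded.
print('cuDNN available:', torch.backends.cudnn.is_available())
print('cuDNN enabled:  ', torch.backends.cudnn.enabled)
print('cuDNN version:  ', torch.backends.cudnn.version())
```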

sacmehta commented 5 years ago

We used a Titan X GPU. Which GPU are you using? Speed will vary depending on the GPU. Also, make sure you run on a single GPU.

Wangzhuoying0716 commented 5 years ago

I used a Titan Xp GPU. Since the newest paper does not report the speed, could you share your speed (fps) result for image size 512x1024 and s=2.0?

sacmehta commented 5 years ago

Sure, I will do it tomorrow

Jason93K commented 5 years ago

I have the same question. I measured the ESPNetv2 segmentation model with s=2.0 using the code referenced in sacmehta/ESPNet#57 and the same environment, but I only got about 20 fps on a Titan X GPU. What is your speed measurement, @sacmehta?

sacmehta commented 5 years ago

I am getting about 45 FPS on a GeForce GTX 1080.

Important NOTE: I ran this experiment while my machine was running another experiment, so speeds might vary from GPU to GPU and from an idle machine to a busy machine.

Below is the code that I used for running the experiment


```python
import time
import torch

def _time(is_cuda):
    if is_cuda:
        torch.cuda.synchronize()

    return time.time()

def computeTime(model, inputs, device='cuda'):
    model.eval()
    with torch.no_grad():
        is_cuda = True if device == 'cuda' else False
        # let pytorch initialize it
        model(inputs)
        _time(is_cuda)

        time_fwd = 0
        iterations = 10

        for i in range(iterations):
            t1 = _time(is_cuda)
            model(inputs)
            t2 = _time(is_cuda)
            time_fwd = time_fwd + (t2 - t1)
            print(t2 - t1)
        mean_execution_time = (time_fwd*1000.0) / iterations
        print('Avg execution time (ms): {:.3f}'.format(mean_execution_time ))
        print('FPS: {:.3f}'.format(1000.0/mean_execution_time))

if __name__ == "__main__":
    import torch
    import argparse
    from model.segmentation.espnetv2 import espnetv2_seg

    parser = argparse.ArgumentParser(description='Testing')
    args = parser.parse_args()

    args.classes = 21
    args.s = 2.0
    args.weights=''
    args.dataset='pascal'

    inputs = torch.FloatTensor(1, 3, 512, 1024) #.fill_(0.01)

    model = espnetv2_seg(args)
    num_gpus = torch.cuda.device_count()

    device = 'cuda' if num_gpus >= 1 else 'cpu'
    if device == 'cuda':
        model = model.cuda()
        inputs = inputs.cuda()

        import torch.backends.cudnn as cudnn

        cudnn.benchmark = True
        cudnn.deterministic = True

    computeTime(model, inputs, device)
```

sacmehta commented 5 years ago

Also, the EESP modules are implemented sequentially in this version. If you want to measure the actual inference speed, you need to parallelize those modules, as noted in the implementation note here:

https://github.com/sacmehta/ESPNetv2#implementation-note