Closed Wangzhuoying0716 closed 5 years ago
We didn’t use coarse data.
How are you measuring mIoU for Cityscapes? Are you using the official scripts?
No, I just used your evaluation code that runs during the training process.
You need to use the official scripts for Cityscapes. The scripts provided by the Cityscapes team compute mIoU slightly differently from the standard version. Once you evaluate with the official scripts, you should not see a performance difference.
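To illustrate how two mIoU computations can disagree on the same predictions, here is a minimal pure-Python sketch. It compares mIoU averaged per batch against mIoU computed from a confusion matrix accumulated over the whole set. This is only one possible source of a gap and is not a description of what the official Cityscapes scripts do. The function names, the toy labels, and the ignore value of 255 are assumptions made for illustration.

```python
def confusion_matrix(preds, labels, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs; rows are ground truth, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, labels):
        if t == ignore_index:  # assumed "void" label, skipped entirely
            continue
        cm[t][p] += 1
    return cm


def add_matrices(a, b):
    """Element-wise sum of two confusion matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mean_iou(cm):
    """Mean of per-class IoU = TP / (TP + FP + FN), skipping classes that never occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Two toy "batches" of flattened pixel labels for a 2-class problem (hypothetical data).
batch1 = confusion_matrix([0, 0, 1, 1], [0, 0, 0, 1], num_classes=2)
batch2 = confusion_matrix([1, 1, 1, 0], [1, 1, 1, 1], num_classes=2)

per_batch_miou = (mean_iou(batch1) + mean_iou(batch2)) / 2
dataset_miou = mean_iou(add_matrices(batch1, batch2))

print("mean of per-batch mIoU:", round(per_batch_miou, 4))
print("mIoU from accumulated confusion matrix:", round(dataset_miou, 4))
```

The two numbers differ even though the predictions are identical, which is why results are only comparable when everyone uses the same evaluation script.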
Ok. I will have a try. Thanks!
I have tried the official script and the result is better. Thanks a lot! By the way, for the speed of '83 fps' mentioned in the ESPNet v2 paper, what batch size did you use for testing? I used one Titan Xp card with batch size 4 at an image size of 1024x512 but only got 46.3 fps; with the batch size set to 1, the result dropped to 23.9 fps. Here is my testing code:
import time
import torch

# `model` is assumed to be constructed before this snippet runs
num_gpus = torch.cuda.device_count()
device = 'cuda' if num_gpus > 0 else 'cpu'
model = model.to(device=device)
model.eval()
input = torch.rand(1, 3, 1024, 512).to(device)
output = model(input)  # warm-up pass for initialization (not timed)
starttime = time.time()
for i in range(100):
    output = model(input)
endtime = time.time()
speed = 100.0/(endtime - starttime)
print("speed:", speed, "fps")
PyTorch has some initialization time, so you should discard the time for the first forward pass when measuring speed.
I have edited your code above.
Thanks for the correction. The result did get better, from 23.9 fps to 25.9 fps, but it is still not good enough. I then tested on a machine where all the other GPU cards were free, and the result improved further: I got 30.4 fps with batch size 1 and 49.8 fps with batch size 4, both with s=2.0 and averaged over 400 iterations (slightly better than with 100 iterations). Am I still doing something wrong? Also, strangely, for an input of size 224x224 I measure 258.33 million FLOPs and a parameter count of 0.789242M, which does not exactly match the paper (322M, 725K).
You need to call synchronize. See the thread below:
https://github.com/sacmehta/ESPNet/issues/57
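The reason synchronization matters is that CUDA kernels are launched asynchronously, so reading the clock right after a forward call can measure only the launch and not the actual computation. Below is a minimal sketch of a timing helper that synchronizes before every clock read and discards warm-up passes. The helper name `benchmark`, its parameters, and the CPU stand-in workload are assumptions for illustration; with PyTorch on a GPU, `torch.cuda.synchronize` would be passed as `sync`.

```python
import time


def benchmark(forward, sync=None, batch_size=1, warmup=5, iterations=100):
    """Return (mean milliseconds per forward pass, images per second).

    forward: zero-argument callable that runs one forward pass.
    sync:    optional callable that blocks until queued device work is done.
    """
    def now():
        if sync is not None:
            sync()  # wait for pending device work before reading the clock
        return time.perf_counter()

    for _ in range(warmup):  # warm-up passes are not timed
        forward()

    start = now()
    for _ in range(iterations):
        forward()
    elapsed = max(now() - start, 1e-12)  # guard against a zero interval

    ms_per_iter = 1000.0 * elapsed / iterations
    images_per_sec = batch_size * iterations / elapsed
    return ms_per_iter, images_per_sec


# CPU-only stand-in workload so the sketch runs without a GPU.
ms, fps = benchmark(lambda: sum(range(10000)), batch_size=4, iterations=50)
print("Avg execution time (ms): {:.3f}".format(ms))
print("Images per second: {:.3f}".format(fps))

# With PyTorch on a GPU, the call might look like this (not executed here):
#   with torch.no_grad():
#       benchmark(lambda: model(inputs), sync=torch.cuda.synchronize,
#                 batch_size=inputs.size(0))
```

Reporting images per second rather than iterations per second keeps results at different batch sizes comparable.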
In the paper, we didn’t report FLOPs at 224x224 for segmentation. Also, please see the updated paper on arXiv or the CVPR website.
We don’t use the older version anymore.
Thanks! Although I used the code referred to in sacmehta/ESPNet#57, I still cannot reproduce the speed. Thanks for sharing and replying.
Could you please check if cuDNN is enabled?
We used a TitanX GPU. Which GPU are you using? Speed will vary depending on the GPU. Also, make sure you run on a single GPU.
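A quick way to check the points above (cuDNN enabled, which GPU is in use, how many GPUs are visible) is to print what PyTorch reports about the environment. This is a minimal sketch; the function name `gpu_environment` and the dictionary keys are assumptions for illustration, and the function falls back gracefully when PyTorch is not installed.

```python
import os


def gpu_environment():
    """Collect what PyTorch reports about CUDA, cuDNN, and the visible GPUs."""
    info = {
        # Restricting this variable to one id is a common way to force a single GPU.
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": False,
    }
    try:
        import torch
    except ImportError:
        return info

    info["torch_installed"] = True
    info["cuda_available"] = torch.cuda.is_available()
    info["cudnn_available"] = torch.backends.cudnn.is_available()
    info["cudnn_enabled"] = torch.backends.cudnn.enabled
    info["visible_gpus"] = [
        torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
    ]
    return info


env = gpu_environment()
for key, value in env.items():
    print("{}: {}".format(key, value))
```

Posting this output together with the measured fps makes it easier to compare numbers across machines.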
I used a Titan Xp GPU. Since the newest paper did not report the speed, could you share your speed (fps) result for an image size of 512x1024 and s=2.0?
Sure, I will do it tomorrow
I have the same question. I measured the ESPNetv2 segmentation model with s=2.0 using the code referred to in sacmehta/ESPNet#57 in the same environment, but I got only 20 fps on a Titan X GPU. What speed do you measure, @sacmehta?
I am getting about 45 FPS on GeForce GTX 1080.
Important NOTE: I ran this experiment while my machine was running another experiment, so speeds might vary from GPU to GPU and from an idle machine to a busy machine.
Below is the code that I used for running the experiment:
import time
import torch


def _time(is_cuda):
    # wait for queued GPU work to finish before reading the clock
    if is_cuda:
        torch.cuda.synchronize()
    return time.time()


def computeTime(model, inputs, device='cuda'):
    model.eval()
    with torch.no_grad():
        is_cuda = True if device == 'cuda' else False
        # let pytorch initialize it
        model(inputs)
        _time(is_cuda)
        time_fwd = 0
        iterations = 10
        for i in range(iterations):
            t1 = _time(is_cuda)
            model(inputs)
            t2 = _time(is_cuda)
            time_fwd = time_fwd + (t2 - t1)
            print(t2 - t1)
        mean_execution_time = (time_fwd*1000.0) / iterations
        print('Avg execution time (ms): {:.3f}'.format(mean_execution_time))
        print('FPS: {:.3f}'.format(1000.0/mean_execution_time))


if __name__ == "__main__":
    import torch
    import argparse
    from model.segmentation.espnetv2 import espnetv2_seg
    parser = argparse.ArgumentParser(description='Testing')
    args = parser.parse_args()
    args.classes = 21
    args.s = 2.0
    args.weights = ''
    args.dataset = 'pascal'
    inputs = torch.FloatTensor(1, 3, 512, 1024)  #.fill_(0.01)
    model = espnetv2_seg(args)
    num_gpus = torch.cuda.device_count()
    device = 'cuda' if num_gpus >= 1 else 'cpu'
    if device == 'cuda':
        model = model.cuda()
        inputs = inputs.cuda()
        import torch.backends.cudnn as cudnn
        cudnn.benchmark = True
        cudnn.deterministic = True
    computeTime(model, inputs, device)
Also, the EESP modules are implemented sequentially in this version. If you want to measure the actual inference speed, you need to parallelize those modules, as noted in the implementation note here:
Hi, I'd like to ask whether the Cityscapes performance reported in model_zoo/README.md was obtained by training on both fine and coarse labeled data. I ran your code using only the fine labeled data at s=1.5, with 100 epochs for stage 1 and 100 epochs for stage 2, but the best mIoU on val is 61.1%, 2.7% lower than your implementation. I wonder whether this is because I didn't train long enough or because of the data.