Closed MrLinNing closed 5 years ago
Could you please check that CUDA and cuDNN are installed properly? Also, are you discarding the first iteration?
Also, set the GPU frequency to maximum.
P.S.: we used PyTorch v0.3
@sacmehta Wow, thank you! It was the GPU frequency!
Hi, did you run into this problem when testing ENet and ERFNet? I cannot fix this bug:
RuntimeError: cuda runtime error (7) : too many resources requested for launch at /pytorch/torch/lib/THCUNN/im2col.h:120
No, I didn’t encounter this issue.
PS: I encountered issues with bilinear interpolation on TX2, so you might want to use deconvolution for upsampling.
I trained ESPNetv2 on my own dataset. I modified gen_cityscapes.py to work with my data (768 × 432), but when I ran it on a Jetson TX2 I got only about 5 FPS. I ran jetson_clocks.sh to maximize the GPU frequency. According to the paper, the speed at the image size I used should be more than 10 FPS, so I do not know where the problem is. Help me!
JetPack 3.1, Python 2.7, CUDA 8.0, cuDNN 6.0
Corrected source code
from __future__ import division
from __future__ import print_function
import numpy as np
import torch
import glob
import SegmentationModel as net
import time
import cv2
import os
from argparse import ArgumentParser
from torch import nn

pallete = [[153, 153, 153],
           [170, 234, 150],
           [220, 220, 0],
           [107, 142, 35],
           [152, 251, 152],
           [70, 130, 180],
           [220, 20, 60],
           [0, 60, 100],
           [150, 250, 250],
           [0, 0, 0],
           [0, 0, 0]]


def relabel(img):
    return img


def evaluateModel(args, model, image_list):
    # global mean and std values
    mean = [131.84157, 145.38597, 135.16437]
    std = [76.013596, 67.85283, 70.89791]

    model.eval()
    for i, imgName in enumerate(image_list):
        img = cv2.imread(imgName)
        if args.overlay:
            img_orig = np.copy(img)

        start = time.time()
        img = img.astype(np.float32)
        for j in range(3):
            img[:, :, j] -= mean[j]
        for j in range(3):
            img[:, :, j] /= std[j]

        img = cv2.resize(img, (args.inWidth, args.inHeight))
        if args.overlay:
            img_orig = cv2.resize(img_orig, (args.inWidth, args.inHeight))

        img /= 255
        img = img.transpose((2, 0, 1))
        img_tensor = torch.from_numpy(img)
        img_tensor = torch.unsqueeze(img_tensor, 0)  # add a batch dimension
        if args.gpu:
            img_tensor = img_tensor.cuda()
        img_out = model(img_tensor)

        classMap_numpy = img_out[0].max(0)[1].byte().cpu().data.numpy()
        # upsample the feature maps to the same size as the input image using Nearest neighbour interpolation
        # upsample the feature map from 1024x512 to 2048x1024
        #classMap_numpy = cv2.resize(classMap_numpy, (args.inWidth*2, args.inHeight*2), interpolation=cv2.INTER_NEAREST)
        if i % 100 == 0 and i > 0:
            print('Processed [{}/{}]'.format(i, len(image_list)))

        elapsed_time = time.time() - start
        print('time')
        print(elapsed_time)

        name = imgName.split('/')[-1]

        if args.colored:
            classMap_numpy_color = np.zeros((img.shape[1], img.shape[2], img.shape[0]), dtype=np.uint8)
            for idx in range(len(pallete)):
                [r, g, b] = pallete[idx]
                classMap_numpy_color[classMap_numpy == idx] = [b, g, r]
            cv2.imwrite(args.savedir + os.sep + 'c_' + name.replace(args.img_extn, 'png'), classMap_numpy_color)
            if args.overlay:
                overlayed = cv2.addWeighted(img_orig, 0.5, classMap_numpy_color, 0.5, 0)
                cv2.imwrite(args.savedir + os.sep + 'over_' + name.replace(args.img_extn, 'jpg'), overlayed)

        if args.cityFormat:
            classMap_numpy = relabel(classMap_numpy.astype(np.uint8))

        cv2.imwrite(args.savedir + os.sep + name.replace(args.img_extn, 'png'), classMap_numpy)


def main(args):
    # read all the images in the folder
    image_list = glob.glob(args.data_dir + os.sep + '*.' + args.img_extn)

    modelA = net.EESPNet_Seg(args.classes, s=args.s)
    if not os.path.isfile(args.pretrained):
        print('Pre-trained model file does not exist. Please check ./pretrained_models folder')
        exit(-1)
    modelA = nn.DataParallel(modelA)
    modelA.load_state_dict(torch.load(args.pretrained))
    if args.gpu:
        modelA = modelA.cuda()

    # set to evaluation mode
    modelA.eval()

    if not os.path.isdir(args.savedir):
        os.mkdir(args.savedir)

    evaluateModel(args, modelA, image_list)


if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--model', default="ESPNetv2", help='Model name')
    parser.add_argument('--data_dir', default="./izunuma", help='Data directory')
    parser.add_argument('--img_extn', default="png", help='RGB Image format')
    parser.add_argument('--inWidth', type=int, default=768, help='Width of RGB image')
    parser.add_argument('--inHeight', type=int, default=432, help='Height of RGB image')
    parser.add_argument('--savedir', default='./results', help='directory to save the results')
    parser.add_argument('--gpu', default=True, type=bool, help='Run on CPU or GPU. If TRUE, then GPU.')
    parser.add_argument('--pretrained', default='../models/izunuma_dataset9_0.5/model_best.pth', help='Pretrained weights directory.')
    parser.add_argument('--s', default=0.5, type=float, help='scale')
    parser.add_argument('--cityFormat', default=True, type=bool, help='If you want to convert to cityscape '
                                                                      'original label ids')
    parser.add_argument('--colored', default=True, type=bool, help='If you want to visualize the '
                                                                   'segmentation masks in color')
    parser.add_argument('--overlay', default=True, type=bool, help='If you want to visualize the '
                                                                   'segmentation masks overlayed on top of RGB image')
    parser.add_argument('--classes', default=11, type=int, help='Number of classes in the dataset. 20 for Cityscapes')

    args = parser.parse_args()
    if args.overlay:
        args.colored = True  # This has to be true if you want to overlay

    main(args)
Are you accounting for image reading and writing time? If so, discard that.
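One way to see how much of the per-frame time goes to reading, inference, and writing is to time each stage separately. The following is only a minimal sketch: the names timed, stage_totals, and report are hypothetical helpers, not part of the ESPNetv2 code, and the time.sleep calls stand in for cv2.imread, model(img_tensor), and cv2.imwrite. It assumes that any GPU work has finished by the time a stage ends; in the script above, the .cpu() call forces that.

```python
from __future__ import division, print_function
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage (stage names are illustrative).
stage_totals = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the time spent inside the with-block to stage_totals[stage]."""
    start = time.time()
    try:
        yield
    finally:
        stage_totals[stage] += time.time() - start


def report(num_frames):
    """Print and return the mean seconds per frame for each stage."""
    means = {}
    for stage, total in sorted(stage_totals.items()):
        means[stage] = total / num_frames
        print('{:<10s} {:.4f} s/frame'.format(stage, means[stage]))
    return means


if __name__ == '__main__':
    frames = 3
    for _ in range(frames):
        with timed('read'):        # stand-in for cv2.imread(...)
            time.sleep(0.01)
        with timed('inference'):   # stand-in for model(img_tensor) + .cpu()
            time.sleep(0.02)
        with timed('write'):       # stand-in for cv2.imwrite(...)
            time.sleep(0.01)
    report(frames)
```

Only the inference entry is comparable to an FPS figure that covers the network alone.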
Thank you! After excluding image reading and writing from the processing time, the speed improved to 8 FPS. However, it still has not reached 10 FPS.
What else have I missed?
I think the GPU frequencies are not set properly. Could you run the command below and then test the speed again?
sudo nvpmodel -m 0
Sorry for asking so many questions. I tried sudo nvpmodel -m 0, but the processing speed still stayed at 8.5 FPS.
I checked the status of the GPU with tegrastats during processing:
sudo ~/tegrastats
RAM 2969/7851MB (lfb 880x4MB) cpu [1%@2028,72%@2034,26%@2036,1%@2024,2%@2026,2%@2027] EMC 12%@1866 APE 150 GR3D 47%@1300
The GPU utilization (GR3D) is about 50%, so the GPU is not being used to its maximum. Does this indicate that the Python code is defective?
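To track the GPU load over a whole run instead of reading single lines by eye, the GR3D field can be pulled out of each tegrastats line. This is only a sketch: parse_gr3d is a hypothetical helper, and the pattern assumes the field looks like the one in the output above (load in percent, then @ and the clock in MHz). Newer JetPack releases appear to name the field GR3D_FREQ, so the pattern accepts both spellings; that is an assumption, not something verified on this setup.

```python
from __future__ import print_function
import re

# Matches "GR3D 47%@1300" and, as an assumption, "GR3D_FREQ 47%@1300" or "GR3D_FREQ 47%".
_GR3D = re.compile(r'GR3D(?:_FREQ)?\s+(\d+)%(?:@(\d+))?')


def parse_gr3d(line):
    """Return (load_percent, clock_mhz_or_None) from a tegrastats line, or None."""
    match = _GR3D.search(line)
    if match is None:
        return None
    load = int(match.group(1))
    clock = int(match.group(2)) if match.group(2) else None
    return load, clock


sample = ('RAM 2969/7851MB (lfb 880x4MB) cpu [1%@2028,72%@2034,26%@2036,'
          '1%@2024,2%@2026,2%@2027] EMC 12%@1866 APE 150 GR3D 47%@1300')
print(parse_gr3d(sample))  # (47, 1300)
```

In the sample, the GPU is already at 1300 MHz while being busy only about half the time, which would be consistent with the GPU waiting on CPU-side work rather than with a clock-setting problem.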
PyTorch has an initialization time which is way too high. You need to discard the execution time of your first frame. If you are not doing so, please discard and check.
The other thing worth trying is to just pass a random tensor to the model and measure the time, similar to the script mentioned at the beginning of this thread.
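The two suggestions above (discard the first frame, time a fixed random input) can be combined in a small harness. This is a sketch under stated assumptions rather than the script referenced in this thread: benchmark is a hypothetical helper, and the stand-in workload in the demo replaces the real model call. The optional sync argument is there because CUDA kernels are launched asynchronously, so the clock should only be read after pending GPU work has finished.

```python
from __future__ import division, print_function
from timeit import default_timer


def benchmark(fn, n_warmup=5, n_iters=50, sync=None):
    """Return the mean seconds per call of fn(), excluding warm-up calls.

    sync, if given, is called before each clock read so that pending
    asynchronous work (for example CUDA kernels) is included in the timing.
    """
    if sync is None:
        sync = lambda: None
    for _ in range(n_warmup):   # discarded: absorbs one-time initialization
        fn()
    sync()
    start = default_timer()
    for _ in range(n_iters):
        fn()
    sync()
    return (default_timer() - start) / n_iters


if __name__ == '__main__':
    # Stand-in workload; replace with the model call on a random tensor.
    mean_s = benchmark(lambda: sum(range(10000)), n_warmup=2, n_iters=20)
    print('{:.6f} s/call, {:.1f} calls/s'.format(mean_s, 1.0 / mean_s))
```

With PyTorch on the GPU, the call would look roughly like benchmark(lambda: model(x), sync=torch.cuda.synchronize), where x = torch.randn(1, 3, 432, 768).cuda() is created once before timing.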
I tried passing just a random tensor to the model and measuring the time. The processing speed was almost the same for the random tensor and for real images. Perhaps only this part should be measured?
start = time.time()
img_out = model(img_tensor)
classMap_numpy = img_out[0].max(0)[1].byte().cpu().data.numpy()
elapsed_time = time.time() - start
If so, it will result in about 12 FPS.
What you want to measure is the GPU time, so you only need to time the model execution:
img_out = model(img_tensor)
Thank you! I had mistakenly assumed that the measurement covers the whole series of processing steps, not just the inference part.
Hello @sacmehta, I ran ESPNet on a Jetson TX2 with JetPack SDK version 4.1.1 and PyTorch version 4.0. When the input image is 360x640, the inference time is about 0.112 s, which means the speed is below 10 FPS (I am sure this excludes image loading and writing time). In your paper, the inference speed is more than 16 FPS when the image is 360x640. Can you give me more details about it? Also, I used the erfnet code to measure the inference time of ESPNet: https://github.com/Eromera/erfnet_pytorch/blob/master/eval/eval_forwardTime.py
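For reference, the conversion between per-frame latency and FPS used in this thread is just a reciprocal. The short sketch below works through the numbers quoted above; the two function names are hypothetical.

```python
from __future__ import division, print_function


def fps_from_latency(seconds_per_frame):
    """Frames per second for a given per-frame latency in seconds."""
    return 1.0 / seconds_per_frame


def latency_ms_from_fps(fps):
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


print('{:.2f} FPS'.format(fps_from_latency(0.112)))   # 8.93 FPS
print('{:.1f} ms'.format(latency_ms_from_fps(16)))    # 62.5 ms
print('{:.1f} ms'.format(latency_ms_from_fps(10)))    # 100.0 ms
```

So a measured 0.112 s per frame is about 8.9 FPS, while 16 FPS would require roughly 62.5 ms per frame.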