SabraHashemi opened 3 years ago
My output for this script: Time taken for Onnx model 0:00:00.250296
```python
import time
from datetime import datetime

import cv2
import torch
import onnxruntime as rt

import craft_utils
import imgproc

sess = rt.InferenceSession("craft.onnx")
input_name = sess.get_inputs()[0].name
print(rt.get_device())
first_output_name = sess.get_outputs()[0].name

print('\n')
print('\n')
print('input_name', input_name)
print('output_name', first_output_name)
print('\n')
print('\n')

img = cv2.imread('./data/1.jpg')
img_resized, target_ratio, size_heatmap = imgproc.resize_aspect_ratio(
    img, 1280, interpolation=cv2.INTER_LINEAR, mag_ratio=1.5)
ratio_h = ratio_w = 1 / target_ratio
print(ratio_h, ratio_w)

x = imgproc.normalizeMeanVariance(img_resized)
x = torch.from_numpy(x).permute(2, 0, 1)  # [h, w, c] to [c, h, w]
x = x.unsqueeze(0)                        # [c, h, w] to [b, c, h, w]

t1 = datetime.now()
y, _ = sess.run(None, {input_name: x.numpy()})
t2 = datetime.now()
print("Time taken for Onnx model", str(t2 - t1))

score_text = y[0, :, :, 0]
score_link = y[0, :, :, 1]

boxes, polys = craft_utils.getDetBoxes(score_text, score_link, 0.5, 0.3, 0.3, True)
boxes = craft_utils.adjustResultCoordinates(boxes, ratio_w, ratio_h)
polys = craft_utils.adjustResultCoordinates(polys, ratio_w, ratio_h)
print(boxes)
```
@sabrabano0, could you try the following (minimal sketches of the measurement loop and of IO binding follow below):

1. Use a warm-up query before measuring latency; that is, exclude the first call to `sess.run(...)`.
2. After warming up, send N calls to `sess.run` (e.g. N=1000) and report statistics (e.g. the average) of the latency.
3. Try IO binding. The API you used copies input tensors to the GPU and copies output tensors back to the CPU. If you include that IO time, the comparison is not fair against Torch (whose inputs and outputs stay on the GPU) or against the CPU provider (which needs no such copies).
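For reference, here is a minimal sketch of suggestions (1) and (2). It assumes the same `craft.onnx` model as above; the input shape is a placeholder, so substitute the real preprocessed tensor `x` from the script.

```python
import statistics
from datetime import datetime

import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession("craft.onnx")
input_name = sess.get_inputs()[0].name

# Placeholder input in [b, c, h, w] layout; replace with the real
# preprocessed image tensor from the script above.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# (1) Warm-up: the first call pays one-time costs (memory allocation,
# kernel selection, CUDA context setup on GPU), so exclude it.
sess.run(None, {input_name: x})

# (2) Measure N calls and report statistics instead of a single sample.
N = 1000
latencies = []
for _ in range(N):
    t1 = datetime.now()
    sess.run(None, {input_name: x})
    t2 = datetime.now()
    latencies.append((t2 - t1).total_seconds() * 1000.0)  # milliseconds

print(f"mean {statistics.mean(latencies):.2f} ms, "
      f"median {statistics.median(latencies):.2f} ms, "
      f"stdev {statistics.stdev(latencies):.2f} ms")
```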
The 1st suggestion should fix the issue.
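And here is a minimal sketch of suggestion (3), IO binding, so that the device copies happen outside the timed region. Note that `OrtValue`, `bind_ortvalue_input`, and the `providers` argument come from more recent onnxruntime-gpu releases and may not exist in 1.2.0; the input shape is again a placeholder.

```python
import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession("craft.onnx", providers=["CUDAExecutionProvider"])
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input

# Copy the input to the GPU once, up front, instead of on every run() call.
x_gpu = rt.OrtValue.ortvalue_from_numpy(x, "cuda", 0)

io_binding = sess.io_binding()
io_binding.bind_ortvalue_input(input_name, x_gpu)
# Let ORT allocate the output on the GPU so no device-to-host copy
# happens inside the timed region.
io_binding.bind_output(output_name, "cuda")

sess.run_with_iobinding(io_binding)  # warm-up call

# ... time repeated sess.run_with_iobinding(io_binding) calls here ...

# Copy results back to the CPU only once, after timing.
y = io_binding.copy_outputs_to_cpu()[0]
```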
With onnxruntime 1.2.0:
```
Model loaded
Time taken for Pytorch model 0:00:00.975350
Output size torch.Size([1, 112, 112, 2])
Model ran successfully
Model converted successfully
Model checked successfully
CPU
Time taken for Onnx model 0:00:00.178522
```
With onnxruntime-gpu 1.2.0:
```
Model loaded
Time taken for Pytorch model 0:00:00.755978
Output size torch.Size([1, 112, 112, 2])
Model ran successfully
Model converted successfully
Model checked successfully
GPU
Time taken for Onnx model 0:00:00.617351
```
I read all the other similar threads, but the problem is still not solved or clear.