open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

Using ONNX model for inference #949

Open soltkreig opened 2 years ago

soltkreig commented 2 years ago

Hi! Is it possible to use an ONNX model for inference? The model has a heatmap output, and I don't know how to integrate it with mmpose's post-processing functions or plug it into a config file.

wardaddy24 commented 2 years ago

Any update? I ported the mmpose model to ONNX using pytorch2onnx.py and generated the graph with netron.app. The input is labelled input.1 and expects an image of shape (1, 3, 256, 192); the output layer is labelled 996. Running inference with the config mmpose/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/mpii_trb/res101_mpii_trb_256x256.py and the checkpoint (ported to ONNX) mmpose/checkpoints/res101_mpii_trb_256x256-cfad2f05_20200812.pth, I get an output ndarray of shape (1, 40, 64, 48).

Can anyone help me with the post-processing of this numpy.ndarray?
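
A quick way to confirm the tensor names and shapes without netron is to query the ONNX Runtime session directly. A minimal sketch, assuming the exported file is called res101_mpii_trb.onnx (the file name is only an example):

import onnxruntime as ort

sess = ort.InferenceSession("res101_mpii_trb.onnx")  # example file name
inp = sess.get_inputs()[0]
out = sess.get_outputs()[0]
print(inp.name, inp.shape)   # e.g. "input.1", [1, 3, 256, 192]
print(out.name, out.shape)   # e.g. "996", [1, 40, 64, 48]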

lunzueta commented 2 years ago

I just wanted the same thing. In the code, you'll see this function: keypoints_from_heatmaps(). It is meant for exactly that. I hope it helps...
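
For reference, a minimal sketch of how that function can be applied to the ONNX output, assuming mmpose 0.x (where it can be imported from mmpose.core.evaluation) and a 192x256 crop as the bounding box; the random heatmaps here only stand in for the real ONNX output, and the center/scale values are illustrative:

import numpy as np
from mmpose.core.evaluation import keypoints_from_heatmaps

# stand-in for the (1, K, 64, 48) heatmaps returned by the ONNX model
heatmaps = np.random.rand(1, 40, 64, 48).astype(np.float32)

center = np.array([[96.0, 128.0]])           # bbox center of a 192x256 crop (example)
scale = np.array([[192 / 200, 256 / 200]])   # bbox size / pixel_std (200), mmpose convention
preds, maxvals = keypoints_from_heatmaps(heatmaps, center, scale)
# preds:   (1, K, 2) keypoint coordinates mapped back to the image space
# maxvals: (1, K, 1) confidence scores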

ukcastle commented 2 years ago

Hi. I had the same problem. I just tweaked the code and didn't optimize it, but it works.

I used this model

from mmpose.apis import init_pose_model
CONFIG = "configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w32_coco_256x192.py"
WEIGHT = "https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth"
model = init_pose_model(CONFIG, WEIGHT, "cpu")
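
(For completeness, a minimal sketch of how the model above could be exported to the output.onnx file used below, assuming an mmpose 0.x TopDown model that exposes forward_dummy; tools/deployment/pytorch2onnx.py does essentially the same thing, and the input/output names here are just examples.)

import torch

model = model.eval()
model.forward = model.forward_dummy           # plain image-in / heatmap-out forward
dummy_input = torch.randn(1, 3, 256, 192)     # NCHW, matching the 256x192 input size
torch.onnx.export(
    model, dummy_input, "output.onnx",
    input_names=["input.1"], output_names=["heatmap"],
    opset_version=11)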

Load Image

import cv2
import numpy as np
import torchvision

img = cv2.imread("Capture_24.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (192, 256))  # (width, height) -> matches the 256x192 model input
img = torchvision.transforms.ToTensor()(img)
img = torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])(img)
img = img.unsqueeze(0)  # add batch dimension -> (1, 3, 256, 192)
img = img.numpy()

Inference and Post Processing

import onnxruntime as ort
ort_sess = ort.InferenceSession("output.onnx")
output = ort_sess.run(None, {"input.1" : img})
heatmaps = output[0]

imageHeight, imageWidth = img.shape[2:]
aspect_ratio = imageWidth / imageHeight

boxX, boxY, boxW, boxH = 0, 0, 192, 256
center = np.array([boxX + boxW/2 , boxY + boxH/2])
if boxW > aspect_ratio * boxH:
  boxH = boxW * 1.0 / aspect_ratio
elif boxW < aspect_ratio * boxH:
  boxW = boxH * aspect_ratio

PIXEL_STD = 200
PADDING = 1
scale = np.array([boxW, boxH]) / PIXEL_STD * PADDING 
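# mmpose convention: the bbox size is stored as scale = (w, h) / pixel_std (200),
# e.g. a 192x256 box becomes scale = [0.96, 1.28]; it is multiplied back by 200 below.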

detectNum, keypointNum, heatmapHeight, heatmapWidth = heatmaps.shape
reshapeHeatmap = np.reshape(heatmaps, (detectNum, keypointNum, -1))
heatmapMaxIdx = np.reshape(np.argmax(reshapeHeatmap, 2), (detectNum, keypointNum, 1))
maxVals = np.reshape(np.amax(reshapeHeatmap, 2), (detectNum, keypointNum, 1))
preds = np.tile(heatmapMaxIdx, (1, 1, 2))        # (N, K, 2): flat index duplicated for x/y
preds[:, :, 0] = preds[:, :, 0] % heatmapWidth   # x = index % W
preds[:, :, 1] = preds[:, :, 1] // heatmapWidth  # y = index // W
preds = np.where(np.tile(maxVals, (1, 1, 2)) > 0.0, preds, -1)  # mask out empty heatmaps
preds = preds.astype(np.float32)

# add +/-0.25 shift to the predicted locations for higher acc.
for n in range(detectNum):
  for k in range(keypointNum):
    heatmap = heatmaps[n][k]
    px = int(preds[n][k][0])
    py = int(preds[n][k][1])
    if 1 < px < heatmapWidth-1 and 1 < py < heatmapHeight-1:
      diff = np.array([
        heatmap[py][px + 1] - heatmap[py][px -1],
        heatmap[py+1][px] - heatmap[py-1][px]
      ])
      preds[n][k] += (np.sign(diff) * np.array(0.25)) 
coordinate = preds[0]

scale = scale * PIXEL_STD
scaleX = scale[0] / heatmapWidth
scaleY = scale[1] / heatmapHeight
targetCoordinate = np.ones_like(coordinate)
targetCoordinate[:, 0] = coordinate[:, 0] * scaleX + center[0] - scale[0] * 0.5
targetCoordinate[:, 1] = coordinate[:, 1] * scaleY + center[1] - scale[1] * 0.5
refineOutput = np.expand_dims(np.concatenate((targetCoordinate, maxVals[0]),axis=1), 0)
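
(A quick sanity check, as an illustration only: since the bbox here is the whole 192x256 resized image, refineOutput can be drawn directly onto that resized image with OpenCV. The 0.3 threshold is arbitrary.)

vis = cv2.resize(cv2.imread("Capture_24.jpg"), (192, 256))
for x, y, score in refineOutput[0]:
    if score > 0.3:  # arbitrary confidence threshold
        cv2.circle(vis, (int(round(x)), int(round(y))), 2, (0, 255, 0), -1)
cv2.imwrite("result.jpg", vis)
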
Actmaiji commented 1 year ago

Hello, I have some questions about the post-processing. 1. What's the difference between imageHeight/imageWidth and boxH/boxW? 2. Are the refineOutput coordinates in the original image (e.g. 441x441) or in the resized image (e.g. 192x256)? 3. Does the top-down pipeline (Lite-HRNet-18, 192x256) just directly resize the image, or does it use an affine transform? Thanks sincerely for any answer.

ukcastle commented 1 year ago

@Actmaiji Hello, sorry for the late reply; I didn't receive any notification.

When I first made a simple example, I extracted the bbox ROI (using a human detection model) and made a 192x256 image from it. Using that as the example, the xywh format of the bbox is 0, 0, 192, 256.

I implemented it by following the example in MMPose, which is why I was computing the overall image aspect ratio. But when I tried it with ONNX, the cleanest approach was to just extract the ROI with the detection model and feed it as the (1,3,192,256) input.

For example, if it is a 441x441 image and the ROI of the person is (x1: 50, y1: 50, x2: 100, y2: 100), I extract only the ROI (a 50x50 image), convert it to letterbox format, and use that (it didn't come out well when I just resized it).

The letterbox step is implemented in YOLOv5, so I'll post a link to my implementation separately.
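
(For reference, a minimal letterbox sketch along the lines of YOLOv5's preprocessing, assuming OpenCV; the function name, pad value, and 256x192 target are illustrative, not mmpose API.)

import cv2
import numpy as np

def letterbox(img, target_h=256, target_w=192, pad_value=114):
    """Resize keeping aspect ratio, then pad to (target_h, target_w)."""
    h, w = img.shape[:2]
    ratio = min(target_h / h, target_w / w)
    new_h, new_w = int(round(h * ratio)), int(round(w * ratio))
    resized = cv2.resize(img, (new_w, new_h))
    canvas = np.full((target_h, target_w, 3), pad_value, dtype=img.dtype)
    top = (target_h - new_h) // 2
    left = (target_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, ratio, (left, top)  # offsets needed to map keypoints back

roi = cv2.imread("person_roi.jpg")      # e.g. the 50x50 crop from the example above
inp, ratio, (dx, dy) = letterbox(roi)   # (256, 192, 3), ready for ToTensor/Normalize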