How do I test inference on custom images?

jiahaoLjh / HumanDepth

Code for "HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization"

MIT License

Open deval-maker opened 4 years ago

deval-maker commented 4 years ago

Kudos for the great work!

I was trying to run the network on custom images, but I could not find the 2D joint coordinate estimation network in the released code. I might have missed something; could you please point me to it?

Code to run the network separately on custom images with/without bounding box information would be really helpful.

Thanks.

jiahaoLjh commented 4 years ago

Hi @deval-maker

The main focus of the code is to estimate the root joint location. However, there is indeed a 2D pose estimate as a by-product output of the model at https://github.com/jiahaoLjh/HumanDepth/blob/fba1c6669d09418b1a4bd648a9f4021821ca4037/test.py#L99

It's possible to perform inference on custom images with proper changes to the data loader in data/dataset.py. To obtain visualizable results other than the root joint location alone, you may consider adopting other 3D pose estimation approaches to recover the full 3D pose as well (such as the one used in our paper, refer to https://github.com/mks0601/3DMPPE_POSENET_RELEASE).

deval-maker commented 4 years ago

Understood! Regarding the custom inference, I have a question.

As I understand from the model file (https://github.com/jiahaoLjh/HumanDepth/blob/fba1c6669d09418b1a4bd648a9f4021821ca4037/model.py#L186), x, coord_map, bbox_masks, and vis are the required inputs, and these are computed in dataset.py. In the context of multi-person inference, will the values at https://github.com/jiahaoLjh/HumanDepth/blob/fba1c6669d09418b1a4bd648a9f4021821ca4037/data/dataset.py#L55 become the center of the bounding box and the corresponding width and height?

What else do I need to take care of while running a multi-person inference?

jiahaoLjh commented 4 years ago

(cx, cy) corresponds to the principal point of the camera, and (pw, ph) can be considered the size of the "canvas" in the image plane. (pw, ph) should not be changed for pre-trained models; otherwise, the depth estimates will be affected by the scale change of the input.
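To see why a scale change of the input shifts the depth estimate, here is a minimal sketch of the general pinhole relation between apparent size and depth. It is an illustration only, not code from the repository; the function names and the numbers (focal length, person height, depth) are hypothetical.

```python
import math

def apparent_height_px(real_height_m, depth_m, focal_px):
    """Pinhole projection: pixel height of an object of known size at a given depth."""
    return focal_px * real_height_m / depth_m

def depth_from_height(real_height_m, height_px, focal_px):
    """Invert the projection: depth implied by an observed pixel height."""
    return focal_px * real_height_m / height_px

# Hypothetical numbers: a 1.7 m person, 4 m away, focal length of 1000 px.
focal_px = 1000.0
h_px = apparent_height_px(1.7, 4.0, focal_px)

# Shrinking the canvas by half halves the apparent size. If the model still
# assumes the original scale, the implied depth doubles.
scale = 0.5
depth_original = depth_from_height(1.7, h_px, focal_px)
depth_rescaled = depth_from_height(1.7, h_px * scale, focal_px)
print(depth_original, depth_rescaled)
```

In other words, a model trained at one canvas size reads a rescaled person as standing closer or farther away, which is why (pw, ph) has to stay fixed.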

The only difference between the single-person and multi-person cases is the bounding box input at https://github.com/jiahaoLjh/HumanDepth/blob/fba1c6669d09418b1a4bd648a9f4021821ca4037/data/dataset.py#L77. You just need to provide a mask indicating the region inside the bounding box of the target person. For the single-person case, we simply ignore the bounding box by setting the mask to 1 at all pixel locations.
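A minimal sketch of building such a mask with NumPy is shown below. It is not taken from data/dataset.py: the helper name `make_bbox_mask`, the `(x_min, y_min, x_max, y_max)` pixel box format, and the 256x256 resolution are all assumptions for illustration. The mask has to match whatever resolution and layout the data loader actually feeds to the model, and if the image is cropped or resized, the box must go through the same transform first.

```python
import numpy as np

def make_bbox_mask(height, width, bbox=None):
    """Return a (height, width) float32 mask: 1 inside the box, 0 outside.

    bbox is (x_min, y_min, x_max, y_max) in pixel coordinates (assumed format).
    bbox=None reproduces the single-person case: a mask of all ones.
    """
    if bbox is None:
        return np.ones((height, width), dtype=np.float32)

    x_min, y_min, x_max, y_max = bbox
    # Round outward to whole pixels and clip to the image bounds.
    x0 = int(np.clip(np.floor(x_min), 0, width))
    y0 = int(np.clip(np.floor(y_min), 0, height))
    x1 = int(np.clip(np.ceil(x_max), 0, width))
    y1 = int(np.clip(np.ceil(y_max), 0, height))

    mask = np.zeros((height, width), dtype=np.float32)
    mask[y0:y1, x0:x1] = 1.0
    return mask

# One mask per detected person (hypothetical boxes from any person detector).
boxes = [(10, 20, 60, 120), (80.5, 15.2, 140.0, 130.9)]
masks = np.stack([make_bbox_mask(256, 256, b) for b in boxes])
print(masks.shape)
```

Each person in a multi-person image would then be run through the model with the same image and its own mask.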

YangJae96 commented 4 years ago

@deval-maker

How did you change the bbox_mask to run inference on a multi-person image? I am having difficulty feeding the inputs to the model when I have the bounding boxes and the images. Could you give some help?

deval-maker commented 4 years ago

I couldn't make it work either.