hnuzhy / DirectMHP

Code for my paper "DirectMHP: Direct 2D Multi-Person Head Pose Estimation with Full-range Angles"
GNU General Public License v3.0

About inference #8

Open fenguan opened 1 year ago

fenguan commented 1 year ago

Hello, I'm having some problems with inference. I can successfully run inference on the photos you provided, but not on my own dataset: the output is just the original photo, with nothing drawn on it.

In addition, I want to use this to judge where people are looking. Is it possible to do this? Here is the image I use:

[image: thirdView000211]

hnuzhy commented 1 year ago

It is normal that our released model may not predict head poses well for some persons in in-the-wild images. This is because the training set AGORA-HPE is a synthetic dataset, which inherently has a domain gap with real images such as your demo picture. I suggest you lower the --conf-thres value (e.g., from 0.4 to 0.1 or 0.05) to detect more possible head bboxes and their poses.
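For intuition, here is a minimal sketch of what that threshold does before NMS. This is my own illustration rather than the repo's actual code, and the layout of the detection tuples is assumed:

```python
# Sketch only (not DirectMHP's actual code): candidates scoring below
# --conf-thres are discarded, so a smaller value keeps more hard,
# low-scoring heads. Each detection is a hypothetical
# (bbox, score, (pitch, yaw, roll)) tuple.

def filter_by_confidence(detections, conf_thres):
    return [d for d in detections if d[1] >= conf_thres]

dets = [
    ((10, 20, 60, 80), 0.92, (5.0, -30.0, 2.0)),    # easy head
    ((200, 40, 250, 95), 0.12, (-8.0, 95.0, 1.0)),  # hard head, e.g. due to domain gap
]
print(len(filter_by_confidence(dets, 0.4)))   # 1 -> the hard head is dropped
print(len(filter_by_confidence(dets, 0.05)))  # 2 -> both survive
```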

hnuzhy commented 1 year ago

By the way, if you want to judge where people are looking, I think the body orientation is an alternative indicator. You can refer to my other work on joint multi-person body detection and orientation estimation at https://github.com/hnuzhy/JointBDOE. It is based on and trained with the COCO-MEBOW dataset, which contains in-the-wild images, so it generalizes well to real applications.

fenguan commented 1 year ago

> It is normal that our released model may not predict head poses well for some persons in in-the-wild images. This is because the training set AGORA-HPE is a synthetic dataset, which inherently has a domain gap with real images such as your demo picture. I suggest you lower the --conf-thres value (e.g., from 0.4 to 0.1 or 0.05) to detect more possible head bboxes and their poses.

Your suggestion works great. I set --conf-thres=0.05. The result is as follows:

[image 1]

Are there other ways to improve it?

fenguan commented 1 year ago

> By the way, if you want to judge where people are looking, I think the body orientation is an alternative indicator. You can refer to my other work on joint multi-person body detection and orientation estimation at https://github.com/hnuzhy/JointBDOE. It is based on and trained with the COCO-MEBOW dataset, which contains in-the-wild images, so it generalizes well to real applications.

In this project, I set --conf-thres=0.05 --iou-thres=0.1. These are the results of direct inference:

[images 2, 3, 4]

But sometimes people are not detected well (see my sketch of the two thresholds below).
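To make sure I understand how the two flags interact, here is my rough reconstruction of the usual YOLO-style post-processing. It is my own sketch, not code from this repo:

```python
# Greedy NMS sketch: --conf-thres filters low-score boxes first, then
# --iou-thres decides how much overlap is allowed between kept boxes.

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def nms(dets, conf_thres=0.05, iou_thres=0.1):
    """dets: list of (bbox, score). Keep the highest-scoring box,
    then drop any remaining box overlapping a kept one above iou_thres."""
    dets = sorted((d for d in dets if d[1] >= conf_thres),
                  key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thres for k in kept):
            kept.append((box, score))
    return kept
```

If this picture is right, a very low --iou-thres such as 0.1 suppresses overlapping boxes aggressively, which might be why people standing close together are sometimes missed.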

So I want to judge whether a person in the crop is looking around.

[image: Snipaste_2023-03-25_22-17-27]

Is it possible to combine your two works to achieve this? I would be very grateful for any other suggestions.
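Something like the following is what I have in mind. All names here are hypothetical placeholders: `detect_bodies` stands in for JointBDOE inference and `estimate_head_pose` for DirectMHP (or any cropped-head pose model):

```python
def crop_head_region(image, bbox):
    """Hypothetical heuristic: treat the top quarter of a person box as the head."""
    x1, y1, x2, y2 = bbox
    return image[y1:y1 + max(1, (y2 - y1) // 4), x1:x2]

def flag_lookers(image, detect_bodies, estimate_head_pose, yaw_gap_deg=45.0):
    """detect_bodies(image)   -> [(person_bbox, body_yaw_deg)]  # JointBDOE stand-in
    estimate_head_pose(crop)  -> (pitch, yaw, roll)             # DirectMHP stand-in
    Flags a person whose head yaw deviates strongly from the body yaw
    (threshold assumed; wrap-around at +/-180 degrees ignored for brevity)."""
    flags = []
    for bbox, body_yaw in detect_bodies(image):
        _, head_yaw, _ = estimate_head_pose(crop_head_region(image, bbox))
        flags.append((bbox, abs(head_yaw - body_yaw) > yaw_gap_deg))
    return flags
```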

hnuzhy commented 1 year ago

It seems that your task is mostly related to multi-person body detection and orientation estimation. I simply ran my JointBDOE (https://github.com/hnuzhy/JointBDOE) model on your demo images. The predicted person orientations are basically reliable.

[five result images with predicted person orientations drawn on the demo pictures]

hnuzhy commented 1 year ago

If you want to further judge whether a person is looking around or not, you can run a single-person HPE task with a full-range view. You may refer to the method WHENet (https://github.com/Ascend-Research/HeadPoseEstimation-WHENet or https://github.com/PINTO0309/HeadPoseEstimation-WHENet-yolov4-onnx-openvino), which uses a well-cropped head image as its input.
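As a rough illustration, one way to turn per-frame yaw estimates from such a model into a "looking around" flag is to watch for large yaw swings within a short time window. This is only a sketch under assumed parameters, not part of any of the mentioned repos:

```python
from collections import deque

class LookAroundDetector:
    """Flags 'looking around' when head yaw swings widely within a window.
    Window length and yaw-range threshold are assumptions to tune;
    wrap-around at +/-180 degrees is ignored for brevity."""

    def __init__(self, window=30, yaw_range_deg=60.0):
        self.yaws = deque(maxlen=window)  # yaw of the last `window` frames
        self.yaw_range_deg = yaw_range_deg

    def update(self, yaw_deg):
        self.yaws.append(yaw_deg)
        return max(self.yaws) - min(self.yaws) >= self.yaw_range_deg

detector = LookAroundDetector()
for yaw in [0.0, 15.0, -40.0, 35.0]:  # toy per-frame yaw estimates
    print(detector.update(yaw))       # True once the swing exceeds 60 degrees
```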