Open fenguan opened 1 year ago
It is normal that our released model may not predict head pose well for people in some in-the-wild images. This is because the training set, AGORA-HPE, is a synthetic dataset, which inherently has a domain gap with real images such as your demo picture. You could try lowering the --conf-thres value (e.g., from 0.4 to 0.1 or 0.05) to detect more candidate head bboxes and their poses.
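To illustrate why a lower threshold surfaces more heads, here is a minimal sketch of confidence filtering. The detection tuples and the `filter_by_confidence` helper are hypothetical and made up for illustration; they are not taken from this repository's code or output.

```python
# Hypothetical detections: (x1, y1, x2, y2, confidence).
# The numbers are invented for illustration, not real model output.
detections = [
    (10, 20, 60, 80, 0.62),
    (100, 40, 150, 95, 0.18),
    (200, 30, 240, 75, 0.07),
    (300, 50, 330, 85, 0.02),
]


def filter_by_confidence(dets, conf_thres):
    """Keep only detections whose confidence is at least conf_thres."""
    return [d for d in dets if d[4] >= conf_thres]


# Lower thresholds keep more (but less certain) head boxes.
for thres in (0.4, 0.1, 0.05):
    kept = filter_by_confidence(detections, thres)
    print(f"conf_thres={thres}: {len(kept)} head box(es) kept")
```

The trade-off is that low-confidence boxes are more likely to be false positives, so the extra detections should be checked visually.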
By the way, if you want to judge where people are looking, I think body orientation is an alternative indicator. You can refer to my other work on joint multi-person body detection and orientation estimation at https://github.com/hnuzhy/JointBDOE. It is trained on the COCO-MEBOW dataset, which consists of in-the-wild images, so it generalizes well to real applications.
Your suggestion works great. I set --conf-thres=0.05. The effect is as follows:
Are there other ways to improve it?
In this project, I set --conf-thres=0.05 --iou-thres=0.1. This is the result of direct inference, but sometimes people are not detected well.
So I want to judge whether a person is looking around within the crop.
Is it possible to combine your two efforts to achieve this? I would be very grateful if you have other suggestions.
It seems that your task is mostly related to multi-person body detection and orientation estimation. I simply ran my JointBDOE work (https://github.com/hnuzhy/JointBDOE) on your demo images. The person orientation results are basically reliable.
If you want to further judge whether a person is looking around, you could run a single-person HPE method with full-range view on each head. You may refer to WHENet (https://github.com/Ascend-Research/HeadPoseEstimation-WHENet or https://github.com/PINTO0309/HeadPoseEstimation-WHENet-yolov4-onnx-openvino), which takes a well-cropped head image as its input.
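One possible way to combine the two outputs is sketched below, under stated assumptions. It pads a detected head box before cropping (since a single-image HPE model expects a well-cropped head), and then compares a head yaw with a body orientation angle. All three helpers, the 20% margin, and the 45-degree offset are hypothetical choices for illustration. The sketch also assumes both angles are in degrees and share the same zero direction and sign convention, which must be verified for the actual models before use.

```python
def expand_and_clamp_box(box, img_w, img_h, margin=0.2):
    """Pad a (x1, y1, x2, y2) box by `margin` of its size, clamped to the image."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin
    nx1 = max(0, int(round(x1 - pad_x)))
    ny1 = max(0, int(round(y1 - pad_y)))
    nx2 = min(img_w, int(round(x2 + pad_x)))
    ny2 = min(img_h, int(round(y2 + pad_y)))
    return nx1, ny1, nx2, ny2


def signed_angle_diff(a_deg, b_deg):
    """Smallest signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def head_turned_away(head_yaw_deg, body_yaw_deg, max_offset_deg=45.0):
    """Heuristic: the head points more than max_offset_deg away from the body."""
    return abs(signed_angle_diff(head_yaw_deg, body_yaw_deg)) > max_offset_deg


# Example with made-up values: a head box in a 640x480 image,
# and a head yaw that differs from the body orientation by 90 degrees.
crop_box = expand_and_clamp_box((100, 100, 200, 200), img_w=640, img_h=480)
print(crop_box, head_turned_away(90.0, 0.0))
```

The wrap-around in `signed_angle_diff` matters for full-range estimates: a head yaw of 170 and a body angle of -170 are only 20 degrees apart, not 340.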
Hello, I'm having some problems with inference. I can successfully run inference on the photos you provided, but not on my own dataset: the output is still the original photo.
In addition, I want to use this to judge where people are looking. Is it possible to do this? Here is the image I use.