Is there any single-hand pose estimation model support?

open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.

https://mmpose.readthedocs.io/en/latest/

Apache License 2.0


Is there any single-hand pose estimation model support? #191

Closed lucasjinreal closed 3 years ago

lucasjinreal commented 3 years ago

Are any single-hand pose estimation models supported?

jin-s13 commented 3 years ago

Please check this https://github.com/open-mmlab/mmpose/blob/master/configs/top_down/resnet/README.md

lucasjinreal commented 3 years ago

@jin-s13 Thanks. I saw that all the predictions are on small images, so I assume each input is a tiny image containing a single hand. Do the hand keypoint detection models work on images in the wild? I mean normal images rather than cropped ones.

jin-s13 commented 3 years ago

Yes, currently we only support hand keypoint models with cropped hand images as the input.

To detect hand keypoints on original wild images, one can first detect hand boxes using mmdet, crop the hand regions, and run mmpose on the crops (top-down).
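
The crop step of this top-down pipeline can be sketched as follows. This is a minimal illustration, not MMPose or mmdet code: `crop_hand_regions`, `to_image_coords`, and the `padding` value are hypothetical names and choices, and the boxes are assumed to arrive as pixel `(x1, y1, x2, y2)` tuples from whatever hand detector is used.

```python
import numpy as np


def crop_hand_regions(image, boxes, padding=0.25):
    """Crop padded hand regions from an H x W x C image.

    boxes: iterable of (x1, y1, x2, y2) in pixels (assumed detector output).
    Returns a list of (crop, (left, top)); the offset lets keypoints
    predicted on the crop be mapped back to the full image.
    """
    height, width = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        # Enlarge the box so fingertips near the border are not cut off.
        pad_x = (x2 - x1) * padding
        pad_y = (y2 - y1) * padding
        left = max(int(x1 - pad_x), 0)
        top = max(int(y1 - pad_y), 0)
        right = min(int(x2 + pad_x), width)
        bottom = min(int(y2 + pad_y), height)
        if right > left and bottom > top:
            crops.append((image[top:bottom, left:right], (left, top)))
    return crops


def to_image_coords(keypoints, offset):
    """Shift crop-relative (x, y) keypoints into full-image coordinates."""
    return np.asarray(keypoints, dtype=float) + np.asarray(offset, dtype=float)
```

Each crop would then be resized to the pose model's input size, and the predicted keypoints scaled back and shifted by the stored offset.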

Another possible solution is to run body pose estimation first and use the wrist and elbow keypoints to approximate a hand bounding box.
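
One way to turn the wrist and elbow keypoints into a box is to extend the forearm direction past the wrist and size a square box by the forearm length. The sketch below is only a heuristic under that assumption: `hand_box_from_forearm` is a hypothetical helper, and the `extend` and `scale` defaults are illustrative values that would need tuning on real data.

```python
import numpy as np


def hand_box_from_forearm(elbow, wrist, extend=0.5, scale=1.2):
    """Approximate a square hand box from elbow and wrist (x, y) keypoints.

    extend: how far past the wrist, in forearm lengths, the hand center is
            assumed to lie (illustrative default).
    scale:  box side length in forearm lengths (illustrative default).
    Returns (x1, y1, x2, y2), or None if the two keypoints coincide.
    """
    elbow = np.asarray(elbow, dtype=float)
    wrist = np.asarray(wrist, dtype=float)
    forearm = wrist - elbow
    length = float(np.linalg.norm(forearm))
    if length == 0.0:
        return None  # direction undefined, no box can be estimated
    center = wrist + extend * forearm
    half = 0.5 * scale * length
    x1, y1 = center - half
    x2, y2 = center + half
    return (float(x1), float(y1), float(x2), float(y2))
```

The resulting box may extend outside the image, so it should be clipped before cropping.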

We plan to provide an mmdet hand model in the future. It would be much appreciated if you could contribute.

lucasjinreal commented 3 years ago

@jin-s13 Thanks. I think it would also be OK if a top-down model could be trained end-to-end, in the same fashion as MediaPipe Blaze Hand keypoint. Do you know what the SOTA is for this approach?

jin-s13 commented 3 years ago

I have checked MediaPipe Blaze Hand Keypoint. It seems that it still requires a Palm Detection Model (to detect the hand bounding boxes), and it focuses on detecting 3D hand keypoints.

What is your application scenario? What does the input image look like: [1] multiple people in the wild, [2] one single person, [3] one hand with an arm, or [4] one hand only?

If I understand correctly, MediaPipe only supports [3][4], instead of [1][2].

lucasjinreal commented 3 years ago

@jin-s13 I think it's hands in the wild, with multiple people present in the same image. I suppose there are actually two kinds: one is a single palm, the other is multiple palms?

It seems COCO-WholeBody makes this possible, but it would (maybe) be hard to make it very fast. Two stages are also applicable, provided the two models are still fast when chained together.

In that case, the main issue may be finding a dataset for single-palm detection in the wild, since it needs to cover both hand detection and keypoints on the cropped hands. Do you know of any suitable dataset for this scenario with both hand box and keypoint annotations?

jin-s13 commented 3 years ago

Aha yes, the dataset is the main problem. I did not find many datasets.

The COCO-WholeBody dataset provides both hand bounding boxes and keypoints. OneHand10K also has such annotations, but its size is limited: it contains only 10k images.
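
As a rough illustration of using such paired annotations, the sketch below pulls hand boxes and keypoints out of a single COCO-WholeBody-style person annotation. The field names (`lefthand_box`, `lefthand_kpts`, `lefthand_valid`, and the `righthand_*` equivalents), the `[x, y, w, h]` box layout, and the flat `x, y, v` keypoint layout are assumptions based on my recollection of the published annotation format and should be checked against the actual files. `hand_samples_from_annotation` is a hypothetical helper.

```python
def hand_samples_from_annotation(ann):
    """Collect valid hand boxes and keypoints from one person annotation.

    ann: dict for one person, assumed to follow the COCO-WholeBody layout.
    Returns a list of dicts with the hand side, an [x, y, w, h] box, and
    keypoints as (x, y, visibility) triples.
    """
    samples = []
    for side in ("lefthand", "righthand"):
        # Skip hands that are flagged invalid or missing in the annotation.
        if not ann.get(f"{side}_valid", False):
            continue
        x, y, w, h = ann[f"{side}_box"]
        flat = ann[f"{side}_kpts"]
        keypoints = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
        samples.append(
            {"side": side, "bbox_xywh": (x, y, w, h), "keypoints": keypoints}
        )
    return samples
```

The boxes can feed a hand detector's training set and the keypoints a hand pose model's, so one dataset covers both stages.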

lucasjinreal commented 3 years ago

@jin-s13 Thanks. I have merged a large set of 30k images for hand detection, which can be used to detect hands. Do you know which fast hand pose model I can use if I already have the hand detection?

jin-s13 commented 3 years ago

MMPose already supports fast backbones such as mobilenetv2 and shufflenet. You can train them on the onehand10k dataset to get efficient hand keypoint models.

Currently MMPose only provides pre-trained weights for resnet models; others will be released soon.

lucasjinreal commented 3 years ago

@jin-s13 You mean the training process is included? Which method does it use? You only mentioned the backbone.

innerlee commented 3 years ago

@jinfagang All our supported models are trainable, and all configs in the model zoo can be used to reproduce the released checkpoints.

ref: https://mmpose.readthedocs.io/en/latest/top_down_models.html

lucasjinreal commented 3 years ago

@innerlee What's the fastest hand keypoint detection model for now?

innerlee commented 3 years ago

Well, this question belongs to the mmdet repo. You may seek help there.

The general guideline is: one-stage is faster than two-stage, a small backbone is faster than a large one, and fp16 might be faster.

lucasjinreal commented 3 years ago

@innerlee Thanks, I mean the keypoint detection part.

jin-s13 commented 3 years ago

Try configs/top_down/mobilenet_v2/coco/mobilenetv2_coco_256x192.py, a simple baseline with a mobilenetv2 backbone. Just replace the coco dataset with onehand10k or another dataset.
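
The dataset swap mostly comes down to changing the dataset entries and the number of output channels. The fragment below sketches the values that would differ in a OneHand10K variant of that config. The key names, the `OneHand10KDataset` type, the `data/onehand10k` location, and the annotation file names are all assumptions based on how MMPose configs were laid out at the time, so verify them against the installed version. `hand_split` is a hypothetical helper used only to avoid repetition.

```python
# Hypothetical override values for a OneHand10K variant of the MobileNetV2
# config. Key names and file paths are assumptions; verify before use.
data_root = 'data/onehand10k'   # assumed local dataset location
num_keypoints = 21              # OneHand10K annotates 21 hand keypoints

channel_cfg = dict(
    num_output_channels=num_keypoints,
    dataset_joints=num_keypoints,
    dataset_channel=[list(range(num_keypoints))],
    inference_channel=list(range(num_keypoints)),
)


def hand_split(ann_name):
    """Build the dataset-specific part of one split (train/val/test)."""
    return dict(
        type='OneHand10KDataset',  # assumed registered dataset name
        ann_file=f'{data_root}/annotations/{ann_name}',
        img_prefix=f'{data_root}/',
    )


data = dict(
    train=hand_split('onehand10k_train.json'),  # assumed file names
    val=hand_split('onehand10k_test.json'),
    test=hand_split('onehand10k_test.json'),
)
```

The keypoint head's output channel count would need to follow `channel_cfg`, since COCO body pose uses 17 keypoints rather than 21.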

lucasjinreal commented 3 years ago

@jin-s13 Thanks! What method does this use? BTW, I requested the onehand10k dataset a long time ago but haven't been approved. Is there a copy on Google Drive or Baidu Disk?

jin-s13 commented 3 years ago

The method is based on B. Xiao, H. Wu, and Y. Wei, "Simple Baselines for Human Pose Estimation and Tracking," ECCV 2018, with mobilenetv2 as its backbone.
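
Simple Baselines predicts one heatmap per keypoint and reads each keypoint location from the heatmap peak. The decoding step can be sketched as below. This is a simplified illustration, not MMPose's post-processing: `decode_heatmaps` is a hypothetical name, and refinements such as sub-pixel shifting or flip testing are left out.

```python
import numpy as np


def decode_heatmaps(heatmaps, input_size):
    """Convert per-keypoint heatmaps into coordinates by taking the peak.

    heatmaps:   array of shape (K, H, W), one heatmap per keypoint.
    input_size: (width, height) of the network input crop in pixels.
    Returns (coords, scores): coords is (K, 2) as (x, y) in crop pixels,
    scores is (K,) holding each heatmap's peak value.
    """
    num_kpts, hm_h, hm_w = heatmaps.shape
    flat = heatmaps.reshape(num_kpts, -1)
    idx = flat.argmax(axis=1)       # index of the peak in each heatmap
    scores = flat.max(axis=1)
    xs = idx % hm_w                 # column of the peak
    ys = idx // hm_w                # row of the peak
    # Heatmaps are smaller than the input, so rescale to crop pixels.
    scale_x = input_size[0] / hm_w
    scale_y = input_size[1] / hm_h
    coords = np.stack([xs * scale_x, ys * scale_y], axis=1).astype(float)
    return coords, scores
```

For a 256x192 input with 64x48 heatmaps, each heatmap cell corresponds to a 4x4 pixel block of the crop.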

Sorry, we do not have the right to share the onehand10k dataset with you. According to the licence of onehand10k, it is not allowed to distribute the dataset. Please request the dataset from onehand10k.