tucan9389 / PoseEstimation-CoreML

The example project of inferencing Pose Estimation using Core ML
https://github.com/motlabs/awesome-ml-demos-with-ios
MIT License

Support 3D pose estimation #35

Open tucan9389 opened 3 years ago

tucan9389 commented 3 years ago

lightweight-human-pose-estimation-3d-demo.pytorch

repo: https://github.com/Daniil-Osokin/lightweight-human-pose-estimation-3d-demo.pytorch

Models

Metadata

Parsing Reference

sebo361 commented 3 years ago

Hi @tucan9389 how about using ARKit's 3D motion tracking: https://developer.apple.com/documentation/arkit/capturing_body_motion_in_3d ? Or do you think your proposed 3D pose estimation is more accurate?

tucan9389 commented 3 years ago

@sebo361 Thanks for the suggestion. I agree that ARKit may be the better choice in some cases. But I think using Core ML rather than ARKit has benefits of its own:

  1. When you want to infer a person's 3D keypoints from a single image
  2. When you want to infer not only a person's 3D keypoints but also those of an object, using a model you trained yourself

sebo361 commented 3 years ago

@tucan9389 True, those are interesting benefits to explore! However, I am more interested in comparing the two approaches on inference speed and on the accuracy of the 3D human pose keypoints. Now that the latest iPad Pro (and iPhone 12) have a LiDAR sensor, I expect more precise 3D human motion tracking, but I am not sure whether ARKit's 3D motion tracking uses the LiDAR Depth API by default. Do you have any insight into that?

Maybe I should set up a project to compare ARKit's 3D motion tracking with your proposed Core ML 3D pose estimation.

jookovjook commented 3 years ago

@sebo361 Also, Apple's 3D body tracking (ARBodyTrackingConfiguration) can't run simultaneously with world tracking (ARWorldTrackingConfiguration). So, unfortunately, you can't use Apple's 3D body tracking if you also need world tracking.

tucan9389 commented 3 years ago

https://github.com/tucan9389/PoseEstimation-TFLiteSwift

Here is a 3D pose estimation demo using TFLiteSwift. I implemented soft-argmax in pure Swift with the Accelerate framework. NumPy, PyTorch, and TensorFlow support summation along a chosen dimension, which makes this easy, but Swift has no built-in soft-argmax for multi-dimensional arrays (tensors), so the implementation was a little tricky.
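
For readers unfamiliar with soft-argmax, here is a minimal plain-Swift sketch of the idea. It is not the code from the linked repository, which uses Accelerate. The function name `softArgmax3D`, the `beta` sharpness parameter, and the flat row-major (depth, height, width) layout are all assumptions made for illustration. It applies a softmax over the whole heatmap and returns the probability-weighted average index along each axis, which is what a per-dimension sum would give in NumPy or PyTorch.

```swift
import Foundation

/// Soft-argmax over a single 3D heatmap (hypothetical helper, for illustration).
/// `heatmap` is assumed to be a flat, row-major array of shape (depth, height, width).
/// Returns the expected (x, y, z) index under a softmax of the heatmap values.
func softArgmax3D(_ heatmap: [Float], depth: Int, height: Int, width: Int,
                  beta: Float = 1.0) -> (x: Float, y: Float, z: Float) {
    precondition(!heatmap.isEmpty, "heatmap must not be empty")
    precondition(heatmap.count == depth * height * width, "shape mismatch")

    // Subtract the maximum before exponentiating for numerical stability.
    let maxValue = heatmap.max()!
    let weights = heatmap.map { exp(beta * ($0 - maxValue)) }
    let total = weights.reduce(0, +)

    // Accumulate the expected index along each axis in a single pass.
    var x: Float = 0, y: Float = 0, z: Float = 0
    var i = 0
    for d in 0..<depth {
        for h in 0..<height {
            for w in 0..<width {
                let p = weights[i] / total
                x += p * Float(w)
                y += p * Float(h)
                z += p * Float(d)
                i += 1
            }
        }
    }
    return (x, y, z)
}
```

A uniform heatmap yields the center of the volume, while a heatmap with one dominant value yields coordinates close to that value's index. Unlike a hard argmax, the result is sub-voxel, which is why 3D pose models often use it to decode keypoints.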