karfly / learnable-triangulation-pytorch

This repository is an official PyTorch implementation of the paper "Learnable Triangulation of Human Pose" (ICCV 2019, oral). The proposed method achieves state-of-the-art results in multi-view 3D human pose estimation!
MIT License

Using model on single-view videos (or webcam input) #78

Closed: SynStratos closed this issue 4 years ago

SynStratos commented 4 years ago

Hi, my team (@giuliorav, @LeoDeep) and I would like to apply your human pose estimation to simple videos taken from a single camera, for which we have no projection matrix information. So far we have mainly used videos downloaded from YouTube, and we plan to apply the same model to live webcam input. Is this possible? How would you suggest we proceed? We tried both the algebraic and the volumetric models. The first fails in the triangulation step, while computing the SVD of the matrix. The second expects input with the same structure as the Human3.6M dataset.

Thank you for any possible hint ;)
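For context, here is a minimal sketch of SVD-based (DLT) triangulation. It is a generic textbook formulation written with NumPy, not the repository's code, and the names `triangulate_dlt` and `project` as well as the camera values are made up for illustration. Each view contributes two linear equations in the homogeneous 3D point, so a single view leaves the point undetermined along a ray, which is probably related to the failure described above.

```python
import numpy as np


def triangulate_dlt(proj_matrices, points_2d):
    """Triangulate one 3D point from N >= 2 views (hypothetical helper).

    proj_matrices: N projection matrices, each 3x4.
    points_2d:     N pixel coordinates (u, v), one per view.
    """
    proj_matrices = np.asarray(proj_matrices, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    if len(proj_matrices) < 2:
        # One view gives a 2x4 system: the solution is a whole ray, not a point.
        raise ValueError("triangulation needs at least two views")

    rows = []
    for P, (u, v) in zip(proj_matrices, points_2d):
        rows.append(u * P[2] - P[0])  # constraint from the u coordinate
        rows.append(v * P[2] - P[1])  # constraint from the v coordinate
    A = np.stack(rows)                # shape (2N, 4)

    # The homogeneous solution is the right singular vector that belongs
    # to the smallest singular value of A.
    _, _, vh = np.linalg.svd(A)
    X_h = vh[-1]
    return X_h[:3] / X_h[3]


def project(P, X):
    """Project a 3D point with a 3x4 projection matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Two assumed cameras with identical intrinsics and a 1-unit baseline.
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 4.0])
observations = [project(P1, X_true), project(P2, X_true)]
X_est = triangulate_dlt([P1, P2], observations)
```

With noise-free observations from both cameras, `X_est` matches `X_true` up to floating-point error. Passing only one camera raises the `ValueError`.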

karfly commented 4 years ago

Hi, @SynStratos. Our methods are mainly designed to run on multiple cameras. In the paper we showed that they can also be used successfully for single-camera 3D human pose estimation, but you still need to know the approximate absolute location of the person and the camera intrinsics, which are difficult to estimate for YouTube videos.

So, given these difficulties, I'd recommend choosing another method developed specifically for single-camera 3D human pose estimation and, once that baseline works properly, coming back here to try our models. For example, this paper proposes a nice and simple baseline: https://arxiv.org/abs/1804.06208 (note: this is not the state-of-the-art method at the moment).
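To make the intrinsics requirement mentioned above concrete, the sketch below composes a pinhole projection matrix `P = K [R | t]` from intrinsics and extrinsics and projects a 3D point with it. The focal length, principal point, and pose are invented values, and the helper names are hypothetical. They are not taken from the repository or from any real camera.

```python
import numpy as np


def intrinsics(fx, fy, cx, cy):
    """Pinhole intrinsic matrix K (no skew, no lens distortion)."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])


def projection_matrix(K, R, t):
    """Compose the 3x4 matrix P = K [R | t] (world -> pixel)."""
    Rt = np.hstack([R, np.reshape(t, (3, 1))])
    return K @ Rt


def project(P, X):
    """Project a 3D world point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Assumed values for a 1280x720 camera placed at the world origin.
K = intrinsics(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
R = np.eye(3)    # camera axes aligned with the world axes
t = np.zeros(3)  # camera centre at the world origin
P = projection_matrix(K, R, t)

# A point 3 units straight ahead lands on the principal point.
uv = project(P, np.array([0.0, 0.0, 3.0]))
```

For a YouTube video, neither `K` nor the person's position relative to the camera is known, so every number above would have to be estimated.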

SynStratos commented 4 years ago

Thanks for the suggestion, @karfly! By the way, how should we proceed if we do get the projection matrix for our samples?