Yana Hasson, Gül Varol, Ivan Laptev and Cordelia Schmid
Note that you will need a reasonably recent GPU to run this code.
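Before going further, you can verify that PyTorch sees a CUDA-capable GPU. This is a hypothetical sanity check, not part of the repository; the import is guarded so it degrades gracefully if PyTorch is not installed yet:

```python
# Hypothetical sanity check: verify that PyTorch can see a CUDA GPU.
# The fitting code will be extremely slow without one.
try:
    import torch
    has_cuda = torch.cuda.is_available()
    device_name = torch.cuda.get_device_name(0) if has_cuda else None
except ImportError:
    # PyTorch not installed yet (e.g. before creating the conda env).
    has_cuda, device_name = False, None

if has_cuda:
    print(f"CUDA GPU found: {device_name}")
else:
    print("No CUDA GPU detected; expect the optimization to be very slow.")
```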
We recommend using a conda environment:
conda env create -f environment.yml
conda activate phosa16
mkdir -p external
git clone --branch v0.2.1 https://github.com/facebookresearch/detectron2.git external/detectron2
pip install external/detectron2
Install a slightly modified, fast version of the Neural Mesh Renderer (NMR); the modification makes this implementation of NMR compatible with PyTorch 1.6.
mkdir -p external
git clone https://github.com/hassony2/multiperson.git external/multiperson
pip install external/multiperson/neural_renderer
pip install external/multiperson/sdf
Install FrankMocap, with a slight twist so that it also returns the objects detected by Understanding Human Hands in Contact at Internet Scale (Shan et al., CVPR 2020).
mkdir -p external
git clone https://github.com/hassony2/frankmocap.git external/frankmocap
cd external/frankmocap
sh scripts/install_frankmocap.sh
cd ../..
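After the installation steps above, a small checker like the following can report which dependencies are still missing. This is a sketch, not part of the repository, and the module names are assumptions based on the packages being installed:

```python
import importlib.util

# Module names the installs above are expected to provide
# (assumed names, not taken from the repository itself).
REQUIRED_MODULES = ["detectron2", "neural_renderer", "sdf"]

def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All external dependencies found.")
```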
Download the HO3D dataset following the instructions on its official project webpage.
This code expects to find the ho3d root folder at
local_data/datasets/ho3d
After completing all the setup steps, make sure the file structure in the homan folder looks like this:
# Installed datasets
local_data/
  datasets/
    ho3d/
    core50/
    ShapeNetCore.v2/
    epic/
# Auxiliary data needed to run the code
extra_data/
  # MANO data files
  mano/
    MANO_RIGHT.pkl
    ...
  smpl/
    SMPLX_NEUTRAL.pkl
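The expected layout can be sanity-checked with a short script like this one (a hypothetical helper, not shipped with the repository); the paths are taken from the tree above:

```python
from pathlib import Path

# Key paths from the file-structure listing above.
EXPECTED_PATHS = [
    "local_data/datasets/ho3d",
    "extra_data/mano/MANO_RIGHT.pkl",
    "extra_data/smpl/SMPLX_NEUTRAL.pkl",
]

def missing_paths(root="."):
    """Return the expected paths that do not exist under the given root."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]

if __name__ == "__main__":
    for p in missing_paths():
        print("Missing:", p)
```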
python fit_vid_dataset.py --dataset core50 --optimize_object_scale 0 --result_root results/core50/step1
python fit_vid_dataset.py --dataset core50 --split test --lw_collision 0.001 --lw_contact 1 --optimize_object_scale 0 --result_root results/core50/step2 --resume results/core50/step1
python fit_vid_dataset.py --dataset ho3d --split test --optimize_object_scale 0 --result_root results/ho3d/step1
python fit_vid_dataset.py --dataset ho3d --split test --lw_collision 0.001 --lw_contact 1 --optimize_object_scale 0 --result_root results/ho3d/step2 --resume results/ho3d/step1
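The two commands per dataset form a two-stage pipeline: a coarse fit (step 1), then a refinement with collision and contact terms that resumes from the step 1 results. A hypothetical driver for this (not part of the repository; it assumes `--split test` for both stages) could look like:

```python
import subprocess

def stage_commands(dataset, root):
    """Build the step 1 and step 2 command lines, mirroring the README flags."""
    step1, step2 = f"{root}/step1", f"{root}/step2"
    cmd1 = ["python", "fit_vid_dataset.py", "--dataset", dataset,
            "--split", "test", "--optimize_object_scale", "0",
            "--result_root", step1]
    # Step 2 adds collision/contact losses and resumes from step 1.
    cmd2 = ["python", "fit_vid_dataset.py", "--dataset", dataset,
            "--split", "test", "--lw_collision", "0.001", "--lw_contact", "1",
            "--optimize_object_scale", "0",
            "--result_root", step2, "--resume", step1]
    return cmd1, cmd2

def run_two_stage(dataset, root):
    for cmd in stage_commands(dataset, root):
        subprocess.run(cmd, check=True)
```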
The code for this project is heavily based on and influenced by Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild (PHOSA) by Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, and Angjoo Kanazawa, ECCV 2020.
Consider citing their work!
@InProceedings{zhang2020phosa,
  title = {Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild},
  author = {Zhang, Jason Y. and Pepose, Sam and Joo, Hanbyul and Ramanan, Deva and Malik, Jitendra and Kanazawa, Angjoo},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020},
}
This work was funded in part by the MSR-Inria joint lab, the French government under management of Agence Nationale de la Recherche as part of the "Investissements d'avenir" program, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute), and by the Louis Vuitton ENS Chair on Artificial Intelligence.
If you find this work interesting, you may also be interested in related publications; to keep track of recent work on hand pose estimation, take a look at awesome-hand-pose-estimation by Xinghao Chen.
Note that our code depends on other libraries, including SMPL, SMPL-X, and MANO, each of which has its own license that must also be followed.