
EgoHumans: An Egocentric 3D Multi-Human Benchmark

ICCV 2023 (Oral)

[Rawal Khirodkar](https://github.com/rawalkhirodkar)¹, [Aayush Bansal](https://www.aayushbansal.xyz/)², [Lingni Ma](https://scholar.google.nl/citations?user=eUAgpwkAAAAJ&hl=en/)², [Richard Newcombe](https://scholar.google.co.uk/citations?user=MhowvPkAAAAJ&hl=en)², [Minh Vo](https://minhpvo.github.io/)², [Kris Kitani](https://kriskitani.github.io/)¹

¹[CMU](https://www.cmu.edu/), ²[Meta](https://about.meta.com/)

[Project Page](https://rawalkhirodkar.github.io/egohumans/)

We present EgoHumans, a new multi-view, multi-human video benchmark to advance the state of the art in egocentric 3D human pose estimation and tracking. Existing egocentric benchmarks capture either a single subject or indoor-only scenarios, which limits the generalization of computer vision algorithms to real-world applications. We propose a novel 3D capture setup to construct a comprehensive egocentric multi-human benchmark in the wild, with annotations that support diverse tasks such as human detection, tracking, 2D/3D pose estimation, and mesh recovery. We leverage consumer-grade camera-equipped wearable glasses for the egocentric view, which enables us to capture dynamic activities such as tennis, fencing, and volleyball. Furthermore, our multi-view setup generates accurate 3D ground truth even under severe or complete occlusion. The dataset consists of more than 125k egocentric images spanning diverse scenes, with a particular focus on challenging, unchoreographed multi-human activities and fast-moving egocentric views. We rigorously evaluate existing state-of-the-art methods and highlight their limitations in the egocentric scenario, specifically on multi-human tracking. To address these limitations, we propose EgoFormer, a novel approach with a multi-stream transformer architecture and explicit 3D spatial reasoning to estimate and track the human pose. EgoFormer significantly outperforms prior art, by 13.6% IDF1, on the EgoHumans dataset.
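The abstract notes that the calibrated multi-view setup yields 3D ground truth even when a subject is occluded in some views. As a rough illustration of the underlying idea (not the dataset's actual annotation pipeline), the sketch below triangulates one keypoint from calibrated 2D observations using linear DLT; all camera matrices and pixel values here are made up.

```python
# A minimal triangulation sketch using NumPy. Illustrative only: the actual
# EgoHumans annotation code is not shown here, and the cameras are toy values.
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """Linear (DLT) triangulation of one 3D point from >= 2 calibrated views.

    proj_mats: list of 3x4 projection matrices P = K [R | t]
    points_2d: list of (u, v) pixel observations, one per view
    Returns the 3D point in world coordinates.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on X = (x, y, z, 1).
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Toy example: two cameras observing the world point (0, 0, 5).
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])               # camera at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])   # shifted on x

X_true = np.array([0.0, 0.0, 5.0, 1.0])
uv1 = P1 @ X_true; uv1 = uv1[:2] / uv1[2]
uv2 = P2 @ X_true; uv2 = uv2[:2] / uv2[2]
print(triangulate_point([P1, P2], [uv1, uv2]))  # ~ [0. 0. 5.]
```

With more cameras, the same least-squares system simply gains two rows per view, which is why a keypoint that is occluded in any single view (including the egocentric one) can still be recovered in 3D.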

Overview

*(Figures: fencing capture sequence and dataset summary.)*

Get Started

Supported Benchmarks

Create your own Benchmark

BibTeX & Citation

@article{khirodkar2023egohumans,
  title={EgoHumans: An Egocentric 3D Multi-Human Benchmark},
  author={Khirodkar, Rawal and Bansal, Aayush and Ma, Lingni and Newcombe, Richard and Vo, Minh and Kitani, Kris},
  journal={arXiv preprint arXiv:2305.16487},
  year={2023}
}

Acknowledgement

Aria Toolkit, COLMAP, mmpose, mmhuman3D, CLIFF, timm, detectron2, mmcv, mmdet, mmtrack.

Contact