thuhcsi/MagicMan - Githubissues

MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Xu He¹ · Xiaoyu Li² · Di Kang² · Jiangnan Ye¹ · Chaopeng Zhang² · Liyang Chen¹ ·
Xiangjun Gao³ · Han Zhang⁴ · Zhiyong Wu^1,5 · Haolin Zhuang¹ ·
¹Tsinghua University     ²Tencent AI Lab     ³The Hong Kong University of Science and Technology
⁴Standford University     ⁵The Chinese University of Hong Kong

Figure 1. Given a reference human image in different poses, outfits, or styles (i.e. real and fictional characters) as input, MagicMan is able to generate consistent high-quality novel view images and normal maps, which are well-suited for downstream multi-view reconstruction applications.

This repository will contain the official implementation of MagicMan. For more visual results, please checkout our project page.

📣 News & TODOs

[x] [2024.09.16] Release inference code and pretrained weights!
[x] [2024.08.27] Release paper and project page!
[ ] Release reconstruction code.
[ ] Release training code.

🧰 Models

Model	Resolution	#Views	GPU Memery (w/ refinement)	#Training Scans	Datasets
magicman_base	512x512	20	23.5GB	~2500	THuman2.1
magicman_plus	512x512	24	26.5GB	~5500	THuman2.1, CustomHumans, 2K2K, CityuHuman

Currently, we provide two versions of models: a base model trained on ~2500 scans to generate 20 views and an enhanced model trained on ~5500 scans to generate 24 views. Models can be downloaded here. Both pretrained_weights and one version of magicman_{version} are needed to be downloaded and put under ./ckpt as:

|--- ckpt/
|    |--- pretrained_weights/
|    |--- magic_base/ or magic_plus/

⚙️ Setup

1. Clone MagicMan

git clone https://github.com/thuhcsi/MagicMan.git
cd MagicMan

2. Installation

# Create conda environment
conda create -n magicman python=3.10
conda activate magicman

# Install PyTorch and other dependencies
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt

# Install PyTorch3D
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

# Install mmcv-full
pip install "mmcv-full>=1.3.17,<1.6.0" -f https://download.openmmlab.com/mmcv/dist/cu117/torch2.0.1/index.html

# Install mmhuman3d
pip install "git+https://github.com/open-mmlab/mmhuman3d.git"

3. Download required models and extra data

Click Register now on all dependencies, then you can download them all using ONE account with:

cd ./thirdparties/econ
bash fetch_data.sh

Requied models and extra data are from SMPL-X, PIXIE, PyMAF-X, and ECON.

Please consider cite these awesome works if they also help on your project.

```bibtex @inproceedings{SMPL-X:2019, title = {Expressive Body Capture: 3D Hands, Face, and Body from a Single Image}, author = {Pavlakos, Georgios and Choutas, Vasileios and Ghorbani, Nima and Bolkart, Timo and Osman, Ahmed A. A. and Tzionas, Dimitrios and Black, Michael J.}, booktitle = {Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)}, year = {2019} } @inproceedings{PIXIE:2021, title={Collaborative Regression of Expressive Bodies using Moderation}, author={Yao Feng and Vasileios Choutas and Timo Bolkart and Dimitrios Tzionas and Michael J. Black}, booktitle={International Conference on 3D Vision (3DV)}, year={2021} } @article{pymafx2023, title={PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images}, author={Zhang, Hongwen and Tian, Yating and Zhang, Yuxiang and Li, Mengcheng and An, Liang and Sun, Zhenan and Liu, Yebin}, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, year={2023} } @inproceedings{pymaf2021, title={PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop}, author={Zhang, Hongwen and Tian, Yating and Zhou, Xinchi and Ouyang, Wanli and Liu, Yebin and Wang, Limin and Sun, Zhenan}, booktitle={Proceedings of the IEEE International Conference on Computer Vision}, year={2021} } @inproceedings{xiu2023econ, title = {{ECON: Explicit Clothed humans Optimized via Normal integration}}, author = {Xiu, Yuliang and Yang, Jinlong and Cao, Xu and Tzionas, Dimitrios and Black, Michael J.}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2023}, } ```

💫 Inference

python inference.py --config configs/inference/inference-{version}.yaml --input_path {input_image_path} --output_path {output_dir_path} --seed 42 --device cuda:0

# e.g.,
python inference.py --config configs/inference/inference-base.yaml --input_path examples/001.jpg --output_path examples/001 --seed 42 --device cuda:0

🙏 Acknowledgments

Our code follows several excellent repositories. We appreciate them for making their codes available to the public.

Moore-AnimateAnyone.
ECON.
HumanGaussian. Thanks to the authors of HumanGaussian for additional advice and help!

✏️ Citing

If you find our work useful, please consider citing:

@misc{he2024magicman,
    title={MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement},
    author={Xu He and Xiaoyu Li and Di Kang and Jiangnan Ye and Chaopeng Zhang and Liyang Chen and Xiangjun Gao and Han Zhang and Zhiyong Wu and Haolin Zhuang},
    year={2024},
    eprint={2408.14211},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

📢 Disclaimer

⚠️This is an open-source research exploration rather than a commercial product, so it may not meet all your expectations. Due to the variability of the diffusion model, you may encounter failure cases. Try using different seeds and adjusting the denoising steps if the results are not desirable. Users are free to create novel views using this tool, but they must comply with local laws and use it responsibly. The developers do not assume any responsibility for potential misuse by users.