
🔥 🔥 🔥 GETAvatar: Generative Textured Meshes for Animatable Human Avatars (ICCV 2023) 🔥 🔥 🔥
Official PyTorch implementation

GETAvatar: Generative Textured Meshes for Animatable Human Avatars
Xuanmeng Zhang*, Jianfeng Zhang*, Rohan Chacko, Hongyi Xu, Guoxian Song, Yi Yang, Jiashi Feng
Paper, Project Page: https://getavatar.github.io/

Abstract: We study the problem of 3D-aware full-body human generation, aiming at creating animatable human avatars with high-quality textures and geometries. Generally, two challenges remain in this field: i) existing methods struggle to generate geometries with rich realistic details such as the wrinkles of garments; ii) they typically utilize volumetric radiance fields and neural renderers in the synthesis process, making high-resolution rendering non-trivial. To overcome these problems, we propose GETAvatar, a Generative model that directly generates Explicit Textured 3D meshes for animatable human Avatar, with photo-realistic appearance and fine geometric details. Specifically, we first design an articulated 3D human representation with explicit surface modeling, and enrich the generated humans with realistic surface details by learning from the 2D normal maps of 3D scan data. Second, with the explicit mesh representation, we can use a rasterization-based renderer to perform surface rendering, allowing us to achieve high-resolution image generation efficiently. Extensive experiments demonstrate that GETAvatar achieves state-of-the-art performance on 3D-aware human generation both in appearance and geometry quality. Notably, GETAvatar can generate images at 512x512 resolution with 17FPS and 1024x1024 resolution with 14FPS, improving upon previous methods by 2x.

📢 News

⚒️ Requirements

🏃‍♂️ Getting Started

Clone the repository and download the necessary files:

git clone https://github.com/magic-research/GETAvatar.git
cd GETAvatar; mkdir cache; cd cache
wget https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/metrics/inception-2015-12-05.pkl

SMPL models

Download the SMPL human models (male, female, and neutral) from here, and the Mixamo motion sequences from here.

Place them as follows:

GETAvatar
|----smplx
    |----mocap
      |----mixamo
          |----0007  
          |----...
          |----0145  
    |----models
      |----smpl
          |----SMPL_FEMALE.pkl
          |----SMPL_MALE.pkl
          |----SMPL_NEUTRAL.pkl
|----...
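
For example, a minimal shell sketch of the copy step, assuming the downloaded archives were unpacked into ~/Downloads (the source paths are hypothetical; the target layout matches the tree above):

# hypothetical source paths; adjust to wherever you unpacked the downloads
mkdir -p smplx/models/smpl smplx/mocap/mixamo
cp ~/Downloads/SMPL_FEMALE.pkl ~/Downloads/SMPL_MALE.pkl ~/Downloads/SMPL_NEUTRAL.pkl smplx/models/smpl/
cp -r ~/Downloads/mixamo/* smplx/mocap/mixamo/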

๐Ÿ“ Preparing datasets

We train GETAvatar on 3D human scan datasets (THuman2.0 and RenderPeople). Here we use THuman2.0 as an example because it is freely available; the same pipeline also works for the commercial RenderPeople dataset.

Download the THuman2.0 dataset and the fitted SMPL results.

Place them as follows:

GETAvatar
|----datasets
    |----THuman2.0
        |----THuman2.0_Release
            |----0000
                |----0000.obj
                |----material0.jpeg
                |----material0.mtl
            |----...
            |----0525
        |----THuman2.0_smpl
            |----0000_smpl.pkl
            |----...
            |----0525_smpl.pkl

First, run the pre-processing script prepare_thuman_scans_smpl.py to align the human scans:

python3 prepare_thuman_scans_smpl.py --tot 1 --id 0

You can run multiple instances of the script in parallel by setting --tot to the total number of instances and --id to the rank of the current instance.
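
For example, a minimal sketch that splits the pre-processing over four local workers (the worker count is an arbitrary choice):

# launch 4 instances, each aligning 1/4 of the scans
for i in 0 1 2 3; do
    python3 prepare_thuman_scans_smpl.py --tot 4 --id $i &
done
wait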

Second, render the RGB images with Blender:

blender --background test.blend --python render_aligned_thuman.py -- \
--device_id 0 --tot 1 --id 0

You can run multiple instances of the script in parallel by setting --device_id to the GPU device ID, --tot to the total number of instances, and --id to the rank of the current instance.
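
For instance, a sketch that runs one Blender instance per GPU on a two-GPU machine (device IDs 0 and 1 are assumptions; match them to your hardware):

# one rendering instance per GPU, each handling half of the scans
blender --background test.blend --python render_aligned_thuman.py -- --device_id 0 --tot 2 --id 0 &
blender --background test.blend --python render_aligned_thuman.py -- --device_id 1 --tot 2 --id 1 &
wait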

Next, generate the camera pose and SMPL labels:

python3 prepare_thuman_json.py
python3 prepare_ext_smpl_json.py

Finally, render the normal maps with PyTorch3D:

python3 render_thuman_normal_map.py --tot 1 --id 0

You can run multiple instances of the script in parallel by setting --tot to the total number of instances and --id to the rank of the current instance.

The final structure of the training dataset is as follows:

GETAvatar
|----datasets
  |----THuman2.0_res512
      |----0000
          |----0000.png
          |----0001.png   
          |---- ...              
          |----0099.png  
          |----mesh.obj
          |----blender_transforms.json
      |----0001     
          |----...  
      |----0525   
          |----...
      |----aligned_camera_pose_smpl.json
      |----extrinsics_smpl.json
|----...
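
As a quick sanity check (a hypothetical snippet; the expected counts follow from the layout above), each subject folder should contain 100 rendered views plus mesh.obj and blender_transforms.json:

ls datasets/THuman2.0_res512/0000/*.png | wc -l   # expect 100
ls datasets/THuman2.0_res512 | wc -l              # expect 528: subjects 0000-0525 plus the two JSON files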

🙉 Inference

Download the pretrained models from here and save them into ./pretrained_model.

You can generate the multi-view visualization with gen_multi_view_3d.py. For example:

python3 gen_multi_view_3d.py --data=datasets/THuman2.0/THuman2.0_res512  --gpus=1 --batch=4 --batch-gpu=4 --mbstd-group=4 --gamma=10 --dmtet_scale=2 --one_3d_generator=1  --fp32=0  --img_res=512 --norm_interval=1 --dis_pose_cond=True  --normal_dis_pose_cond=True --eik_weight=1e-3  --unit_2norm=True --use_normal_offset=False --blur_rgb_image=False --blur_normal_image=False --camera_type=blender --load_normal_map=True --with_sr=True --seeds=0-3 --grid=2x2 --save_gif=False --render_all_pose=False --resume_pretrain=pretrained_model/THuman_512.pt  --output=output_videos/thu_512.mp4  --outdir=debug

You can specify the image resolution with --img_res and the checkpoint path with --resume_pretrain.
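
For instance, assuming a 1024-resolution checkpoint is available (the file name THuman_1024.pt below is an assumption, as is keeping all other flags unchanged), the multi-view command could be adapted as follows:

# hypothetical checkpoint name; use whatever the release actually provides
python3 gen_multi_view_3d.py --data=datasets/THuman2.0/THuman2.0_res512 \
  --gpus=1 --batch=4 --batch-gpu=4 --mbstd-group=4 --gamma=10 --dmtet_scale=2 \
  --one_3d_generator=1 --fp32=0 --img_res=1024 --norm_interval=1 \
  --dis_pose_cond=True --normal_dis_pose_cond=True --eik_weight=1e-3 \
  --unit_2norm=True --use_normal_offset=False --blur_rgb_image=False \
  --blur_normal_image=False --camera_type=blender --load_normal_map=True \
  --with_sr=True --seeds=0-3 --grid=2x2 --save_gif=False --render_all_pose=False \
  --resume_pretrain=pretrained_model/THuman_1024.pt \
  --output=output_videos/thu_1024.mp4 --outdir=debug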

You can generate animations with gen_animation_3d.py. For example:

python3 gen_animation_3d.py --data=datasets/THuman2.0/THuman2.0_res512   --gpus=1 --batch=4 --batch-gpu=4 --mbstd-group=4 --gamma=20 --dmtet_scale=2 --one_3d_generator=1  --fp32=0  --img_res=512 --norm_interval=1 --dis_pose_cond=True  --normal_dis_pose_cond=True --eik_weight=1e-3  --unit_2norm=True --use_normal_offset=False --blur_rgb_image=False  --blur_normal_image=False --camera_type=blender --load_normal_map=True  --with_sr=True --seeds=0-3 --grid=2x2 --save_gif=False --render_all_pose=False --action_type=0145 --frame_skip=1 --resume_pretrain=pretrained_model/THuman_512.pt --output=output_videos/thuman_mocap_0145.mp4 --outdir=debug

You can specify the image resolution with --img_res, the checkpoint path with --resume_pretrain, and the motion sequence with --action_type.
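
For example, to drive the avatars with the Mixamo sequence 0007 from the mocap folder shown earlier, only --action_type and the output path need to change (all other flags are kept from the command above; the output file name is an arbitrary choice):

# uses sequence 0007 from smplx/mocap/mixamo
python3 gen_animation_3d.py --data=datasets/THuman2.0/THuman2.0_res512 \
  --gpus=1 --batch=4 --batch-gpu=4 --mbstd-group=4 --gamma=20 --dmtet_scale=2 \
  --one_3d_generator=1 --fp32=0 --img_res=512 --norm_interval=1 \
  --dis_pose_cond=True --normal_dis_pose_cond=True --eik_weight=1e-3 \
  --unit_2norm=True --use_normal_offset=False --blur_rgb_image=False \
  --blur_normal_image=False --camera_type=blender --load_normal_map=True \
  --with_sr=True --seeds=0-3 --grid=2x2 --save_gif=False --render_all_pose=False \
  --action_type=0007 --frame_skip=1 \
  --resume_pretrain=pretrained_model/THuman_512.pt \
  --output=output_videos/thuman_mocap_0007.mp4 --outdir=debug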

🙀 Train the model

You can train new models using train_3d.py. For example:

python3 train_3d.py  --data=datasets/THuman2.0/THuman2.0_res512  --gpus=8 --batch=32 --batch-gpu=4 --mbstd-group=4 --gamma=10 --dmtet_scale=2 --one_3d_generator=1  --fp32=0 --img_res=512 --norm_interval=1 --dis_pose_cond=True  --normal_dis_pose_cond=True --eik_weight=1e-3  --unit_2norm=True --use_normal_offset=False --blur_rgb_image=False --blur_normal_image=False --camera_type=blender --load_normal_map=True --with_sr=True --outdir=thuman_res512_ckpts
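
If fewer GPUs are available, a reasonable sketch is to keep --batch-gpu fixed and scale --gpus and --batch together, assuming --batch is the total batch size split across GPUs (which matches 32 = 8 x 4 in the command above). For example, on a 4-GPU machine:

# assumption: total batch = gpus * batch-gpu (4 * 4 = 16); all other flags unchanged
python3 train_3d.py --data=datasets/THuman2.0/THuman2.0_res512 \
  --gpus=4 --batch=16 --batch-gpu=4 --mbstd-group=4 --gamma=10 --dmtet_scale=2 \
  --one_3d_generator=1 --fp32=0 --img_res=512 --norm_interval=1 \
  --dis_pose_cond=True --normal_dis_pose_cond=True --eik_weight=1e-3 \
  --unit_2norm=True --use_normal_offset=False --blur_rgb_image=False \
  --blur_normal_image=False --camera_type=blender --load_normal_map=True \
  --with_sr=True --outdir=thuman_res512_ckpts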

For distributed training, run the script dist_train.sh:

bash dist_train.sh

๐Ÿ™ Credit

GETAvatar builds upon several previous works. We would like to thank the authors for their contributions to the community!

🎓 Citation

If you find this codebase useful for your research, please cite GETAvatar using the following BibTeX entry.

@inproceedings{zhang2023getavatar,
    title={GETAvatar: Generative Textured Meshes for Animatable Human Avatars},
    author={Zhang, Xuanmeng and Zhang, Jianfeng and Chacko, Rohan and Xu, Hongyi and Song, Guoxian and Yang, Yi and Feng, Jiashi},
    booktitle={ICCV},
    year={2023}
}