2023 PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°

Introduction

Project page: https://sizhean.github.io/panohead

This paper proposes a NeRF-based generative model that synthesizes 3D-consistent full human heads viewable in 360°. Previous methods such as EG3D and StyleNeRF can only synthesize near-frontal faces and fail to produce images of the back of the head. To address this, the paper proposes the "tri-grid feature plane", "foreground-aware tri-discrimination" and "self-adaptive camera alignment".

Method

The method builds on EG3D. A StyleGAN2 backbone generates tri-plane feature maps from a random latent code z and a camera pose vector. The tri-plane features are then rendered into a low-resolution 2D image with NeRF-like volume rendering, and a super-resolution module maps this low-resolution image to the final image. PanoHead makes three major modifications:

Foreground-Aware Tri-Discrimination: An extra StyleGAN generator synthesizes the background, which is blended with the rendered features using the rendered foreground mask. In addition, the discriminator receives three inputs together: the low-resolution image, the final image and the rendered foreground mask, extending EG3D's dual discrimination with the mask.
Tri-grid feature plane: The authors note that EG3D suffers from mirrored-face artifacts, because the front and the back of the head project onto the same XY plane and therefore share features. To resolve this ambiguity, the paper replaces each plane with several parallel planes and interpolates features across them.
Self-adaptive camera alignment: Because facial landmarks cannot be detected on side or back views of the head, the paper aligns images using bounding boxes from an object detector instead. It also introduces a per-image camera offset to correct inaccurate camera-position estimates.
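The foreground-aware tri-discrimination item above can be sketched in plain Python, with scalar features and nested lists standing in for tensors. All helper names are hypothetical. The sketch assumes standard NeRF quadrature for the rendered feature and its accumulated opacity (used as the foreground mask), background blending as `feature + (1 - mask) * background`, nearest-neighbour upsampling of the low-resolution image and mask, and the channel order super-resolved RGB, raw RGB, mask. The paper's actual filters and ordering may differ.

```python
import math

# Hypothetical helpers; scalar features and nested lists stand in for tensors.

def render_ray(sigmas, features, deltas):
    """NeRF-style quadrature along one ray.

    Returns (feature, opacity). The accumulated opacity is used here as the
    foreground-mask value for this pixel.
    """
    transmittance = 1.0
    feature, opacity = 0.0, 0.0
    for sigma, f, delta in zip(sigmas, features, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # contribution of this sample
        feature += weight * f
        opacity += weight
        transmittance *= 1.0 - alpha            # light remaining for later samples
    return feature, opacity


def composite(fg, mask, bg):
    """Blend a generated background behind the rendered foreground.

    fg, bg: C x H x W; mask: H x W. Assumes fg is already opacity-weighted
    (as render_ray returns it), so the background fills the (1 - mask) part.
    """
    return [[[f + (1.0 - m) * b for f, m, b in zip(f_row, m_row, b_row)]
             for f_row, m_row, b_row in zip(f_ch, mask, b_ch)]
            for f_ch, b_ch in zip(fg, bg)]


def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling of a C x H x W image (filter is an assumption)."""
    return [[[v for v in row for _ in range(factor)]
             for row in ch for _ in range(factor)]
            for ch in img]


def tri_discriminator_input(raw_rgb, sr_rgb, mask):
    """Stack super-resolved RGB, upsampled raw RGB and upsampled mask: 3 + 3 + 1 channels.

    The channel order is an assumption.
    """
    factor = len(sr_rgb[0]) // len(raw_rgb[0])  # ratio of image heights
    return sr_rgb + upsample_nearest(raw_rgb, factor) + upsample_nearest([mask], factor)
```

For example, a ray with two samples of opacity 0.5 each accumulates a mask value of 0.75, so 25% of the generated background shows through after compositing.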
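To illustrate the tri-grid idea from the list above, here is a minimal sketch assuming each plane is replaced by a stack of `D` parallel planes sampled with trilinear interpolation, and that the three stacks' features are summed as in EG3D's tri-plane aggregation. Features are scalars and `D` is left free; the paper's resolution, depth and channel count are not reproduced, and the function names are hypothetical. With `D == 1` the lookup reduces to an ordinary tri-plane (bilinear) lookup.

```python
def sample_stack(grid, x, y, z):
    """Trilinearly sample a scalar grid[d][h][w] at coordinates in [-1, 1].

    (x, y) index positions within a plane; z indexes the depth of the stack.
    With a single plane (d == 1) the z coordinate has no effect.
    """
    D, H, W = len(grid), len(grid[0]), len(grid[0][0])
    # Map [-1, 1] to continuous indices (corner-aligned convention, assumed).
    fx, fy, fz = (x + 1) / 2 * (W - 1), (y + 1) / 2 * (H - 1), (z + 1) / 2 * (D - 1)
    x0, y0, z0 = int(fx), int(fy), int(fz)
    x1, y1, z1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1), min(z0 + 1, D - 1)
    tx, ty, tz = fx - x0, fy - y0, fz - z0
    out = 0.0
    # Weighted sum over the 8 neighbouring grid cells.
    for zi, wz in ((z0, 1 - tz), (z1, tz)):
        for yi, wy in ((y0, 1 - ty), (y1, ty)):
            for xi, wx in ((x0, 1 - tx), (x1, tx)):
                out += wz * wy * wx * grid[zi][yi][xi]
    return out


def sample_trigrid(xy, xz, yz, p):
    """Sum the features of three axis-aligned stacks at point p = (x, y, z)."""
    x, y, z = p
    return (sample_stack(xy, x, y, z)     # XY planes stacked along Z
            + sample_stack(xz, x, z, y)   # XZ planes stacked along Y
            + sample_stack(yz, y, z, x))  # YZ planes stacked along X
```

For example, with a single plane, `sample_stack(plane, 0.0, 0.0, -1.0)` and `sample_stack(plane, 0.0, 0.0, 1.0)` return the same value whatever the plane holds, so the front (`z = -1`) and back (`z = 1`) of the head share a feature. A two-layer stack can return different values for the two points, which is the ambiguity the tri-grid removes.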
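One plausible reading of the self-adaptive camera alignment, sketched under assumptions: images are cropped to a square centered on the detected head bounding box, and the camera used for rendering is the estimated camera plus a per-image offset. The `scale` margin, the (yaw, pitch, radius) camera parameterization and both function names are hypothetical; during training the offset would presumably be optimized per image rather than given as a constant.

```python
def square_crop_from_bbox(bbox, scale=1.5):
    """Square crop centered on a head bounding box (x0, y0, x1, y1).

    `scale` is a hypothetical margin factor, not a value from the paper.
    """
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half = max(x1 - x0, y1 - y0) * scale / 2
    return (cx - half, cy - half, cx + half, cy + half)


def corrected_camera(estimated, offset):
    """Apply a per-image offset to an estimated camera, e.g. (yaw, pitch, radius).

    Hypothetical parameterization; in training the offset would be a
    learnable per-image parameter, here it is just a tuple of numbers.
    """
    return tuple(e + o for e, o in zip(estimated, offset))
```

For example, `square_crop_from_bbox((10, 20, 30, 60), scale=1.0)` returns `(0.0, 20.0, 40.0, 60.0)`: the crop side equals the longer box side, so the whole detected head fits regardless of whether the face is visible.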

The training dataset, called FFHQ-F, combines FFHQ with 4K back-of-head images from the K-Hairstyle dataset and 15K in-house large-pose images with diverse styles, covering viewing angles from 60 to 180 degrees.

Highlights

The results are impressive: it is the first generative method that produces clear full-head models.
Inference speed is similar to EG3D.
Real images can be inverted into the model (via PTI).
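PTI (Pivotal Tuning Inversion) inverts a real image in two stages: it first optimizes a latent code with the generator frozen, then fine-tunes the generator weights with that latent held fixed as the "pivot". The toy below shows only this two-stage structure on a hypothetical scalar generator `G(w; theta) = theta * w` with plain squared error and gradient descent; the original method works on full images with perceptual and L2 losses plus a locality regularizer, none of which are modeled here. The function name and all parameters are hypothetical.

```python
def pti_toy(target, theta=1.0, w_init=0.0, w_limit=1.0, lr=0.1, steps=500):
    """Two-stage inversion on a toy generator G(w; theta) = theta * w.

    Stage 1 (latent optimization): fit w with theta frozen. w is clipped to
    [-w_limit, w_limit] to mimic a latent space that cannot reach every real
    image exactly.
    Stage 2 (pivotal tuning): freeze the pivot w and fine-tune theta.
    """
    w = w_init
    for _ in range(steps):
        grad_w = 2 * (theta * w - target) * theta  # d/dw of (theta*w - target)^2
        w = max(-w_limit, min(w_limit, w - lr * grad_w))
    pivot = w
    for _ in range(steps):
        grad_theta = 2 * (theta * pivot - target) * pivot  # d/dtheta of the same loss
        theta -= lr * grad_theta
    return pivot, theta
```

For example, `pti_toy(3.0)` cannot reach the target through the clipped latent alone (the pivot stops at 1.0), so the second stage moves `theta` toward 3.0 instead, and `theta * pivot` then matches the target.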

Limitations

Still fails on teeth.
Still fails on eyeballs, since their geometry is not distinguishable.

pomelyu / paper-reading-notes