yfeng95 / PRNet

Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network (ECCV 2018)
http://openaccess.thecvf.com/content_ECCV_2018/papers/Yao_Feng_Joint_3D_Face_ECCV_2018_paper.pdf
MIT License

Checkerboard artifacts, jagged mesh output #34

Open AngryLoki opened 6 years ago

AngryLoki commented 6 years ago

It looks like the current network produces jagged meshes due to its stride-2 (transposed) convolutions. This problem is described here: https://distill.pub/2016/deconv-checkerboard/ . If I understand correctly, the suggestion is to replace each stride-2 convolution with a resize, x = tf.image.resize_images(x, size=np.array(x.get_shape()[1:3])*2), followed by a stride-1 convolution.
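For illustration, here is a minimal TF1-style sketch of that resize-then-convolve replacement. The helper name and the tf.layers.conv2d call are my own choices for the sketch, not PRNet's actual decoder code:

import numpy as np
import tensorflow as tf

def upsample_conv(x, num_filters, kernel_size=4):
    # Double the spatial resolution with a nearest-neighbor resize...
    new_size = np.array(x.get_shape().as_list()[1:3]) * 2
    x = tf.image.resize_images(x, size=new_size,
                               method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    # ...then apply a stride-1 convolution, so every output pixel receives
    # the same number of kernel contributions and no checkerboard appears.
    return tf.layers.conv2d(x, num_filters, kernel_size,
                            strides=1, padding='same')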

Here are some screenshots to illustrate the problem. net_forward output (normalized to the 0-255 range):

[image: 232_pos]

Zoomed, showing the moiré pattern:

[image: moire]

Jagged mesh output:

[image: _999 615]

knvpk commented 6 years ago

Hi @AngryLoki, I'm also getting the same problem. Did you try the solution you suggested, and is it working?

KevinLee752 commented 5 years ago

@AngryLoki I'm also curious about this issue. Does the solution you described actually fix the problem? Thanks.

stohrendorf commented 5 years ago

I'm relatively new to neural networks, but I analyzed this a bit. There should be no problem with the convolutions as described, because the kernel sizes are divisible by the strides. I built an averaged model from around 80 images of a person with a similar expression but different angles/lighting/etc., and it became clear that this is an asymmetric, systematic error. It looks like there's a problem somewhere with the x axis of the source, because the z axis of the model (using a right-handed, vertical-z-axis system) seems undisturbed. I can only guess, but the culprit here seems to be a mistrained model.
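For reference, a rough sketch of one way to build such an averaged model, assuming PRNet's demo API (prn.process, prn.get_vertices) plus a SciPy Procrustes alignment; the image list and the alignment step are my assumptions, not necessarily what was done here:

import numpy as np
from scipy.spatial import procrustes

# Average many reconstructions of the same person: random noise should
# cancel in the mean, while a systematic error survives averaging.
# `prn` and `images` are assumed to be set up as in PRNet's demo code.
meshes = []
for img in images:
    pos = prn.process(img)                    # 256x256x3 position map
    if pos is not None:
        meshes.append(prn.get_vertices(pos))  # (N, 3), fixed vertex order

# Remove pose differences via similarity (Procrustes) alignment to the first
# mesh; the result lives in a normalized (centered, unit-scale) frame.
aligned = [procrustes(meshes[0], m)[1] for m in meshes]
avg_vertices = np.mean(aligned, axis=0)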

[image: UV map (top left), source image (top right), and the resulting jagged mesh]

You can see that the final UV map (top left, with visibility mapped to alpha; source image top right) corresponds to the jagged mesh. When adjusting the view to match the source photo, it becomes evident that the erroneous faces mostly disappear, because their normals are nearly orthogonal to the viewing direction.

[image: the mesh viewed from the source photo's angle, with the erroneous faces mostly hidden]

I can only guess now, but it seems that the trained UV map, at 256x256 = 2^16 ≈ 65k pixels, is just too small to handle ~40k vertices. This could be solved by doubling its resolution, so that each vertex gets at least 2 samples on each axis (see here for the idea behind this).
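As a quick sanity check of that hypothesis, one could count how many vertices collide on each UV pixel. A minimal sketch, assuming uv_coords is the (N, 2) per-vertex UV lookup in pixel units (PRNet derives it from BFM_UV.mat):

import numpy as np

def samples_per_vertex(uv_coords, uv_size=256):
    # Snap each vertex's UV coordinate to its nearest pixel...
    pix = np.clip(np.round(uv_coords).astype(int), 0, uv_size - 1)
    # ...and histogram how many vertices share each pixel.
    flat = pix[:, 1] * uv_size + pix[:, 0]
    counts = np.bincount(flat, minlength=uv_size * uv_size)
    print('vertices:', len(uv_coords))
    print('UV pixels hit by at least one vertex:', np.count_nonzero(counts))
    print('max vertices sharing a single pixel:', counts.max())
    return counts

If the maximum stays above 1 even at doubled resolution, the undersampling explanation would be worth pursuing further.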

SungYK commented 3 years ago

You can simply use a blur function on the position map to smooth the 3D model. For example:

import cv2

# Regress the 256x256x3 position map, then box-filter each channel
# to suppress the high-frequency jaggies.
pos = prn.process(img, image_info=img_info)
pos = cv2.blur(pos, (5, 5))

The model will be smoother with a bigger kernel, but will lose more detail.
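For what it's worth, a Gaussian blur is a possible variant (my suggestion, not from this thread): for the same kernel size it weights the center pixel more heavily, so it tends to blunt fine detail less than a box filter.

import cv2

# Gaussian smoothing of the position map; sigma=0 tells OpenCV to
# derive the standard deviation from the kernel size.
pos = prn.process(img, image_info=img_info)
pos = cv2.GaussianBlur(pos, (5, 5), 0)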