Open AngryLoki opened 6 years ago
Hi @AngryLoki, I'm also getting the same problem. Did you try the solution you proposed, and does it work?
@AngryLoki I'm also curious about this issue. Does the fix you described actually solve the problem? Thanks.
I'm relatively new to neural networks, but I analyzed this a bit. There should be no problem with the convolutions as you described, because the kernel sizes are divisible by the strides. I built an averaged model from around 80 images of a person with a similar expression but different angles, lighting, etc., and it became clear that this is an asymmetric and systematic error. There seems to be a problem somewhere with the x axis of the source, because the z axis of the model (using a right-handed system with a vertical z axis) seems undisturbed. I can only guess, but the culprit seems to be a mistrained model.
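To illustrate the divisibility argument, here is a minimal 1-D sketch that counts how many kernel taps land on each output position of a transposed convolution. The function names and the sizes used (8 inputs, kernel 4 or 3, stride 2) are assumptions chosen for illustration and are not taken from the network's code.

```python
def overlap_counts(n_inputs, kernel, stride):
    """Count how many kernel taps land on each output position of a
    1-D transposed convolution without padding."""
    out_len = (n_inputs - 1) * stride + kernel
    counts = [0] * out_len
    for i in range(n_inputs):
        start = i * stride  # each input paints `kernel` consecutive outputs
        for k in range(kernel):
            counts[start + k] += 1
    return counts


def interior(counts, kernel):
    """Drop the border positions, where the overlap is always lower."""
    return counts[kernel:-kernel]


# Kernel size divisible by the stride: every interior output gets the same overlap.
even = interior(overlap_counts(8, kernel=4, stride=2), 4)
# Kernel size not divisible by the stride: the overlap alternates between outputs.
uneven = interior(overlap_counts(8, kernel=3, stride=2), 3)

print(even)
print(uneven)
```

An even overlap removes one structural cause of checkerboard patterns, but it does not guarantee that the learned weights produce a smooth output.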
You can see that the final UV map (top left, visibility mapped to alpha here, source image top right) corresponds to the jagged mesh. When trying to adjust the view to match the source photo, it becomes evident that the erroneous faces mostly disappear, because their normals are orthogonal to the viewing direction.
I can only guess now, but it seems that the trained UV map, at 2^16 ≈ 65k pixels, is simply too small to handle 40k vertices. Doubling it might solve this, so that each vertex has at least 2 samples on each axis (see here for the idea behind this).
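A rough back-of-the-envelope check of that guess, as a sketch only. It assumes the 2^16 pixels form a square 256x256 map, that the vertices are spread evenly over the whole map, and that "doubling" means doubling the side length. All three are assumptions, and the function name is made up for this example.

```python
import math

uv_side = 256        # assumed: 256 * 256 = 2**16 pixels
n_vertices = 40_000  # rough vertex count quoted above


def samples_per_vertex_axis(side, vertices):
    """Average number of UV pixels per vertex along one axis,
    assuming the vertices are spread evenly over a square map."""
    return math.sqrt(side * side / vertices)


current = samples_per_vertex_axis(uv_side, n_vertices)
doubled = samples_per_vertex_axis(2 * uv_side, n_vertices)

print(f"current map: {current:.2f} samples per vertex per axis")
print(f"doubled side: {doubled:.2f} samples per vertex per axis")
```

Under these assumptions the current map stays below 2 samples per vertex per axis, and doubling the side length brings it above 2. In practice the face does not fill the whole UV map, so the real figures would be lower.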
You can simply apply a blur function to the position map to smooth the 3D model. For example:
import cv2
pos = prn.process(img, image_info=img_info)  # predicted UV position map
pos = cv2.blur(pos, (5, 5))  # box blur with a 5x5 kernel
A bigger kernel smooths the model more, but loses more detail.
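The trade-off can be seen on a toy signal. The following is a pure-Python 1-D stand-in for a box blur, not the cv2.blur call itself, and the data is made up: a step (standing in for a real feature) plus alternating jitter (standing in for the jagged artifact). All names here are hypothetical.

```python
def box_blur(signal, k):
    """1-D box blur with an odd kernel size k; border samples are clamped."""
    half = k // 2
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - half, i + half + 1)]
        out.append(sum(window) / k)
    return out


# Made-up data: a unit step plus alternating +/-0.1 jitter.
clean = [0.0] * 20 + [1.0] * 20
noisy = [v + (0.1 if i % 2 == 0 else -0.1) for i, v in enumerate(clean)]


def jitter(sig):
    """Largest jump between neighbours on the flat part, away from the step."""
    return max(abs(sig[i + 1] - sig[i]) for i in range(5, 12))


def edge_height(sig):
    """Jump across the step between samples 19 and 20."""
    return abs(sig[20] - sig[19])


for k in (1, 5, 9):
    smoothed = box_blur(noisy, k)
    print(k, round(jitter(smoothed), 3), round(edge_height(smoothed), 3))
```

Both numbers shrink as the kernel grows: the jitter is suppressed, but the step is flattened as well, which is the loss of detail mentioned above.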
It looks like the current network produces jagged meshes because it uses stride=2 convolutions. This problem is described here: https://distill.pub/2016/deconv-checkerboard/ . If I understand correctly, the suggestion is to replace the stride=2 convolutions with
x = tf.image.resize_images(x, size=np.array(x.get_shape()[1:3])*2)
and a stride=1 convolution.
Here are some screenshots to illustrate the problem:
net_forward output (normalized to 0-255 range):
Zoomed:
Jagged mesh output: