shunsukesaito / PIFu

This repository contains the code for the paper "PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization"
https://shunsukesaito.github.io/PIFu/

Sampling for feature vector vs ground truth mesh #39

Closed gordon-lim closed 4 years ago

gordon-lim commented 4 years ago

Hi. I am having some trouble understanding part of the paper. I hope you can enlighten me; I would truly appreciate it! You use spatial sampling with the ground truth mesh. Does this not mean you have a ground truth 3D occupancy field that is incomplete? For corresponding pixels without a ground truth inside/outside prediction, will their feature vectors and z-values still be fed through PIFu?

I noticed in an earlier paragraph that you mentioned the use of bilinear sampling to obtain the feature vectors. How is the purpose of this sampling different from spatial sampling used with the ground truth meshes?

I also checked out the script you attached in the previous issue:

    surface_points, _ = trimesh.sample.sample_surface(mesh, 4 * self.num_sample_inout)
    sample_points = surface_points + np.random.normal(scale=self.opt.sigma, size=surface_points.shape)

Is the above the spatial sampling? I do not see a direct connection to the spatial sampling described in the paper, so I just want to confirm.

Thank you for your patience. I have picked this paper to try and learn as much as possible. Hopefully I'm not annoying you. I look forward to your reply.

shunsukesaito commented 4 years ago

I guess you are confused between sampling of 3d points and sampling of image features. What you mentioned above is basically sampling of 3d points for training. During training, we sample points around ground truth meshes, obtain GT occupancy labels in the data loader, and supervise PIFu prediction with these labels. We found that the final reconstruction quality is highly influenced by this sampling strategy. Please refer to the supplemental material for our ablation study on this.
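The 3D point sampling described above can be sketched as follows. This is only an illustration, not the repository's data loader: a unit sphere stands in for the ground truth mesh so that the inside/outside test is a simple distance check, and the names `sample_sphere_surface` and `sample_training_points` are hypothetical. It shows that labels are computed only for the sampled points, so no dense occupancy grid is ever needed.

```python
import math
import random


def sample_sphere_surface(rng):
    """Draw one point uniformly on the unit sphere.

    Assumption: the sphere is a stand-in for sampling points on a mesh surface.
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:
            return [c / norm for c in v]


def sample_training_points(num_points, sigma, seed=0):
    """Sample points near the surface and label each one inside (1.0) or outside (0.0)."""
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(num_points):
        surface_point = sample_sphere_surface(rng)
        # Perturb the surface point with isotropic Gaussian noise of scale sigma.
        point = [c + rng.gauss(0.0, sigma) for c in surface_point]
        # Occupancy label: for the unit sphere, "inside" means distance to origin < 1.
        inside = sum(c * c for c in point) < 1.0
        points.append(point)
        labels.append(1.0 if inside else 0.0)
    return points, labels


points, labels = sample_training_points(num_points=200, sigma=0.05)
```

With a real mesh, the inside/outside check would be replaced by a point-in-mesh query; the rest of the structure stays the same.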

Image feature sampling (bilinear sampling) has nothing to do with the ground truth data sampling above. In PIFu, we query scalar/vector fields at arbitrary points. To do so, we combine a localized image feature, looked up at the 2D camera projection of each 3D point, with that point's z value. The bilinear sampling appears in this image feature extraction process. Please refer to the query function in https://github.com/shunsukesaito/PIFu/blob/30b428ba74bd7743a17c19fa20f6bfd39b1de057/lib/model/HGPIFuNet.py#L68 for details.
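As a rough sketch of how such a query input could be assembled (not the code in HGPIFuNet.py): project the 3D point to image coordinates, look up the feature vector at the projected (x, y), and append z. The orthographic camera, the function names, and the toy feature sampler below are all assumptions made for illustration.

```python
def project_orthographic(point, scale=1.0, offset=(0.0, 0.0, 0.0)):
    """Hypothetical orthographic camera: scale and shift each axis, keep z as depth."""
    return tuple(scale * c + o for c, o in zip(point, offset))


def build_query_input(point, sample_feature, scale=1.0):
    """Combine the image feature at the projected pixel location with the depth value.

    sample_feature is any callable (x, y) -> list of channel values, for example
    a bilinear lookup into a precomputed feature map.
    """
    x, y, z = project_orthographic(point, scale)
    feature = sample_feature(x, y)
    # The resulting vector is what would be fed to the implicit function.
    return list(feature) + [z]


def toy_sampler(x, y):
    """Toy stand-in for a feature map lookup: returns two made-up channels."""
    return [x + y, x - y]


query_input = build_query_input((0.25, -0.5, 0.1), toy_sampler)
```

The feature map itself is computed once per image; only this lookup is repeated for every 3D point.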

gordon-lim commented 4 years ago

There's a comment under the query method that says:

Image features should be pre-computed before this call.

So it seems this isn't the code for the image feature extraction. I also could not match the code with what I found online with regard to bilinear resampling.

I did a search on bilinear sampling and got results for bilinear interpolation and bilinear resampling, which have to do with how pixels are filled/removed when making images bigger or smaller. Is this relevant? I recognise that you are using a continuous space instead of pixels. Is bilinear sampling used to get a "pixel value" where the coordinate is not originally on a pixel?

shunsukesaito commented 4 years ago

Note that "computation" of image features through fully convolutional networks and "extraction" of these features are different steps.

I did a search on bilinear sampling and got results for bilinear interpolation and bilinear resampling, which have to do with how pixels are filled/removed when making images bigger or smaller. Is this relevant?

Algorithm-wise, bilinear interpolation and bilinear sampling do similar things, but the focus of bilinear sampling is to extract pixel values at non-discretized pixel coordinates (already normalized to [-1, 1]) using the bilinear interpolation scheme. Resizing is just one application of it; it can be used in various other scenarios, such as PIFu.
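A minimal, pure-Python sketch of bilinear sampling at normalized coordinates may make this concrete. It assumes that -1 and +1 map to the centers of the first and last pixels, and that out-of-range coordinates are clamped to the border; other conventions exist, and `bilinear_sample` is a hypothetical helper, not a function from the repository.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a 2D grid (list of rows) at normalized coordinates x, y in [-1, 1].

    Assumed convention: -1 is the center of the first pixel, +1 the center of the last.
    """
    height, width = len(image), len(image[0])
    # Map normalized coordinates to continuous pixel coordinates, clamped to the grid.
    px = min(max((x + 1.0) * 0.5 * (width - 1), 0.0), width - 1)
    py = min(max((y + 1.0) * 0.5 * (height - 1), 0.0), height - 1)
    # Indices of the four surrounding pixels.
    x0, y0 = int(math.floor(px)), int(math.floor(py))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    # Fractional position inside the pixel cell.
    tx, ty = px - x0, py - y0
    # Interpolate along x on the two rows, then along y between them.
    top = (1.0 - tx) * image[y0][x0] + tx * image[y0][x1]
    bottom = (1.0 - tx) * image[y1][x0] + tx * image[y1][x1]
    return (1.0 - ty) * top + ty * bottom


image = [[0.0, 10.0],
         [20.0, 30.0]]
center_value = bilinear_sample(image, 0.0, 0.0)  # midway between all four pixels
```

Here the coordinate (0, 0) does not coincide with any pixel center, so the returned value is a weighted blend of the four neighbors. For a feature map with many channels, the same weights are applied to each channel.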

I recognise that you are using a continuous space instead of pixels. Is bilinear sampling used to get a "pixel value" where the coordinate is not originally on a pixel?

I think you are right. You can refer to https://en.wikipedia.org/wiki/Bilinear_interpolation#:~:text=Bilinear%20interpolation%20is%20performed%20using,quadratic%20in%20the%20sample%20location. to get a better sense of how these non-discretized coordinates are used to extract "pixel value".

gordon-lim commented 4 years ago

I have checked out the wikipedia page.

I'm not sure if I'm oversimplifying things but... if a non-discretized coordinate falls within the aligned pixel, why not just take that aligned pixel's value? What's the need for bilinear sampling then?

shunsukesaito commented 4 years ago

That's an option too (if you use mode='nearest', that's exactly what you said). However, this way reconstruction tends to be blocky as the feature sampling is discontinuous between pixels. Bilinear sampling makes it C0 continuous to alleviate these blocky artifacts.
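The difference can be seen on a single row of values. The sketch below (hypothetical helper names, plain Python, one dimension for brevity) samples just to the left and right of a pixel boundary: nearest lookup jumps by a full pixel difference, while linear interpolation changes by a negligible amount.

```python
import math


def sample_nearest(row, px):
    """Return the value of the pixel whose center is closest to px."""
    idx = min(max(int(math.floor(px + 0.5)), 0), len(row) - 1)
    return row[idx]


def sample_linear(row, px):
    """Linearly interpolate between the two pixels surrounding px."""
    px = min(max(px, 0.0), len(row) - 1)
    i0 = int(math.floor(px))
    i1 = min(i0 + 1, len(row) - 1)
    t = px - i0
    return (1.0 - t) * row[i0] + t * row[i1]


row = [0.0, 1.0, 4.0]
eps = 1e-6
# px = 0.5 is the boundary between pixel 0 and pixel 1.
jump_nearest = abs(sample_nearest(row, 0.5 + eps) - sample_nearest(row, 0.5 - eps))
jump_linear = abs(sample_linear(row, 0.5 + eps) - sample_linear(row, 0.5 - eps))
```

The interpolated signal is continuous everywhere, but its slope still changes at pixel centers, which is why it is C0 rather than smoother.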

gordon-lim commented 4 years ago

Regarding "use mode='nearest'": did you use nn.Upsample?

shunsukesaito commented 4 years ago

No. Please take a look here: https://github.com/shunsukesaito/PIFu/blob/30b428ba74bd7743a17c19fa20f6bfd39b1de057/lib/geometry.py#L15

gordon-lim commented 4 years ago

Thank you very much!