shunsukesaito / PIFu

This repository contains the code for the paper "PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization"
https://shunsukesaito.github.io/PIFu/

input format for multi-view PIFu #45

Closed alitokur closed 4 years ago

alitokur commented 4 years ago

Dear Sir; for single-view PIFu training, I think I understand the data format: [image_tensor -> our images (4-dim), sample_tensor -> the sampled mesh points, calib_tensor, labels (1 for inside points, 0 for outside points)]

But I couldn't find anything about the multi-view format. Should I create an image_tensor for each view? In #31 you mentioned transform matrices. When we create data with "apps.render_data" we just rotate the object, right? Should I compute the rotation matrices between the images and pass them to the function? Could you give more information about the data format when you are available? Sincerely

shunsukesaito commented 4 years ago

You can take a look here https://github.com/shunsukesaito/PIFu/blob/30b428ba74bd7743a17c19fa20f6bfd39b1de057/apps/train_shape.py#L98 and at the definition of the function itself to get a sense of how the different viewpoints are fed into the network.

alitokur commented 4 years ago

Actually, I have been reading train_shape.py for a long time, but I haven't been able to make progress. To understand the code, I ran it step by step. Our default batch_size = 2, Sir, and at epoch 0, train_idx = 0 my image_tensor is as follows: [Screenshot from 2020-07-10 14-12-26]

So I currently have two random images (from different views) in my image_tensor, right?

shunsukesaito commented 4 years ago

Correct. For the multi-view case, your data loader needs to return a (number of views) x C x H x W image tensor for each item.
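The per-item layout described above can be sketched roughly as follows. This is an illustrative dataset, not the repo's actual loader: the class name, the random tensors standing in for rendered views, and the identity calibration matrices are all placeholders.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MultiViewDataset(Dataset):
    """Illustrative dataset: each item stacks all views of one subject."""
    def __init__(self, num_subjects=2, num_views=3, H=512, W=512):
        self.num_subjects = num_subjects
        self.num_views = num_views
        self.H, self.W = H, W

    def __len__(self):
        return self.num_subjects

    def __getitem__(self, idx):
        # In practice these would be the rendered views of subject `idx`;
        # random tensors stand in for real images here.
        images = torch.randn(self.num_views, 3, self.H, self.W)          # (N, C, H, W)
        calibs = torch.eye(4).unsqueeze(0).repeat(self.num_views, 1, 1)  # (N, 4, 4)
        return {'img': images, 'calib': calibs}

loader = DataLoader(MultiViewDataset(), batch_size=2)
batch = next(iter(loader))
print(batch['img'].shape)    # torch.Size([2, 3, 3, 512, 512]) -> (B, N, C, H, W)
print(batch['calib'].shape)  # torch.Size([2, 3, 4, 4])
```

With this layout, the default collate function stacks items along a new batch axis, so the network receives a 5-D (B, N, C, H, W) tensor that it can later flatten across the view axis.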

alitokur commented 4 years ago

So when the image_tensor has multiple images, does that mean we are already training the network with different-view images? It may be a stupid question, forgive me!

shunsukesaito commented 4 years ago

Feature aggregation across different viewpoints does not happen when number of views = 1, so by simply batching images with batch_size you just train the network on single images (which can be from any viewpoint).
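A quick shape check makes the distinction concrete. The tensor sizes below are illustrative, not taken from the repo: with num_views = 1 the view axis is trivial, so flattening it just recovers an ordinary single-view batch and there is nothing for the network to aggregate across.

```python
import torch

B, N, C, H, W = 4, 1, 3, 256, 256   # N = 1: one view per subject
batch = torch.randn(B, N, C, H, W)

# Flattening a trivial view axis yields a plain single-view batch,
# so no cross-view feature aggregation can occur.
flat = batch.view(B * N, C, H, W)
print(flat.shape)  # torch.Size([4, 3, 256, 256])
```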

alitokur commented 4 years ago

I guess setting batch_size = 1 and num_views = 2 is not a smart solution. Well, my image_tensor has two images again, but I get this error: `UserWarning: Using a target size (torch.Size([1, 1, 5000])) that is different to the input size (torch.Size([2, 1, 5000])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.`

shunsukesaito commented 4 years ago

Before passing the image tensor of shape (B, N, 3, H, W) to the network, you have to reshape it to (B*N, 3, H, W). Accordingly, your in/out labels should have the same batch size B*N; this is one of the modifications required to support multi-view training.
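A minimal sketch of that reshape and the matching label expansion, using made-up sizes (B = 2 subjects, N = 3 views, 5000 sample points); the variable names are illustrative, not the repo's exact helpers:

```python
import torch

B, N, C, H, W = 2, 3, 3, 512, 512
num_samples = 5000

image_tensor = torch.randn(B, N, C, H, W)
labels = torch.rand(B, 1, num_samples)  # one set of in/out labels per subject

# Merge the batch and view axes so the backbone sees (B*N, C, H, W).
image_flat = image_tensor.view(B * N, C, H, W)

# Every view of a subject shares the same in/out labels, so repeat them
# along the view axis to match the new batch dimension B*N.
labels_flat = labels.unsqueeze(1).expand(B, N, 1, num_samples).reshape(B * N, 1, num_samples)

print(image_flat.shape)   # torch.Size([6, 3, 512, 512])
print(labels_flat.shape)  # torch.Size([6, 1, 5000])
```

This resolves the broadcasting warning above: once the labels are expanded to batch size B*N, the prediction and target tensors agree in shape.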