zhizdev / sparsefusion

[CVPR 2023] SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction
https://sparsefusion.github.io/

Some questions about the code? #14

Open booker-max opened 1 year ago

booker-max commented 1 year ago

What a great job this is!

I have some questions, mainly about the EFT.

  1. In the following code, what is the shape of data['R'], and why is [0] chosen? In the "Using Custom Datasets" section of your README, you mention "'R': (B, 3, 3) PyTorch3D rotation," so why index with [0]?

```python
target_cameras = PerspectiveCameras(
    R=data['R'][0], T=data['T'][0],
    focal_length=data['f'][0], principal_point=data['c'][0],
    image_size=data['image_size'][0],
).cuda(gpu)
target_rgb = data['images'][0].cuda(gpu)
```

2. I don't really understand what sampling batch cameras does. What does the code mean by setting render_batch_size=1 and query_idx? Don't you just need to input context_size, a reference pose and reference image, and a target pose to output a target image?

```python
rand_batch = torch.randperm(len(target_cameras))
batch_idx = rand_batch[:render_batch_size]
batch_cameras, batch_rgb, batch_mask, input_cameras, input_rgb, input_masks, context_idx = relative_cam(
    target_cameras, target_rgb,
    context_size=context_size, query_idx=batch_idx, return_context=True,
)
```
zhizdev commented 1 year ago

Hi, thanks for looking at the code.

  1. Right out of the dataloader, the shape of data['R'] is (1, B, 3, 3) since we load one scene per iteration.
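
For concreteness, here is a minimal, illustrative sketch of that indexing (not the repo's actual dataloader; shapes follow the README and the answer above, with B = 8 views chosen arbitrarily):

```python
import torch
from pytorch3d.renderer import PerspectiveCameras

# Dummy data mimicking one dataloader iteration: one scene with B = 8 views,
# so every tensor carries a leading scene dimension of size 1.
B = 8
data = {
    'R': torch.eye(3).repeat(1, B, 1, 1),        # (1, B, 3, 3)
    'T': torch.zeros(1, B, 3),                   # (1, B, 3)
    'f': torch.ones(1, B, 2),                    # (1, B, 2)
    'c': torch.zeros(1, B, 2),                   # (1, B, 2)
    'image_size': torch.full((1, B, 2), 256),    # (1, B, 2)
    'images': torch.rand(1, B, 3, 256, 256),     # (1, B, 3, H, W)
}

# [0] drops the scene dimension, leaving per-view tensors of batch size B,
# which is what PerspectiveCameras expects: R (B, 3, 3), T (B, 3), etc.
target_cameras = PerspectiveCameras(
    R=data['R'][0], T=data['T'][0],
    focal_length=data['f'][0], principal_point=data['c'][0],
    image_size=data['image_size'][0],
)
target_rgb = data['images'][0]  # (B, 3, H, W)
```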

  2. Render batch size is the batch size for target_pose and target_image. The forward pass of relative_cam takes in all cameras and images from the dataloader and, based on batch_idx == query_idx, picks out the target_image and target_pose. It also randomly selects the context_image and context_pose.

The output of the last line in your code box can be interpreted as follows:

  1. batch_cameras ~ target poses
  2. batch_rgb ~ target images
  3. input_cameras ~ context poses
  4. input_rgb ~ context images
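
Putting it together, the snippet from your question can be read roughly like this (a sketch with commented roles; the mapping follows the list above, and the exact internals of relative_cam may differ):

```python
import torch

render_batch_size = 1   # how many target views to render per iteration
context_size = 2        # how many context (reference) views to condition on

# Randomly pick which of the B views will serve as the query/target views.
rand_batch = torch.randperm(len(target_cameras))
batch_idx = rand_batch[:render_batch_size]

# relative_cam takes all cameras/images for the scene and splits them:
# views at query_idx become the targets, and context_size other views are
# sampled as the conditioning inputs.
(batch_cameras,   # target poses
 batch_rgb,       # target images
 batch_mask,      # target masks
 input_cameras,   # context poses
 input_rgb,       # context images
 input_masks,     # context masks
 context_idx,     # indices of the selected context views
 ) = relative_cam(target_cameras, target_rgb,
                  context_size=context_size,
                  query_idx=batch_idx,
                  return_context=True)
```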