nianticlabs / simplerecon

[ECCV 2022] SimpleRecon: 3D Reconstruction Without 3D Convolutions

Using different num of views in a tuple #34

Open Twilight89 opened 1 year ago

Twilight89 commented 1 year ago

Hi, thanks for your wonderful work and ready-to-use code!

In my case, I want to use an arbitrary number of views in a tuple, anywhere from 2 to 8. I noticed that in your original paper you ran an experiment on the influence of the number of views.

So do I have to retrain a model for each particular number of views in a tuple? Or do I need just one model, like HERO_MODEL, to test different values of num_images_in_tuple?

I tried directly changing model_num_views in options.py (and of course I generated a data_split file with num_images_in_tuple: 2), but when I ran it, the terminal showed Number of source views: 7 and I got the shape error below.

########################## Using FeatureVolumeManager ##########################
Number of source views: 7
Using all metadata.
Number of channels: [202, 128, 128, 1]
################################################################################

0%| | 0/37 [00:27<?, ?it/s]
0%| | 0/1 [00:27<?, ?it/s]
Traceback (most recent call last):
  File "/root/simplerecon/test.py", line 473, in <module>
    main(opts)
  File "/root/simplerecon/test.py", line 270, in main
    outputs = model(
  File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/simplerecon/experiment_modules/depth_model.py", line 361, in forward
    cost_volume, lowest_cost, _, overall_mask_bhw = self.cost_volume(
  File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/simplerecon/modules/cost_volume.py", line 360, in forward
    self.build_cost_volume(
  File "/root/simplerecon/modules/cost_volume.py", line 727, in build_cost_volume
    feature_b1hw = self.mlp(
  File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/simplerecon/modules/networks.py", line 147, in forward
    return self.net(x)
  File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (393216x46 and 202x128)

So I assume that for a different number of views, I have to train a separate model to match the input size. Is that right?

Hope to hear from you! THX:)
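
A note on the numbers in the error: the widths 46 and 202 are consistent with an MLP input that grows linearly with the number of source views. The sketch below is only arithmetic on the values printed in the log. The linear model and the names `per_view`, `fixed`, and `mlp_input_width` are assumptions made for illustration, not something taken from the repository.

```python
# Widths reported in the RuntimeError above.
expected_width = 202  # input width of the checkpoint's first linear layer (7 source views)
received_width = 46   # width of the features built from a 2-image tuple (1 source view)

# Assumption: width = fixed + per_view * num_source_views.
per_view = (expected_width - received_width) // (7 - 1)
fixed = received_width - per_view * 1


def mlp_input_width(num_source_views: int) -> int:
    """Input width under the assumed linear model."""
    return fixed + per_view * num_source_views


for n in (1, 3, 7):
    print(n, mlp_input_width(n))
```

If that assumption holds, any tuple size other than eight produces a width that the 7-source-view weights cannot accept, which matches the failure above.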

mohammed-amr commented 1 year ago

The "hero model" with metadata that we provide only supports 7 source views (8 views total). To use the repo with another number of source views you'll have to either

  1. train other metadata models with the number of source views you want to have,
  2. use the dot product model as the correlation is agnostic to number of views,
  3. or use the hero model but duplicate the remaining source view indices using what you have in the tuple.
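
To illustrate option 2, here is a minimal sketch of why a dot-product cost does not care about the number of views while a concatenated feature vector does. It uses plain Python lists in place of tensors, and averaging over source views is an assumption chosen for illustration; the repository may reduce across views differently. The function names are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def dot_product_cost(ref_feat: Sequence[float],
                     src_feats: Sequence[Sequence[float]]) -> float:
    """Average the reference/source dot products over all source views.

    The result is a single value regardless of how many source views exist.
    """
    if not src_feats:
        raise ValueError("at least one source view is required")
    return sum(dot(ref_feat, s) for s in src_feats) / len(src_feats)


def concatenated_input(ref_feat: Sequence[float],
                       src_feats: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate reference and source features, as a learned MLP input might.

    The length depends on the number of source views.
    """
    out = list(ref_feat)
    for s in src_feats:
        out.extend(s)
    return out


ref = [1.0, 0.0, 2.0]
one_view = [[1.0, 1.0, 1.0]]
seven_views = one_view * 7

print(dot_product_cost(ref, one_view), dot_product_cost(ref, seven_views))
print(len(concatenated_input(ref, one_view)), len(concatenated_input(ref, seven_views)))
```

The first line prints the same cost for one and seven views; the second shows the concatenated width changing with the view count, which is what a fixed-size MLP cannot absorb.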

Here's a longer explanation for 3. Say you have generated a tuple of size 4, but your model uses 8. Duplicate source views using the three you already have. Concretely:

Tuple of four:
frame_99, frame_80, frame_76, frame_70

For your tuple of eight, just randomly duplicate source views from the tuple of four:
frame_99, frame_80, frame_76, frame_70, frame_70, frame_76, frame_76, frame_80

The scripts for tuple generation do some version of this duplication for test tuples when the number of tuples in the list isn't enough. You can insert a little flag in there to force this behavior. Change n_measurement_frames at this line to three for three source views (four views in total). The block at this line will perform the random sample repeat for you.
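
The duplication described for option 3 can be sketched in a few lines. This is a standalone illustration, not the repository's tuple-generation code: `pad_tuple` is a hypothetical helper, and it assumes the first entry of the tuple is the reference view and the rest are source views.

```python
import random
from typing import List, Optional


def pad_tuple(frames: List[str], target_size: int,
              rng: Optional[random.Random] = None) -> List[str]:
    """Pad a tuple to target_size by randomly repeating its source views.

    Assumes frames[0] is the reference view; it is never duplicated.
    """
    if len(frames) < 2:
        raise ValueError("need a reference view and at least one source view")
    if len(frames) > target_size:
        raise ValueError("tuple is already larger than target_size")
    rng = rng or random.Random()
    sources = frames[1:]
    # Draw the missing entries from the existing source views.
    extra = [rng.choice(sources) for _ in range(target_size - len(frames))]
    return frames + extra


tuple_of_four = ["frame_99", "frame_80", "frame_76", "frame_70"]
tuple_of_eight = pad_tuple(tuple_of_four, 8, random.Random(0))
print(tuple_of_eight)
```

Passing a seeded `random.Random` keeps the padded tuples reproducible between runs, which helps when comparing results across tuple files.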

Twilight89 commented 1 year ago

Thanks for your detailed and quick reply! I have tried the dot product model and successfully generated pred_depth with 2 images in a tuple. So the model contains two parts, the metadata part and the 2D CNN, right? And if I want to train a new HERO_MODEL, do I have to train both parts, since it is end-to-end?

mohammed-amr commented 1 year ago

Great!

For the hero model, yes. You'd need to retrain the entire model end to end.

mohammed-amr commented 1 year ago

To elaborate on that: the feature volume's MLP does collapse features down to a single value, and in all my experiments the MLP ends up learning some form of higher-is-better score.

Maybe you can get away with training a 4 view volume with a frozen 8 view decoder network? You'd need to try it. Do let us know if you do! Would be cool to learn from that.

Twilight89 commented 1 year ago

@mohammed-amr Hi, thanks again for your elaboration. I'll try that! I also want to see whether simplerecon can run on a mobile device (as you say in the paper, it potentially enables use in embedded and resource-constrained environments). What I want to do is lower the memory usage and inference time. That's why I want to try 2 views in a tuple. I have two questions now:

  1. According to your experiments, do you think it can run on mobiles in real-time with minimal cost (like just using two views)?
  2. And what about generalization to different input image sizes? For example, 360×640 (versus 512×384 in your implementation). I tried running on images captured with my iPhone (360×640). The images are resized to 512×384 before being sent to the model. The depth seems quite OK, but the fused mesh is bad.

Could you give me some suggestions on these? I will try my best!
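
On the second question, one thing that may be worth checking (a guess, not something confirmed in this thread) is whether the camera intrinsics were rescaled together with the images. Resizing to 512×384 changes the aspect ratio of a 640×360 frame, so the horizontal and vertical scale factors differ. The sketch below assumes a simple pinhole model, ignores half-pixel offset conventions, and treats the source as 640 wide by 360 high; the function name and the example intrinsics are hypothetical.

```python
from typing import Tuple


def scale_intrinsics(fx: float, fy: float, cx: float, cy: float,
                     src_wh: Tuple[int, int],
                     dst_wh: Tuple[int, int]) -> Tuple[float, float, float, float]:
    """Rescale pinhole intrinsics for an image resized from src_wh to dst_wh.

    Sizes are (width, height). Each axis is scaled by its own factor, so a
    resize that changes the aspect ratio is handled as well.
    """
    sx = dst_wh[0] / src_wh[0]
    sy = dst_wh[1] / src_wh[1]
    return fx * sx, fy * sy, cx * sx, cy * sy


# Hypothetical intrinsics for a 640x360 capture, resized to 512x384.
fx, fy, cx, cy = scale_intrinsics(500.0, 500.0, 320.0, 180.0,
                                  src_wh=(640, 360), dst_wh=(512, 384))
print(fx, fy, cx, cy)
```

Depth fusion back-projects every pixel through the intrinsics, so a mismatch there could leave per-frame depth looking plausible while the fused mesh degrades.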