xy-guo / MVSNet_pytorch

PyTorch Implementation of MVSNet
618 stars 93 forks

How to reproduce evaluation metrics? #29

Open JeffWang987 opened 2 years ago

JeffWang987 commented 2 years ago

Thank you for sharing this code. I have some questions about how to reproduce the evaluation metrics. (1) How many GPUs did you use? (2) What is the batch size per GPU? (3) Did you use depth refinement in MVSNet?

I tried 4 GPUs (batch size = 1) and 4 GPUs (batch size = 2). The results are below, and there is still a gap compared to your results. (PS: I didn't use refinement.)

       Acc     Comp    Overall
4x2:   0.5636  0.5393  0.5514
4x1:   0.6434  0.6790  0.6612
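The Overall values above are consistent with the arithmetic mean of Acc and Comp, which is the usual convention for DTU evaluation. A minimal check in Python, assuming that convention (the `overall` helper is a hypothetical name, not part of the repository):

```python
# Acc and Comp values copied from the results above (lower is better).
results = {
    "4x2": (0.5636, 0.5393),
    "4x1": (0.6434, 0.6790),
}


def overall(acc: float, comp: float) -> float:
    """Return the mean of accuracy and completeness (assumed convention)."""
    return (acc + comp) / 2


for name, (acc, comp) in results.items():
    print(f"{name}: overall = {overall(acc, comp):.4f}")
```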

I would really appreciate it if you could provide some advice.

XYZ-qiyh commented 2 years ago

Hello, @JeffWang987

  1. You can first use the pre-trained model provided by the author to reproduce the evaluation results.

  2. You could try to train the network using only one GPU card to see the difference.

  3. Depth refinement is not used in this project.
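One common way to restrict a PyTorch script to a single card without changing the model code is the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below assumes GPU index 0 is the card you want and that the variable is set before CUDA is initialized; it is not taken from this repository's scripts.

```python
import os

# Must be set before any CUDA-using library (e.g. PyTorch) initializes CUDA.
# Index "0" is an assumption; use the index of the card you want to train on.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(f"{len(visible)} GPU(s) visible to this process: {visible}")
```

The same effect can be had from the shell by prefixing the training command with `CUDA_VISIBLE_DEVICES=0`. With only one device visible, `nn.DataParallel` should create a single replica.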

zhao-you-fei commented 2 years ago

Hello, @JeffWang987 @xy-guo Where should the code be changed to run on multiple cards?

Traceback (most recent call last):
  File "/home/camellia/zyf/MVSNet_pytorch-master/eval.py", line 302, in <module>
    save_depth()
  File "/home/camellia/zyf/MVSNet_pytorch-master/eval.py", line 113, in save_depth
    outputs = model(sample_cuda["imgs"], sample_cuda["proj_matrices"], sample_cuda["depth_values"])
  File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/camellia/zyf/MVSNet_pytorch-master/models/mvsnet.py", line 123, in forward
    warped_volume = homo_warping(src_fea, src_proj, ref_proj, depth_values)
  File "/home/camellia/zyf/MVSNet_pytorch-master/models/module.py", line 127, in homo_warping
    warped_src_fea = F.grid_sample(src_fea, grid.view(batch, num_depth * height, width, 2), mode='bilinear',
  File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 3836, in grid_sample
    return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)
RuntimeError: CUDA out of memory. Tried to allocate 2.71 GiB (GPU 0; 11.77 GiB total capacity; 6.45 GiB already allocated; 1.92 GiB free; 8.33 GiB reserved in total by PyTorch)

XYZ-qiyh commented 2 years ago

Hello @zhao-you-fei, you can use CasMVSNet for multi-view depth estimation. It achieves more accurate depth estimation with lower GPU memory consumption. CasMVSNet also supports multi-GPU training.

zhao-you-fei commented 2 years ago

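@XYZ-qiyh Thank you, I know about it (although I am also stuck at the visualization step with CasMVSNet). What I would like to know is whether this mvsnet-pytorch version must be run on a single card. I only have two 3060 cards and wanted to load the pre-trained model directly to test it, but it raised the error above. I want to run a comparison experiment, and this problem is still unresolved. Do you have any suggestions? I have also tried all the answers in issue #3, and the same error still occurs. I don't know whether it is caused by insufficient memory or by an uneven distribution of parameters across the two cards. Please forgive my poor English; I could only write this in Chinese. Thanks again for your answer.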
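One way to check whether the error above is a genuine memory shortage is to estimate the size of a single warped feature volume, the tensor produced by `F.grid_sample` in `homo_warping`. The sketch below is a back-of-the-envelope estimate, not code from the repository. The shape values are assumptions that the thread does not state: one sample per GPU, 32 feature channels, 192 depth hypotheses, and 1600x1184 input images with features at 1/4 resolution, stored as float32. `warped_volume_gib` is a hypothetical helper name.

```python
def warped_volume_gib(batch, channels, num_depth, height, width, bytes_per_elem=4):
    """Size in GiB of one tensor of shape (batch, channels, num_depth, height, width)."""
    num_elements = batch * channels * num_depth * height * width
    return num_elements * bytes_per_elem / 2**30


# Assumed evaluation settings (not stated in the thread):
# one sample per GPU, 32 feature channels, 192 depth hypotheses,
# 1600x1184 input images with features at 1/4 resolution, float32.
size = warped_volume_gib(
    batch=1,
    channels=32,
    num_depth=192,
    height=1184 // 4,
    width=1600 // 4,
)
print(f"{size:.2f} GiB")  # prints "2.71 GiB"
```

Under these assumptions the estimate matches the 2.71 GiB allocation that failed in the traceback. That suggests a single sample does not fit in the 1.92 GiB that was still free on the card, which points to insufficient memory rather than an uneven split across the two GPUs. Reducing the number of depth hypotheses or the input resolution would shrink this volume proportionally.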

nie-lang commented 1 year ago

@JeffWang987 @XYZ-qiyh @xy-guo Hi, have you addressed this problem?

I can reproduce the evaluation results using the pre-trained model. However, when I re-train the model myself (batch size 4, a single 3090 Ti GPU, without depth refinement), the results are much worse:

Acc     Comp    Overall
0.6212  0.4912  0.5562

Can you provide some suggestions to solve this issue?