Resolution of the images used in ETH3D High-res benchmark

jzhangbs / Vis-MVSNet

Visibility-aware Multi-view Stereo Network

MIT License

Resolution of the images used in ETH3D High-res benchmark #30

Closed leejaeyong7 closed 2 years ago

leejaeyong7 commented 2 years ago

Hi,

Thanks for sharing the great work. What image resolution was used for Vis-MVSNet inference in the ETH3D high-res evaluation?

Thanks!

jzhangbs commented 2 years ago

Hi, input images are 2400x1600, output depths are 1200x800

leejaeyong7 commented 2 years ago

Thanks!

TruongKhang commented 2 years ago

@jzhangbs can you provide the running script with specific parameters (probability thresholds, number of consistent views, etc...)?

jzhangbs commented 2 years ago

Hi @TruongKhang

depth inference:

    --num_src 20 \
    --max_d 512 \
    --interval_scale \
    --cas_depth_num 128,64,32 \
    --cas_interv_scale 4,2,1 \
    --resize 2400,1600 \
    --crop 2400,1600 \

depth fusion:

    --view 20 \
    --vthresh 2 \
    --pthresh 0.1,0.1,0 \

You need to prepare a pair.txt in which each image has more than 20 source views.
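To check how many source views each reference image actually has before running inference, something like the sketch below may help. It assumes the MVSNet-style pair.txt layout (first line is the number of reference views, then for each reference one line with its id and one line of the form `count src_id score src_id score ...`); that layout and the helper names `parse_pair_text` and `views_short_of` are assumptions for illustration, not code from this repository.

```python
def parse_pair_text(text):
    """Parse MVSNet-style pair.txt content into {ref_id: [src_id, ...]}.

    Assumed layout: line 0 holds the number of reference views; each
    reference then takes two lines: its id, and
    "count src_id score src_id score ...".
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    num_refs = int(rows[0][0])
    pairs = {}
    for i in range(num_refs):
        ref_id = int(rows[1 + 2 * i][0])
        fields = rows[2 + 2 * i]
        count = int(fields[0])
        # Source ids sit at positions 1, 3, 5, ...; each is followed by its score.
        pairs[ref_id] = [int(fields[1 + 2 * k]) for k in range(count)]
    return pairs


def views_short_of(pairs, num_src):
    """Return {ref_id: available_count} for references with fewer than num_src sources."""
    return {ref: len(srcs) for ref, srcs in pairs.items() if len(srcs) < num_src}


# Tiny made-up example with three views (not real ETH3D data).
example = """3
0
2 1 10.5 2 8.0
1
2 0 10.5 2 7.2
2
1 1 7.2
"""

pairs = parse_pair_text(example)
short = views_short_of(pairs, num_src=2)
print(short)
```

For a real scene, read the file contents with `open(path).read()` and pass the same `num_src` value that is given to depth inference.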

TruongKhang commented 2 years ago

@jzhangbs thank you so much for your response! Can you provide your preprocessed pair.txt file? The number of views per scene in the ETH3D high-resolution dataset is relatively small, and some scenes even have fewer than 10 views.

xy-guo commented 2 years ago

I tried your depth inference arguments and ran into an out-of-memory error when running depth inference on a 3090 GPU. It seems num_src=20 makes the model consume a huge amount of GPU memory. Is it possible to set num_src to a smaller value?

I was only able to train the model when I set num_src to a number smaller than 4.

jzhangbs commented 2 years ago

@TruongKhang Terribly sorry for the delay; GitHub didn't notify me of the reply.

The general dataloader should support the situation where the total number of sources in pair.txt is less than the number of sources in the argument. In this case, the program will use all the available views. You can also use the improved fusion code, which supports this situation as well: https://github.com/jzhangbs/pcd-fusion
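The fallback described above can be pictured with a minimal sketch. The function `select_sources` is a hypothetical stand-in, not the repository's dataloader: it takes a ranked list of candidate source ids and returns at most `num_src` of them, so a reference with fewer candidates simply gets all of them.

```python
def select_sources(candidates, num_src):
    """Return the first num_src candidates, or all of them if fewer exist."""
    # Slicing past the end of a list is safe in Python and returns what is there.
    return list(candidates[:num_src])


ranked = [4, 7, 1, 9]  # made-up source ids, best match first

all_available = select_sources(ranked, 20)  # fewer than 20 exist: keep all four
top_two = select_sources(ranked, 2)         # more than 2 exist: keep the best two
print(all_available, top_two)
```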

jzhangbs commented 2 years ago

@xy-guo Try reducing the spatial resolution or depth numbers. During inference, we calculate the cost volumes one by one, so this part does not grow with the number of sources.
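To see why one-by-one processing keeps this part of the memory flat, here is a toy sketch in plain Python. It is not the network's code: `fuse_sequentially` and `toy_pair_cost` are hypothetical names, flat lists stand in for cost volumes, and plain averaging is used only as a placeholder for whatever fusion rule the model applies. The point is that only one per-source volume plus one running accumulator exist at any time, regardless of how many sources there are.

```python
def fuse_sequentially(ref, sources, pair_cost):
    """Average per-source cost volumes while holding only one of them at a time."""
    running = None
    count = 0
    for src in sources:
        volume = pair_cost(ref, src)  # one pairwise "cost volume"
        if running is None:
            running = list(volume)
        else:
            running = [r + v for r, v in zip(running, volume)]
        count += 1
        # `volume` is rebound on the next iteration, so the old one can be freed.
    if count == 0:
        raise ValueError("at least one source view is required")
    return [r / count for r in running]


def toy_pair_cost(ref, src):
    """Placeholder matching cost: element-wise absolute difference."""
    return [abs(r - s) for r, s in zip(ref, src)]


ref = [1.0, 2.0, 3.0]
sources = [[1.0, 2.0, 5.0], [3.0, 2.0, 3.0]]
fused = fuse_sequentially(ref, sources, toy_pair_cost)
print(fused)
```

With this pattern, peak memory for the cost-volume step depends on the size of a single volume (spatial resolution times number of depth hypotheses), which is consistent with the advice to reduce those two quantities rather than num_src.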