Closed by Youngju-Na 8 months ago
Thank you very much for your interest in our work, @YoungjuNa-KR. 1) We chose these views such that the distance between the cameras would be maximized while, at the same time, keeping a similar distribution of relative camera poses across all scenes. Note that the original MVS papers typically choose reference cameras that lie very close to each other, with high overlap between the visible regions. Our goal was to show that these methods fail when the distance between the reference images is increased, and that combining image-based NeRFs with depth predictions improves model performance.
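To make the idea of "maximizing the distance between cameras" concrete, here is a minimal sketch using greedy farthest-point sampling over camera centers. This is an illustration only, not the repository's actual logic: the real selection in `get_target_and_ref_ids` picks fixed corner views ('tl', 'bl', 'tr', 'br') per scene, and `pick_spread_out_views` is a hypothetical helper.

```python
import numpy as np

def pick_spread_out_views(cam_centers: np.ndarray, k: int = 4) -> list[int]:
    """Greedily select k cameras whose centers are maximally spread out.

    cam_centers: (N, 3) array of camera positions in world coordinates.
    Returns the indices of the selected cameras.
    """
    chosen = [0]  # start from an arbitrary camera
    while len(chosen) < k:
        # for every camera, distance to its nearest already-chosen camera
        dists = np.min(
            np.linalg.norm(cam_centers[:, None] - cam_centers[chosen][None], axis=-1),
            axis=1,
        )
        dists[chosen] = -np.inf  # never re-pick a chosen camera
        chosen.append(int(np.argmax(dists)))  # farthest from the current set
    return chosen
```

On a toy layout with four corner cameras and one central camera, the sampler ends up with the four corners, which mirrors the corner-based choice described above.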
2) Yes, we retrained TransMVSNet.
3) TransMVSNet predicts confidence scores for its depth predictions. However, our approach requires standard deviations of the depth predictions. By analyzing the relation between the predicted confidence and the deviation between the predicted and ground-truth depth values, we found an approximately linear correlation. Fitting a 1st-order polynomial yielded the hard-coded values you found.
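The calibration step above can be sketched as follows. The arrays here are synthetic stand-ins, not the actual calibration data, and `conf_to_std` is a hypothetical helper; the point is only to show how a 1st-order polynomial fit maps confidence to an expected depth deviation that can serve as a standard deviation.

```python
import numpy as np

# Synthetic stand-ins for per-pixel confidence from the MVS network and
# the absolute deviation |depth_pred - depth_gt| on a held-out set.
rng = np.random.default_rng(0)
confidence = rng.uniform(0.2, 1.0, size=1000)
depth_error = 2.0 * (1.0 - confidence) + rng.normal(0.0, 0.05, size=1000)

# Fit a 1st-order polynomial (a line) mapping confidence -> expected
# deviation; the fitted coefficients play the role of the hard-coded values.
slope, intercept = np.polyfit(confidence, depth_error, deg=1)

def conf_to_std(conf):
    """Map a confidence value to a positive standard deviation."""
    # clamp at a small floor so the std stays strictly positive
    return np.maximum(slope * conf + intercept, 1e-3)
```

With a fit like this, high-confidence pixels get a small standard deviation and low-confidence pixels a large one, which is the behavior the depth prior needs.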
Hope this helps,
Feel free to reach out if you have any further questions.
Best, Malte
Hi, thanks for sharing this amazing work. For dense depth estimation, you select the target view and source views by choosing one image from each corner ('tl', 'bl', 'tr', 'br'), according to the `get_target_and_ref_ids` function in `dtu_yao.py` in your code. Why did you choose this particular strategy, which differs from the original MVS papers?
Secondly, did you retrain TransMVSNet with these target-source pairs, or did you use the pre-trained model from the original TransMVSNet?
Finally, in the `_getconf2std(self)` function from `dtu.py`, where do these numbers come from? I am uncertain about these hard-coded values. Thanks in advance.