Closed Wooho-Moon closed 2 years ago
You could try passing identities for the source poses and manually adding zeros for the source view features. The result might be sharp, but it won't have a good metric depth estimate.
If this isn't an edge case in a video sequence, and you're just looking for a relative depth map for a single image, I'd encourage you to look for a SOTA monodepth model instead. Here's MiDaS: https://github.com/isl-org/MiDaS
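For anyone who wants to try the identity-pose workaround, here is a rough sketch of what the placeholder inputs could look like. It uses NumPy only, and everything in it is an assumption made for illustration: the function name, the number of source views, and the feature shape are hypothetical and are not taken from the repository. You would still need to convert the arrays to whatever tensors the model actually expects.

```python
import numpy as np


def make_single_image_inputs(num_source_views, feat_channels, feat_height, feat_width):
    """Build placeholder source-view inputs for single-image inference.

    Hypothetical helper: the names and shapes are illustrative only and
    must be adapted to the inputs the real model expects.
    """
    # One 4x4 identity transform per source view, so every "source"
    # camera coincides with the reference camera.
    source_poses = np.tile(np.eye(4, dtype=np.float32), (num_source_views, 1, 1))

    # All-zero feature maps standing in for the missing source views.
    source_feats = np.zeros(
        (num_source_views, feat_channels, feat_height, feat_width),
        dtype=np.float32,
    )
    return source_poses, source_feats


# Example values chosen arbitrarily for the sketch.
poses, feats = make_single_image_inputs(
    num_source_views=7, feat_channels=16, feat_height=48, feat_width=64
)
print(poses.shape, feats.shape)
```

With identity poses there is no baseline between views, so there is no multi-view signal to fix the scale, which is why the output should not be trusted as metric depth.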
Thanks.
Thanks for the reply.
Welcome, hope it helped!
First of all, thanks for the awesome work! I have read your paper and am impressed by it.
I have a quick question. According to your paper, the model takes as input a reference image, a set of source images, their intrinsics, and relative camera poses. If I use only a single image as input during inference, could I get similar results (e.g., a depth map)? I mean, if I don't have any metadata and have only a single image, could I use this model?