lmb-freiburg / demon

DeMoN: Depth and Motion Network
https://lmb.informatik.uni-freiburg.de/people/ummenhof/depthmotionnet/
GNU General Public License v3.0

what is the best practice for panorama? #9

Open zebrajack opened 7 years ago

zebrajack commented 7 years ago

A panorama has more context, but what is the best practice for getting a depth map from two paired panoramas? Should we convert them to cubemaps, estimate depth for each face, and stitch the results together? I am not sure whether the individual faces' results would be consistent with each other. Or should we consider training directly on the distorted panorama?
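For reference, the cubemap conversion mentioned above amounts to casting a ray through each face pixel and sampling the equirectangular panorama along it. A minimal sketch for the front (+z) face, using nearest-neighbour lookup (function name and conventions are illustrative, not part of DeMoN):

```python
import numpy as np

def equirect_to_cube_face(pano, face_size):
    """Sample the front (+z) cubemap face from an equirectangular panorama.

    pano: (H, W) or (H, W, C) image; returns a (face_size, face_size[, C])
    face via nearest-neighbour lookup.
    """
    h, w = pano.shape[:2]
    # rays through the +z face: x, y span [-1, 1], z is fixed at 1
    lin = (np.arange(face_size) + 0.5) / face_size * 2 - 1
    x, y = np.meshgrid(lin, lin)
    z = np.ones_like(x)
    lon = np.arctan2(x, z)                         # longitude of each ray
    lat = np.arctan2(y, np.sqrt(x**2 + z**2))      # latitude of each ray
    u = ((lon + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = ((lat + np.pi / 2) / np.pi * h).astype(int).clip(0, h - 1)
    return pano[v, u]
```

The other five faces follow the same pattern with the fixed axis permuted; a production version would use bilinear interpolation instead of nearest-neighbour.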


benjaminum commented 7 years ago

I think consistency for the depth needs to be enforced for both mappings. This could be done with a special loss on the respective borders.
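Such a border loss could be as simple as penalizing the depth difference along the shared edge of two adjacent cube faces. A hypothetical sketch (assuming the faces have been rotated into a common orientation so that the right column of one face abuts the left column of the next; this is not DeMoN code):

```python
import numpy as np

def border_consistency_loss(depth_a, depth_b):
    """L1 penalty between the shared edge of two adjacent cube faces.

    depth_a, depth_b: (H, W) depth maps; the rightmost column of face A
    is assumed to coincide with the leftmost column of face B.
    """
    edge_a = depth_a[:, -1]   # rightmost column of face A
    edge_b = depth_b[:, 0]    # leftmost column of face B
    return np.mean(np.abs(edge_a - edge_b))
```

In a training setup this term would be summed over all twelve cube edges and added to the main depth loss with a weighting factor.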

In the beginning of this project we worked with a omnidirectional camera to test if the network can estimate motion. Motion estimation worked with the equirectangular projection and with the raw fisheye images. I guess the network can also deal with both projections for depth.
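For readers unfamiliar with the equirectangular projection: each pixel corresponds to a fixed viewing direction on the sphere, which is what lets a network reason about motion despite the distortion. A small sketch of the pixel-to-ray mapping under the standard convention (longitude across the width, latitude down the height; my own helper, not from the DeMoN codebase):

```python
import numpy as np

def equirect_to_rays(height, width):
    """Unit ray direction for every pixel of an equirectangular image.

    Longitude spans [-pi, pi) across the width; latitude spans
    [pi/2, -pi/2] from the top row to the bottom row.
    """
    lon = (np.arange(width) + 0.5) / width * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(height) + 0.5) / height * np.pi
    lon, lat = np.meshgrid(lon, lat)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)  # (H, W, 3), unit length
```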

zebrajack commented 7 years ago

Cool. The equirectangular projection (or raw fisheye) has distortion, so does that mean training takes the distortion into account automatically? And would reusing the current model and parameters speed up training on equirectangular images? For training in this setting, do we also need corresponding full-panorama depth images? A synthetic dataset is easy to obtain, but for real indoor or outdoor scenes it is labor-intensive to get panorama depth by registering 3D point clouds. Besides that, if I use a dual-fisheye camera like the Ricoh Theta S, do I still need the intrinsic parameters?

If I want to make things easier and reuse your code directly, can I use a cylindrical projection, even though that means discarding the sky/ceiling and the ground?

Sorry for so many questions.

benjaminum commented 7 years ago

We did not experiment with depth prediction, just camera motion estimation, and in that case a simple feed-forward network was able to deal with the distortions. This does not mean it is the best representation.

If you want to adapt the DeMoN architecture, you need to change some of the ops to the new projection type (e.g., the depth-to-flow op assumes the pinhole camera model). Alternatively, you could do the cube mapping you suggested and reuse the existing pinhole-camera code. For training, of course, some sort of depth ground truth is needed, which is unfortunately not easy to obtain for real scenes.
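To make the pinhole assumption concrete: a depth-to-flow op back-projects each pixel with the intrinsics, moves the 3D point by the relative camera motion, and re-projects it. A minimal NumPy sketch of that idea (names and signature are illustrative, not DeMoN's actual op, which works on GPU tensors):

```python
import numpy as np

def depth_to_flow(depth, K, R, t):
    """Optical flow induced by camera motion under a pinhole model.

    depth: (H, W) depth map of the first view.
    K: (3, 3) camera intrinsics.
    R, t: rotation (3, 3) and translation (3,) of view 2 relative to view 1.
    Returns: (H, W, 2) flow from view 1 to view 2.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T            # back-project pixels to rays
    pts = rays * depth[..., None]              # 3D points in view 1
    pts2 = pts @ R.T + t                       # transform into view 2
    proj = pts2 @ K.T
    proj = proj[..., :2] / proj[..., 2:3]      # perspective divide
    return proj - np.stack([u, v], axis=-1)    # flow = pixel displacement
```

For a non-pinhole projection (equirectangular, fisheye), the back-projection and re-projection steps are what would need to be replaced.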