tensorflow / models

Models and examples built with TensorFlow

struct2depth: Pre-computed mask, Online refinement and Pretrained Resnet-18 #6302

Closed DeckerDai closed 5 years ago

DeckerDai commented 5 years ago

### System information

```shell
data_dir=/path/to/kitti/data
triplet_list_file=$data_dir/test_files_eigen_triplets.txt
triplet_list_file_remains=$data_dir/test_files_eigen_triplets_remains.txt
ft_name=kitti

python optimize.py \
  --output_dir $prediction_dir \
  --data_dir $data_dir \
  --triplet_list_file $triplet_list_file \
  --triplet_list_file_remains $triplet_list_file_remains \
  --ft_name $ft_name \
  --model_ckpt $model_ckpt \
  --file_extension png \
  --size_constraint_weight $size_constraint_weight
```



### Describe the problem

1. How to generate the segmentation mask for training? Specifically, I want to know which model of Mask RCNN you use? For example, do you use the one from [matterport](https://github.com/matterport/Mask_RCNN)? Which dataset do you use to train this Mask RCNN, Imagenet or MS COCO?

2. What's the setup for online refinement when you run _optimize.py_? Do you still handle motion during online refinement?
After running the command shown above, the fine-tuned depth prediction performs worse than plain inference. I use the [tensorflow model](https://drive.google.com/file/d/1mjb4ioDRH8ViGbui52stSUDwhkGrDXy8/view) trained on the KITTI dataset, and the evaluation code from [Tinghui Zhou](https://github.com/tinghuiz/SfMLearner/blob/master/kitti_eval/eval_depth.py). The results are shown in the table below.

  | |abs_rel | sq_rel | rms | log_rms | a1 | a2 | a3 |
  | --- | --- | --- | --- | --- | --- | --- | --- |
  | Online Refine |0.1554 | 1.6079 | 6.0072 | 0.2302 | 0.8049 | 0.9298 | 0.9703 |
  | Inference |0.1452 | 1.1166 | 5.3778 | 0.2183 | 0.8127 | 0.9429 | 0.9779 |
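For reference, the metrics in the table above can be computed following the standard Eigen et al. protocol, which is what the linked SfMLearner eval script implements (a sketch; the function name is mine):

```python
import numpy as np

def compute_depth_errors(gt, pred):
    """Standard monocular depth metrics (Eigen et al. protocol, as in the
    linked SfMLearner eval script). `gt` and `pred` are positive arrays of
    matched, valid-masked depth values."""
    # Threshold accuracies: fraction of pixels where the ratio between
    # prediction and ground truth is within 1.25, 1.25^2, 1.25^3.
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    rms = np.sqrt(np.mean((gt - pred) ** 2))
    log_rms = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    return abs_rel, sq_rel, rms, log_rms, a1, a2, a3
```

Note that the eval script also caps depth (e.g. at 80 m) and median-scales predictions before computing these numbers, so make sure both rows were produced with identical pre-processing.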

3. Could you provide the pre-trained Resnet-18 model you used to initialize the encoder of dispnet? Or could you tell me which dataset you trained this Resnet-18 on, and how you trained it?
Basically I want to replicate your results, so starting from exactly the same initialization is crucial :)

@VincentCa
@aneliaangelova
Any help or replies would be highly appreciated!
Thank you very much!

Best,
VincentCa commented 5 years ago

Hi @DeckerDai

  1. You can use the matterport implementation. More specifically, you can simply use their model pre-trained on MS COCO. For more details on how the masks are structured and how to generate them, please refer to this issue and others.
  2. You can either call inference.py (just static inference, no refinement) or optimize.py (inference with refinement). The latter is compatible with any kind of model. When working with M+R, you will want to handle the motion during the process.
  3. Unfortunately, we cannot open-source the pre-trained ResNet model here. However, there is an ImageNet-pretrained torch model available that you can convert to a TensorFlow checkpoint using torchfile. Make sure to match the expected input distribution to fully leverage the pre-trained weights.
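Matching the Torch ImageNet input distribution typically means normalizing inputs with the standard ImageNet channel statistics. A minimal sketch (assumption: the pre-trained torch ResNet-18 expects exactly these mean/std values; verify against the model you convert):

```python
import numpy as np

# Standard Torch/ImageNet normalization statistics (an assumption here;
# check the specific pre-trained model's expected preprocessing).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_for_torch_resnet(image_uint8):
    """Map an HxWx3 uint8 RGB image into the input distribution the
    ImageNet-pretrained torch model was trained on."""
    x = image_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    return (x - IMAGENET_MEAN) / IMAGENET_STD   # per-channel whitening
```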

Best, Vincent