tensorflow / models

Models and examples built with TensorFlow

struct2depth: Pre-computed mask, Online refinement and Pretrained Resnet-18 #6302

Closed DeckerDai closed 5 years ago

DeckerDai commented 5 years ago

### System information

```shell
data_dir=/path/to/kitti/data
triplet_list_file=$data_dir/test_files_eigen_triplets.txt
triplet_list_file_remains=$data_dir/test_files_eigen_triplets_remains.txt
ft_name=kitti

python optimize.py \
  --output_dir $prediction_dir \
  --data_dir $data_dir \
  --triplet_list_file $triplet_list_file \
  --triplet_list_file_remains $triplet_list_file_remains \
  --ft_name $ft_name \
  --model_ckpt $model_ckpt \
  --file_extension png \
  --size_constraint_weight $size_constraint_weight
```



### Describe the problem

1. How to generate the segmentation mask for training? Specifically, I want to know which model of Mask RCNN you use? For example, do you use the one from [matterport](https://github.com/matterport/Mask_RCNN)? Which dataset do you use to train this Mask RCNN, Imagenet or MS COCO?

2. What's the setup for online refinement when you run _optimize.py_? Do you still handle motion during online refinement?
After running the command shown above, the fine-tuned depth prediction performs worse than plain inference. I use the [tensorflow model](https://drive.google.com/file/d/1mjb4ioDRH8ViGbui52stSUDwhkGrDXy8/view) trained on the KITTI dataset, and the evaluation code from [Tinghui Zhou](https://github.com/tinghuiz/SfMLearner/blob/master/kitti_eval/eval_depth.py). The results are shown in the table below.

  | |abs_rel | sq_rel | rms | log_rms | a1 | a2 | a3 |
  | --- | --- | --- | --- | --- | --- | --- | --- |
  | Online Refine |0.1554 | 1.6079 | 6.0072 | 0.2302 | 0.8049 | 0.9298 | 0.9703 |
  | Inference |0.1452 | 1.1166 | 5.3778 | 0.2183 | 0.8127 | 0.9429 | 0.9779 |
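For reference, the metrics in the table above can be computed following the standard Eigen et al. protocol, which is what the linked SfMLearner eval script implements (a sketch; the function name is mine):

```python
import numpy as np

def compute_depth_errors(gt, pred):
    """Standard monocular depth metrics (Eigen et al. protocol, as in the
    linked SfMLearner eval script). `gt` and `pred` are positive arrays of
    matched, valid-masked depth values."""
    # Threshold accuracies: fraction of pixels where the ratio between
    # prediction and ground truth is within 1.25, 1.25^2, 1.25^3.
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    rms = np.sqrt(np.mean((gt - pred) ** 2))
    log_rms = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    return abs_rel, sq_rel, rms, log_rms, a1, a2, a3
```

Note that the eval script also caps depth (e.g. at 80 m) and median-scales predictions before computing these numbers, so make sure both rows were produced with identical pre-processing.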

3. Could you provide the pre-trained Resnet-18 model you used to initialize the encoder of dispnet? Or could you tell me which dataset you trained this Resnet-18 on, and how you trained it?
Basically I want to replicate your results, so starting from exactly the same initialization is crucial :)

@VincentCa
@aneliaangelova
Any help or replies would be highly appreciated!
Thank you very much!

Best,
VincentCa commented 5 years ago

Hi @DeckerDai

  1. You can use the matterport implementation. More specifically, you can simply use their model pre-trained on MS COCO. For more details on how the masks are structured and how to generate them, please refer to this issue and others.
  2. You can either call inference.py (just static inference, no refinement) or optimize.py (inference with refinement). The latter is compatible with any kind of model. When working with M+R, you will want to handle the motion during the process.
  3. Unfortunately, we cannot open-source the pre-trained ResNet model here. However, there is an ImageNet-pretrained torch model available that you can convert to a TensorFlow checkpoint using torchfile. Make sure to match the expected input distribution to fully leverage the pre-trained weights.
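Matching the Torch ImageNet input distribution typically means normalizing inputs with the standard ImageNet channel statistics. A minimal sketch (assumption: the pre-trained torch ResNet-18 expects exactly these mean/std values; verify against the model you convert):

```python
import numpy as np

# Standard Torch/ImageNet normalization statistics (an assumption here;
# check the specific pre-trained model's expected preprocessing).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_for_torch_resnet(image_uint8):
    """Map an HxWx3 uint8 RGB image into the input distribution the
    ImageNet-pretrained torch model was trained on."""
    x = image_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    return (x - IMAGENET_MEAN) / IMAGENET_STD   # per-channel whitening
```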

Best, Vincent