lhoyer / improving_segmentation_with_selfsupervised_depth

[CVPR21] Implementation of our work "Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation"

Corrected Model based on Correct Intrinsics #11

Closed · nbansal90 closed this issue 3 years ago

nbansal90 commented 3 years ago

Hey @lhoyer!

I was wondering if you have had a chance to train a new model based on the corrected intrinsics discussed in #8. In the meantime, I tried to run the following steps from #8 myself, after making the intrinsics changes:

  1. If they already exist, delete the downloaded models in the model folder. (Done)
  2. Train self-supervised depth estimation with a frozen encoder initialized from ImageNet: `python train.py --machine ws --config configs/cityscapes_monodepth_highres_dec5_crop.yml` (Done)
  3. Upload the result folder to Google Drive and adapt https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth/blob/master/models/utils.py#L108 so that "mono_cityscapes_1024x512_r101dil_aspp_dec5_posepretrain_crop512x512bs4" points to your own model on Google Drive. (Not sure how to do this step; see the sketch below.)
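
My current understanding of step 3 is a change along the following lines. The dict name `download_paths` and the value format are guesses on my part, since I am not sure how models/utils.py is organized internally; only the model-name key is taken from the instructions above:

```python
# Hypothetical sketch of step 3: point the pretrained model name at my own
# Google Drive upload. The dict name and value format are assumptions.
download_paths = {
    # ... existing entries ...
    "mono_cityscapes_1024x512_r101dil_aspp_dec5_posepretrain_crop512x512bs4":
        "https://drive.google.com/uc?id=<my-file-id>",  # shared link to my uploaded result folder
}
```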

After completing step 2, I get the model `best_model.pkl` in the folder `results/monodepth/cityscapes-monodepth-101aspp-dec5-crop/r101_crop_512x512_batch4_posepretrain_freezebnFalse`.

I had trouble understanding this part:

  1. If I simply bypass the function `download_model_if_doesnt_exist` and use `best_model.pkl` in the functions `get_depth_decoder` and `get_posenet` by making the following changes:

In `get_posenet`:

if "mono" in pose_pretraining:
    for mn in ["pose_encoder", "pose"]:
        if mn not in models:
            continue
        #download_model_if_doesnt_exist(pose_pretraining) <<<<<<<<<< REMOVED
        #path = os.path.join(MachineConfig.DOWNLOAD_MODEL_DIR, pose_pretrain ing, "{}.pth".format(mn)) <<<< REMOVED
        path = os.path.join(MachineConfig.DOWNLOAD_MODEL_DIR, pose_pretraining, 'best_model.pkl')
        loaded_dict = torch.load(path, map_location=torch.device(device))
        filtered_dict = {k: v for k, v in loaded_dict.items() if k in models[mn].state_dict()}
        models[mn].load_state_dict(filtered_dict)

return models

I get the following error:

```
RuntimeError: Error(s) in loading state_dict for ResnetEncoder:
Missing key(s) in state_dict:
"encoder.conv1.weight", "encoder.bn1.weight", "encoder.bn1.bias", "encoder.bn1.running_mean", "encoder.bn1.running_var",
"encoder.layer1.0.conv1.weight", "encoder.layer1.0.bn1.weight", "encoder.layer1.0.bn1.bias", "encoder.layer1.0.bn1.running_mean", "encoder.layer1.0.bn1.running_var", "encoder.layer1.0.conv2.weight", "encoder.layer1.0.bn2.weight", "encoder.layer1.0.bn2.bias", "encoder.layer1.0.bn2.running_mean", "encoder.layer1.0.bn2.running_var",
"encoder.layer1.1.conv1.weight", "encoder.layer1.1.bn1.weight", "encoder.layer1.1.bn1.bias", "encoder.layer1.1.bn1.running_mean", "encoder.layer1.1.bn1.running_var", "encoder.layer1.1.conv2.weight", "encoder.layer1.1.bn2.weight", "encoder.layer1.1.bn2.bias", "encoder.layer1.1.bn2.running_mean", "encoder.layer1.1.bn2.running_var",
"encoder.layer2.0.conv1.weight", "encoder.layer2.0.bn1.weight", "encoder.layer2.0.bn1.bias", "encoder.layer2.0.bn1.running_mean", "encoder.layer2.0.bn1.running_var", "encoder.layer2.0.conv2.weight", "encoder.layer2.0.bn2.weight", "encoder.layer2.0.bn2.bias", "encoder.layer2.0.bn2.running_mean", "encoder.layer2.0.bn2.running_var", "encoder.layer2.0.downsample.0.weight", "encoder.layer2.0.downsample.1.weight", "encoder.layer2.0.downsample.1.bias", "encoder.layer2.0.downsample.1.running_mean", "encoder.layer2.0.downsample.1.running_var",
"encoder.layer2.1.conv1.weight", "encoder.layer2.1.bn1.weight", "encoder.layer2.1.bn1.bias", "encoder.layer2.1.bn1.running_mean", "encoder.layer2.1.bn1.running_var", "encoder.layer2.1.conv2.weight", "encoder.layer2.1.bn2.weight", "encoder.layer2.1.bn2.bias", "encoder.layer2.1.bn2.running_mean", "encoder.layer2.1.bn2.running_var",
"encoder.layer3.0.conv1.weight", "encoder.layer3.0.bn1.weight", "encoder.layer3.0.bn1.bias", "encoder.layer3.0.bn1.running_mean", "encoder.layer3.0.bn1.running_var", "encoder.layer3.0.conv2.weight", "encoder.layer3.0.bn2.weight", "encoder.layer3.0.bn2.bias", "encoder.layer3.0.bn2.running_mean", "encoder.layer3.0.bn2.running_var", "encoder.layer3.0.downsample.0.weight", "encoder.layer3.0.downsample.1.weight", "encoder.layer3.0.downsample.1.bias", "encoder.layer3.0.downsample.1.running_mean", "encoder.layer3.0.downsample.1.running_var",
"encoder.layer3.1.conv1.weight", "encoder.layer3.1.bn1.weight", "encoder.layer3.1.bn1.bias", "encoder.layer3.1.bn1.running_mean", "encoder.layer3.1.bn1.running_var", "encoder.layer3.1.conv2.weight", "encoder.layer3.1.bn2.weight", "encoder.layer3.1.bn2.bias", "encoder.layer3.1.bn2.running_mean", "encoder.layer3.1.bn2.running_var",
"encoder.layer4.0.conv1.weight", "encoder.layer4.0.bn1.weight", "encoder.layer4.0.bn1.bias", "encoder.layer4.0.bn1.running_mean", "encoder.layer4.0.bn1.running_var", "encoder.layer4.0.conv2.weight", "encoder.layer4.0.bn2.weight", "encoder.layer4.0.bn2.bias", "encoder.layer4.0.bn2.running_mean", "encoder.layer4.0.bn2.running_var", "encoder.layer4.0.downsample.0.weight", "encoder.layer4.0.downsample.1.weight", "encoder.layer4.0.downsample.1.bias", "encoder.layer4.0.downsample.1.running_mean", "encoder.layer4.0.downsample.1.running_var",
"encoder.layer4.1.conv1.weight", "encoder.layer4.1.bn1.weight", "encoder.layer4.1.bn1.bias", "encoder.layer4.1.bn1.running_mean", "encoder.layer4.1.bn1.running_var", "encoder.layer4.1.conv2.weight", "encoder.layer4.1.bn2.weight", "encoder.layer4.1.bn2.bias", "encoder.layer4.1.bn2.running_mean", "encoder.layer4.1.bn2.running_var".
```
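
As a quick check, comparing the checkpoint's keys with what the pose encoder expects (a small sketch, reusing `path` and `models` from the snippet above) suggests that none of the keys in `best_model.pkl` match, which would leave `filtered_dict` empty:

```python
import torch

# Diagnostic sketch: if the intersection of the two key sets is empty,
# filtered_dict stays empty and load_state_dict reports every encoder
# key as missing, matching the error above.
loaded_dict = torch.load(path, map_location="cpu")
model_keys = set(models["pose_encoder"].state_dict().keys())
ckpt_keys = set(loaded_dict.keys())
print("matching keys:", len(model_keys & ckpt_keys))
print("keys that will be reported missing:", len(model_keys - ckpt_keys))
```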

Can you suggest another workaround or solution to this issue?

nbansal90 commented 3 years ago

Hey @lhoyer,

As per the instructions you shared in #3:

  1. If they already exist, delete the downloaded models in the model folder.
  2. You train self-supervised depth estimation with a frozen encoder initialized from ImageNet python train.py --machine ws --config configs/cityscapes_monodepth_highres_dec5_crop.yml

I see that when I launch this job, the model referenced in the config file is downloaded:

https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth/blob/e6b602922ed280d748f876ee390c6e6a790cee17/configs/cityscapes_monodepth_highres_dec5_crop.yml#L17

This file contains the pose, depth, and pose_encoder model files. How do we get these pretrained files, given that `python train.py --machine ws --config configs/cityscapes_monodepth_highres_dec5_crop.yml` is the first step of the whole training process?

Regards, Nitin Bansal

lhoyer commented 3 years ago

> I was wondering if you have had a chance to train a new model based on the corrected intrinsics discussed in #8.

The code with the intrinsics fix is available on the `ssda` branch. The procedure for launching the training is slightly different, so please have a look at the README.md on the `ssda` branch for more information. Also note that the SDE training on the `ssda` branch is only done for GTA and Synthia, and no ImageNet feature distance is applied. However, this can be changed in experiments.py#L156 and configs/sde_dec11.yml#L78.

> `path = os.path.join(MachineConfig.DOWNLOAD_MODEL_DIR, pose_pretraining, "{}.pth".format(mn))` <<<< REMOVED
>
> `path = os.path.join(MachineConfig.DOWNLOAD_MODEL_DIR, pose_pretraining, 'best_model.pkl')`

This change is the reason for the errors you get. The training should save `pose.pth`, `pose_encoder.pth`, `encoder.pth`, and `depth.pth`, which are used to initialize the corresponding networks. The relevant code is located in `save_monodepth_models()` in train.py#L377.
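
Concretely, restoring the removed line so that each sub-network loads its own `.pth` file would look roughly like the sketch below. It is based on your snippet (so `models`, `pose_pretraining`, and `device` come from the surrounding function) and assumes the files written by `save_monodepth_models()` are present in the model directory:

```python
import os
import torch

# Sketch: restore each sub-network from its own .pth file written by
# save_monodepth_models() (pose.pth, pose_encoder.pth, ...), instead of
# loading the combined best_model.pkl for all of them.
for mn in ["pose_encoder", "pose"]:
    if mn not in models:
        continue
    path = os.path.join(MachineConfig.DOWNLOAD_MODEL_DIR,
                        pose_pretraining, "{}.pth".format(mn))
    loaded_dict = torch.load(path, map_location=torch.device(device))
    filtered_dict = {k: v for k, v in loaded_dict.items()
                     if k in models[mn].state_dict()}
    models[mn].load_state_dict(filtered_dict)
```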

> How do we get these pretrained files, given that `python train.py --machine ws --config configs/cityscapes_monodepth_highres_dec5_crop.yml` is the first step of the whole training process?

If you also want to reproduce the initialization of the pose network trained on uncropped images, you can run `python train.py --machine ws --config configs/cityscapes_monodepth_highres_dec5.yml`.