noahzn / Lite-Mono

[CVPR2023] Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

model reproduction #143

Closed gggxxx1234 closed 4 months ago

gggxxx1234 commented 4 months ago

I followed your training setup, but my generalisation experiments on the Make3D dataset could not reproduce the results reported in the original paper.

noahzn commented 4 months ago

Hi, I can't give you any advice if you don't provide more information.

You can check some previous issues related to the results on make3d https://github.com/noahzn/Lite-Mono/issues/73 https://github.com/noahzn/Lite-Mono/issues/128

gggxxx1234 commented 4 months ago

The prediction code I used is shown below. With the weights you released I can reproduce the results reported in the paper, but with my own trained weights I cannot get the same results.

from layers import disp_to_depth
import networks
import cv2
import os
import torch
from scipy import io
import numpy as np
from options import LiteMonoOptions

cv2.setNumThreads(0) # This speeds up evaluation 5x on our unix systems (OpenCV 3.3.1)

# Path
# load_weights_folder = 'pre_tmp/pre-litemono8m-real-Shift5-LGMaSA/models/weights_26'
load_weights_folder = 'lite-mono_640x192'

main_path = 'make3d'
encoder_path = os.path.join(load_weights_folder, "encoder.pth")
decoder_path = os.path.join(load_weights_folder, "depth.pth")

def compute_errors(gt, pred):
    rmse = (gt - pred) ** 2
    rmse = np.sqrt(rmse.mean())

    rmse_log = (np.log10(gt) - np.log10(pred)) ** 2
    rmse_log = np.sqrt(rmse_log.mean())

    abs_rel = np.mean(np.abs(gt - pred) / gt)

    sq_rel = np.mean(((gt - pred) ** 2) / gt)

    return abs_rel, sq_rel, rmse, rmse_log

def evaluate(opt):
    print("-> Loading weights from {}".format(load_weights_folder))

    encoder_path = os.path.join(load_weights_folder, "encoder.pth")
    decoder_path = os.path.join(load_weights_folder, "depth.pth")

    encoder_dict = torch.load(encoder_path)
    decoder_dict = torch.load(decoder_path)

    # Load Model (Encoder & Decoder)
    encoder = networks.LiteMono(model=opt.model,
                                height=encoder_dict['height'],
                                width=encoder_dict['width'])
    depth_decoder = networks.DepthDecoder(encoder.num_ch_enc, scales=range(3))
    model_dict = encoder.state_dict()
    depth_model_dict = depth_decoder.state_dict()
    encoder.load_state_dict({k: v for k, v in encoder_dict.items() if k in model_dict})
    depth_decoder.load_state_dict({k: v for k, v in decoder_dict.items() if k in depth_model_dict})

    # CUDA
    encoder.cuda()
    encoder.eval()
    depth_decoder.cuda()
    depth_decoder.eval()

    # Load Dataset
    with open(os.path.join(main_path, "make3d_test_files.txt")) as f:
        test_filenames = f.read().splitlines()
    test_filenames = map(lambda x: x[4:], test_filenames)

    depths_gt = []
    images = []
    ratio = 2  # 2
    h_ratio = 1 / (1.33333 * ratio)
    color_new_height = int(1704 / 2)  # 2
    depth_new_height = 21
    for filename in test_filenames:
        mat = io.loadmat(os.path.join(main_path, "Gridlaserdata", "depth_sph_corr-{}.mat".format(filename)))
        depths_gt.append(mat["Position3DGrid"][:, :, 3])

        image = cv2.imread(os.path.join(main_path, "Test134", "img-{}.jpg".format(filename)))
        image = image[int((2272 - color_new_height) / 2):int((2272 + color_new_height) / 2), :, :]
        images.append(image[:, :, ::-1])
    depths_gt_resized = map(lambda x: cv2.resize(x, (305, 407), interpolation=cv2.INTER_NEAREST), depths_gt)
    depths_gt_cropped = map(lambda x: x[int((55 - 21) / 2):int((55 + 21) / 2), :], depths_gt)

    depths_gt_cropped = list(depths_gt_cropped)
    print("-> Computing predictions with size {}x{}".format(
        encoder_dict['width'], encoder_dict['height']))
    errors = []
    with torch.no_grad():
        for i in range(len(images)):
            input_color = images[i]
            input_color = cv2.resize(input_color / 255.0, (640, 192), interpolation=cv2.INTER_NEAREST)  # <----1
            # input_color = (lambda x: x[int((55 - 21) / 2):int((55 + 21) / 2), :], input_color)
            input_color = torch.tensor(input_color, dtype=torch.float).permute(2, 0, 1)[None, :, :, :]
            input_color = input_color.cuda()
            output = depth_decoder(encoder(input_color))
            pred_disp, _ = disp_to_depth(output[("disp", 2)], 1, 60)  # <=---2   [0.1, 100]
            pred_disp = pred_disp.squeeze().cpu().numpy()
            depth_gt = depths_gt_cropped[i]
            depth_pred = 1 / pred_disp
            depth_pred = cv2.resize(depth_pred, depth_gt.shape[::-1], interpolation=cv2.INTER_NEAREST)
            mask = np.logical_and(depth_gt > 1, depth_gt < 60)  # 0, 70
            depth_gt = depth_gt[mask]
            depth_pred = depth_pred[mask]
            depth_pred *= np.median(depth_gt) / np.median(depth_pred)
            depth_pred[depth_pred > 70] = 70
            errors.append(compute_errors(depth_gt, depth_pred))
            # if errors[i][3] > 0.2:
            #     print(errors[i][3])
            #     print(i)
        mean_errors = np.mean(errors, 0)

    print(("{:>8} | " * 4).format("abs_rel", "sq_rel", "rmse", "rmse_log"))
    print(("{: 8.3f} , " * 4).format(*mean_errors.tolist()))


if __name__ == "__main__":
    options = LiteMonoOptions()
    evaluate(options.parse())
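As a quick sanity check that the printed table header matches the return order of compute_errors, a tiny made-up example (the numbers are meaningless and only illustrate the ordering):

import numpy as np

# made-up ground truth / prediction, only to confirm the return order
# (abs_rel, sq_rel, rmse, rmse_log) matches the printed table header
gt = np.array([2.0, 5.0, 10.0, 20.0])
pred = np.array([2.2, 4.5, 11.0, 18.0])
print(compute_errors(gt, pred))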

I trained with the following parameter settings: [screenshot "捕获.JPG" failed to upload]

noahzn commented 4 months ago

Hi, does your trained model achieve a similar result on the evaluation set of KITTI as mine?

gggxxx1234 commented 4 months ago

No, it hasn't reached your results. Mine are: a1 0.7713, a2 0.966, a3 0.9878, abs_rel 0.1666, rmse_log 0.2153, rmse 4.603, sq_rel 0.9125
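(For reference, these are the usual Eigen-split KITTI depth metrics; a minimal sketch of their standard definitions, shown only to define a1/a2/a3 and the error terms and not claimed to be this repo's exact evaluation code:)

import numpy as np

def kitti_depth_metrics(gt, pred):
    # threshold accuracies: fraction of pixels with max(gt/pred, pred/gt) < 1.25**k
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rmse = np.sqrt(((gt - pred) ** 2).mean())
    rmse_log = np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean())
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3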

{ "data_path": "/sdb1/YJ/Lite-Mono-main/kitti", "log_dir": "./pretrain-tmp-refine", "model_name": "pre-litemono-predictmask_nothing", "split": "eigen", "model": "lite-mono", "weight_decay": 0.01, "drop_path": 0.2, "num_layers": 18, "dataset": "kitti", "png": false, "height": 192, "width": 640, "disparity_smoothness": 0.001, "scales": [ 0, 1, 2 ], "min_depth": 0.1, "max_depth": 80.0, "use_stereo": false, "frame_ids": [ 0, -1, 1 ], "profile": true, "batch_size": 1, "lr": [ 0.0001, 5e-06, 31, 0.0001, 1e-05, 31 ], "num_epochs": 30, "scheduler_step_size": 15, "v1_multiscale": false, "avg_reprojection": false, "disable_automasking": false, "predictive_mask": false, "no_ssim": false, "mypretrain": null, "weights_init": "pretrained", "pose_model_input": "pairs", "pose_model_type": "separate_resnet", "no_cuda": false, "num_workers": 12, "load_weights_folder": "pretrain-tmp-refine/pre-litemono-predictmask_nothing/models/weights_29", "models_to_load": [ "encoder", "depth", "pose_encoder" ], "log_frequency": 250, "save_frequency": 1, "disable_median_scaling": true, "pred_depth_scale_factor": 1, "ext_disp_to_eval": null, "eval_split": "eigen_zhou", "save_pred_disps": true, "no_eval": false, "eval_out_dir": "eval_cs", "post_process": false }

gggxxx1234 commented 4 months ago

I have posted my parameter configuration above, but I don't know what my problem is.

noahzn commented 4 months ago

Hi, you didn't use the pretrained weights. You need to set --mypretrain
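Roughly speaking, --mypretrain makes the trainer initialise the depth encoder from the ImageNet-pretrained Lite-Mono checkpoint before self-supervised training starts (e.g. adding something like --mypretrain path/to/lite-mono-pretrain.pth to your usual train.py command; the path is just a placeholder). A rough sketch of what that initialisation typically looks like, not the repo's literal trainer code:

import torch
import networks

# hypothetical settings and checkpoint path, for illustration only
encoder = networks.LiteMono(model="lite-mono", height=192, width=640)

checkpoint = torch.load("path/to/lite-mono-pretrain.pth", map_location="cpu")
# some released checkpoints wrap the weights, e.g. under a "model" key
state_dict = checkpoint["model"] if "model" in checkpoint else checkpoint

# keep only the keys that exist in the encoder and load non-strictly
filtered = {k: v for k, v in state_dict.items() if k in encoder.state_dict()}
encoder.load_state_dict(filtered, strict=False)
print("loaded {} tensors from the pretrained checkpoint".format(len(filtered)))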

gggxxx1234 commented 4 months ago

Should I use the pre-trained weights during training? Or is it set here?

gggxxx1234 commented 4 months ago

I'm very sorry to bother you so late, but I really can't work out why I can't reproduce the results when training.

gggxxx1234 commented 4 months ago

If I need to change the model architecture, then I can't use the pre-trained weights during training.

noahzn commented 4 months ago

Then you can directly change the model architecture and create your own pre-training weights for training. All the checkpoints are included in this repo. If you don't use the ImageNet-pretrained weights, then it's meaningless to compare the results.

noahzn commented 4 months ago

I am now closing this issue due to no response.