Hello, I have some questions regarding the model training with the KITTI dataset

noahzn / Lite-Mono

[CVPR2023] Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

MIT License


Hello, I have some questions regarding the model training with the KITTI dataset #144

Closed: andrew403 closed this issue 4 months ago

andrew403 commented 4 months ago

Hello, I tried to reproduce your model and everything went well, but there is one thing I don't understand. Since your paper describes this as a monocular depth estimation model, what is the purpose of using both image_02 and image_03 in the training process? To be more specific, I noticed that both folders were opened during training, but I'm not sure why. Could you also elaborate on the role of the velodyne points in the training process? Thanks.

noahzn commented 4 months ago

Hello, the training uses adjacent frames captured by a single camera, so we can use both the left and the right camera for training. However, we don't mix images from the left and right cameras. We don't use any point clouds in the training process.
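
To illustrate the point about not mixing cameras, here is a minimal sketch of how one training sample could be assembled. It is not taken from the Lite-Mono code: the helper `sample_paths`, the side labels `"l"`/`"r"`, the frame offsets, and the `data/<10-digit index>.png` file layout are all assumptions made for illustration. The idea shown is only that every frame in a sample comes from the same camera folder.

```python
import os

# Assumed mapping from a side label to the KITTI camera folder (hypothetical labels).
SIDE_TO_FOLDER = {"l": "image_02", "r": "image_03"}


def sample_paths(data_root, sequence, frame_index, side, offsets=(0, -1, 1), ext=".png"):
    """Return {offset: image path} for one training sample.

    All frames are taken from the same camera folder, so left and right
    images never end up in the same sample.
    """
    folder = SIDE_TO_FOLDER[side]
    return {
        off: os.path.join(data_root, sequence, folder, "data",
                          f"{frame_index + off:010d}{ext}")
        for off in offsets
    }


# Usage: the target frame plus its previous and next frame, right camera only.
paths = sample_paths("kitti_data", "2011_09_26_drive_0001_sync", 5, "r")
for off, path in sorted(paths.items()):
    print(off, path)
```

Under these assumptions, a sample with side `"l"` reads only from image_02 and a sample with side `"r"` reads only from image_03, which is why both folders are opened even though each sample is monocular.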

andrew403 commented 4 months ago

Thank you so much for your quick response! I tried removing the velodyne_points folder, but the program was still looking for the point cloud files. Did I overlook a setting? May I also ask what the purpose of loading these binary files is?

noahzn commented 4 months ago

They are used to generate the depth ground truth for evaluation.
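
As a rough sketch of what generating depth ground truth from point clouds involves (not the repository's actual code): KITTI velodyne `.bin` files store consecutive float32 values as (x, y, z, reflectance), and the points can be projected into the image to obtain a sparse depth map. The function names and the pinhole parameters `fx, fy, cx, cy` below are hypothetical, and the points passed to the projection are assumed to be already expressed in the camera frame. The real pipeline would first apply the velodyne-to-camera and rectification transforms from the calibration files, which is omitted here.

```python
import struct


def parse_velodyne(raw):
    """Parse raw bytes of a KITTI-style .bin file into (x, y, z, reflectance) tuples."""
    return list(struct.iter_unpack("<4f", raw))


def project_to_depth_map(points_cam, fx, fy, cx, cy, height, width):
    """Project 3D points (camera frame, z pointing forward) into a sparse depth map.

    Pixels that receive no point stay at 0.0; if several points land on the
    same pixel, the closest one is kept.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points_cam:
        if z <= 0:  # point is behind the camera
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


# Usage with synthetic bytes standing in for open(path, "rb").read().
raw = struct.pack("<8f", 0.0, 0.0, 10.0, 0.5, 1.0, 0.0, 5.0, 0.2)
points = [(x, y, z) for x, y, z, _ in parse_velodyne(raw)]
depth_map = project_to_depth_map(points, fx=2.0, fy=2.0, cx=2.0, cy=1.0, height=3, width=4)
```

A sparse map like this is only needed to score predictions, which is consistent with the point clouds being loaded even though they play no part in the training loss.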

andrew403 commented 4 months ago

That would explain it, thanks a lot for answering my question!