nianticlabs / monodepth2

[ICCV 2019] Monocular depth estimation from a single image

Image name error #210

Closed YJonmo closed 4 years ago

YJonmo commented 4 years ago

Hi there,

I created my own dataset using Blender, and the training file looks like this:

Movie_Right/New 00000 r
Movie_Right/New 00001 r
Movie_Right/New 00002 r
Movie_Right/New 00003 r
...

There are about 25,000 images. After training for roughly half an hour, I get this error and I have no clue why it happens:

FileNotFoundError: [Errno 2] No such file or directory: '/home/***/DataForTraining/BlenderData/Movie_Right/New/image_03/data/-0001.png'

I created a class in kitti_dataset.py that is almost the same as MonoDataset, except that I used 05d instead of 010d and changed the K matrix.

Any idea?

Cheers, Jacob

mrharicot commented 4 years ago

Hi,

What is the structure of the files (folders, filenames, etc) you are using for training? Does /home/*******/DataForTraining/BlenderData/Movie_Right/New/image_03/data/-0001.png exist?

YJonmo commented 4 years ago

hi,

/home/***/DataForTraining/BlenderData/Movie_Right/New/image_03/data/00001.png exists, but .../-0001.png does not.

I am not sure where it gets -0001.png from. I guess it should only use the indices listed in the file that I provided?

YJonmo commented 4 years ago

The other thing I am a bit confused about is why the entries in eigen_zhou and the other split files are not ordered by time frame. I mean, why is it like this:

2011_10_03/2011_10_03_drive_0034_sync 1757 r
2011_09_26/2011_09_26_drive_0061_sync 635 r
2011_09_30/2011_09_30_drive_0020_sync 1092 l
...

rather than this:

2011_10_03/2011_10_03_drive_0034_sync 1 r
2011_10_03/2011_10_03_drive_0034_sync 2 r
2011_10_03/2011_10_03_drive_0034_sync 3 r
...

The latter is how Monodepth 1 did it.

mrharicot commented 4 years ago

Are you training in stereo mode? The default temporal mode loads frames +1 and -1 relative to each listed frame, so for frame 0 it will try to load frame -1. You either need to remove the first and last frames from your filenames.txt or use stereo mode only.
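To illustrate where -0001.png comes from and how the split file could be filtered, here is a minimal Python sketch. It assumes a 05d filename pattern like the one in the custom dataset class, frame offsets of 0, -1 and +1, and a split file that lists every frame present on disk. The helper names neighbor_filename and drop_sequence_ends are hypothetical and not part of the repo.

```python
def neighbor_filename(frame_index, offset, ext=".png"):
    """Format the filename of a neighboring frame with a 5-digit pattern.

    For frame 0 and offset -1 the index is -1, which '05d' renders
    as '-0001' (the sign counts toward the field width).
    """
    return "{:05d}{}".format(frame_index + offset, ext)


def drop_sequence_ends(lines, frame_ids=(0, -1, 1)):
    """Keep only entries whose temporal neighbors are also listed.

    Each line has the form '<folder> <frame_index> <side>'.
    Assumption: the split lists every frame that exists on disk.
    """
    # Collect the frame indices available for each (folder, side) pair.
    available = {}
    for line in lines:
        folder, index, side = line.split()
        available.setdefault((folder, side), set()).add(int(index))

    kept = []
    for line in lines:
        folder, index, side = line.split()
        frames = available[(folder, side)]
        # An entry survives only if every requested offset can be loaded.
        if all(int(index) + offset in frames for offset in frame_ids):
            kept.append(line)
    return kept


if __name__ == "__main__":
    entries = [
        "Movie_Right/New 00000 r",
        "Movie_Right/New 00001 r",
        "Movie_Right/New 00002 r",
        "Movie_Right/New 00003 r",
    ]
    print(neighbor_filename(0, -1))
    print(drop_sequence_ends(entries))
```

With the four example entries from the training file, only frames 00001 and 00002 would be kept, since frame 00000 has no predecessor and frame 00003 has no successor.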

The frame indices were randomized when selecting 10% for validation; they are shuffled again during training, so the order won't matter.

YJonmo commented 4 years ago

I guess it is mono at the moment:

python train.py --data_path $DATA_Folder --split BlenderRight --model_name mono_model --log_dir $OUTPUT/ --png --no_eval --dataset Endoscope --height 256 --width 256

I will remove the first and last image names from the file.

Oh I see. So let's say that during training the frame index is 2011_10_03_drive_0034_sync 2; the previous and next images will then be loaded as t-1 and t+1 to estimate the disparity and the relative pose?

I have a stereo camera, and my images are endoscopic images where the camera is sometimes not moving. Could that be a problem when training in mono mode? Should I use the mono+stereo mode?

Honestly, the images that I have suffer from a lack of texture. I tried various stereo techniques and had only partial success.

mrharicot commented 4 years ago

If you are using stereo inputs and the camera might not move much, you should use stereo mode only, as we describe in the readme. You should also list both the left and right images as inputs to the network to increase the amount of training data:

Movie_Right/New 00001 l
Movie_Right/New 00001 r
Movie_Right/New 00002 l
Movie_Right/New 00002 r
...
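
A split file in this format could be generated with a short script along these lines. This is only a sketch: stereo_split_lines is a hypothetical helper, and the 5-digit zero padding simply mirrors the entries above.

```python
def stereo_split_lines(folder, frame_indices, width=5):
    """Build '<folder> <frame_index> <side>' lines for both cameras.

    Every frame index is emitted twice, once for the left ('l')
    and once for the right ('r') image.
    """
    lines = []
    for index in frame_indices:
        for side in ("l", "r"):
            lines.append(f"{folder} {index:0{width}d} {side}")
    return lines


if __name__ == "__main__":
    # Print the entries for frames 1 and 2 of the example folder.
    for entry in stereo_split_lines("Movie_Right/New", [1, 2]):
        print(entry)
```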

Finally, if you generated training data in Blender, you should simply export the depth as well and train a model with a regression loss on the depth/disparity directly. This is, however, outside the scope of this repo.

YJonmo commented 4 years ago

Thanks for the info. Yes, I have access to depth for the simulated data but not for the actual endoscope. I explored semi-supervised methods a bit. There is one called PackNet that came out recently.

mrharicot commented 4 years ago

There is no fundamental difference between PackNet and Monodepth; they are both self-supervised. PackNet is a much bigger model, which seems to scale better at high resolutions, but textureless regions will still be a problem for any self-supervised method.

YJonmo commented 4 years ago

They have one called "Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances". I think that is semi-supervised?

The other question that came to mind: would it be OK if I reject the frames where the camera is not moving and only include the frames where it is moving for training?

mrharicot commented 4 years ago

> I think that is semi-supervised?

I believe so. But once again, if you have synthetic training data, why not train using direct supervision?

> would it be OK if I reject the frames where the camera is not moving

Yes, this is part of the preprocessing we do on the KITTI data, following Zhou et al. (CVPR 2017).
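
One simple way to reject static frames could look like the sketch below. It is an assumed heuristic for illustration, not necessarily how the static frames are detected in the KITTI preprocessing: a frame is kept only if its mean absolute intensity difference from the last kept frame exceeds a threshold. Frames are represented as flat sequences of grayscale values to keep the example dependency-free; the function names and the threshold value are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute intensity difference between two equally sized frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def moving_frame_indices(frames, threshold=1.0):
    """Return indices of frames that differ enough from the last kept frame.

    The first frame is always kept; later frames are kept only when the
    image content has changed by more than `threshold` on average.
    """
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[kept[-1]], frames[i]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Five tiny 4-pixel frames; frames 1 and 3 repeat their predecessors.
    frames = [
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [10, 10, 10, 10],
        [10, 10, 10, 10],
        [20, 20, 20, 20],
    ]
    print(moving_frame_indices(frames))
```

The kept frames would then need to be renumbered (or the split file rebuilt) so that the t-1 and t+1 neighbors of each entry are the previous and next kept frames.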

YJonmo commented 4 years ago

I created the synthetic data as part of a pretraining approach for the unsupervised step. So for the unsupervised training on the endoscope data, I will load the network trained on the synthetic data.

Thanks a lot for the quick feedback.

mrharicot commented 4 years ago

Got it, good luck!