mrharicot / monodepth

Unsupervised single image depth prediction with CNNs

Cityscape again #149

Closed ehabhelmy82 closed 6 years ago

ehabhelmy82 commented 6 years ago

Dear author,

Thanks for your work and your answers. This concerns the Table 1 results for training on Cityscapes and testing on KITTI. I trained the model three times, and every time I got results far from those reported in the paper. I did the following:

First I extracted the Cityscapes files and put all training images (left and right) in one folder called CS_Train. The image 'troisdorf_000000_000073_rightImg8bit' was black and white, so I replaced it with 'troisdorf_000000_000073_leftImg8bit'. Then I ran the following commands:

python3 monodepth_main.py --mode train --model_name CSmodel --data_path /data/ehab/CS_Train/ --filenames_file ./utils/filenames/cityscapes_train_files.txt --log_directory ~/CSd/

python3 monodepth_main.py --mode test --data_path /data/ehab/data_scene_flow/ --filenames_file ./utils/filenames/kitti_stereo_2015_test_files.txt --log_directory ~/CSd/ --checkpoint_path ~/CSd/CSmodel/model-143600

python utils/evaluate_kitti.py --split kitti --predicted_disp_path ../CSd/CSmodel/disparities.npy --gt_path ../data/data_scene_flow

I got the following results:

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.2726, 3.9903, 9.041, 0.317, 62.491, 0.661, 0.877, 0.950

When I repeated all three commands, I got the following results:

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.2750, 3.9985, 9.033, 0.315, 62.641, 0.654, 0.876, 0.951

In the third run I got the following:

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.2587, 3.5841, 8.759, 0.306, 59.505, 0.673, 0.882, 0.953

I cannot find out what is wrong. Can you help me? By the way, when I run the pretrained model I get results very close to those in the paper:

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.7058, 10.4943, 14.665, 0.546, 94.768, 0.054, 0.325, 0.856

I really appreciate your reply.
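For readers unfamiliar with the column names in these result rows, the following is a minimal sketch of the usual definitions of the depth error metrics, written in plain Python over paired lists of positive depths. The function name `depth_metrics` is hypothetical and this is not the repository's evaluation script; `d1_all` is left out because it is computed on disparities rather than depths.

```python
import math


def depth_metrics(gt, pred):
    """Common depth error metrics over paired, strictly positive depth values.

    gt and pred are equal-length sequences of depths in the same unit.
    """
    if len(gt) != len(pred) or len(gt) == 0:
        raise ValueError("gt and pred must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(gt, pred))

    # Relative errors are normalised by the ground-truth depth.
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n

    # Root-mean-square error in linear and in log space.
    rms = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    log_rms = math.sqrt(sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n)

    # Accuracy: fraction of points whose ratio to ground truth is below 1.25**k.
    ratios = [max(g / p, p / g) for g, p in pairs]
    a1 = sum(r < 1.25 for r in ratios) / n
    a2 = sum(r < 1.25 ** 2 for r in ratios) / n
    a3 = sum(r < 1.25 ** 3 for r in ratios) / n

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rms": rms,
            "log_rms": log_rms, "a1": a1, "a2": a2, "a3": a3}
```

Lower is better for the first four values; higher is better for a1, a2 and a3.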

mrharicot commented 6 years ago

Hi, there is a --dataset argument that you need to set to cityscapes; it crops out the car hood. Can I ask why you are focusing on the Cityscapes results?
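To illustrate what cropping the car hood means, here is a rough sketch that drops the bottom rows of an image represented as a list of pixel rows. The function name `crop_car_hood` is hypothetical, and the kept fraction of four fifths is an assumption for illustration; the data loader in the repository is the place to check the actual value.

```python
def crop_car_hood(image_rows, keep_num=4, keep_den=5):
    """Keep the top keep_num/keep_den of the rows and drop the rest.

    image_rows is any sliceable sequence of pixel rows (top row first).
    The default fraction of 4/5 is an assumed value, not a confirmed one.
    """
    if not 0 < keep_num <= keep_den:
        raise ValueError("the kept fraction must be in (0, 1]")
    # Integer arithmetic avoids floating-point rounding surprises.
    keep_rows = (len(image_rows) * keep_num) // keep_den
    return image_rows[:keep_rows]
```

The hood sits at the bottom of every Cityscapes frame, so removing those rows keeps the network from having to explain a region that never moves between the left and right views.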

ehabhelmy82 commented 6 years ago

Thank you very much for your reply. I am not focusing on a particular result; rather, I want to reproduce all reported results exactly so that I can compare them with my proposed approach. I have almost finished everything except the Cityscapes and stereo experiments, which still do not match your results. After your comment, I hope it will work for Cityscapes; in parallel, I am trying to fix the stereo model. Once again, thank you very much.

ehabhelmy82 commented 6 years ago

I used the following:

python3 monodepth_main.py --mode train --model_name CSmodel3 --data_path /data/ehab/CS_Train/ --filenames_file ./utils/filenames/cityscapes_train_files.txt --log_directory ~/CSd3/ --dataset cityscapes

python3 monodepth_main.py --mode test --data_path /data/ehab/data_scene_flow/ --filenames_file ./utils/filenames/kitti_stereo_2015_test_files.txt --log_directory ~/CSd3/ --checkpoint_path ~/CSd3/CSmodel3/model-143600 --dataset cityscapes

python utils/evaluate_kitti.py --split kitti --predicted_disp_path ../CSd3/CSmodel3/disparities.npy --gt_path ../data/data_scene_flow

I got the following:

/home/ehab/monodepth-master/utils/evaluation_utils.py:62: RuntimeWarning: divide by zero encountered in divide
  pred_depth = width_to_focal[width] * 0.54 / pred_disp
/home/ehab/monodepth-master/utils/evaluation_utils.py:62: RuntimeWarning: overflow encountered in divide
  pred_depth = width_to_focal[width] * 0.54 / pred_disp

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
6.0278, 418.4342, 64.898, 1.888, 100.000, 0.009, 0.027, 0.057

What is wrong?
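The warning points at a depth conversion of the form focal length times baseline divided by disparity, which blows up whenever a predicted disparity is zero. The sketch below shows that relationship and a small check for values that would trigger it. The helper names are hypothetical, and the focal length is an assumed illustrative value in pixels; only the 0.54 baseline appears in the warning itself.

```python
import math

BASELINE_M = 0.54    # stereo baseline in metres, the 0.54 seen in the warning
FOCAL_PX = 721.5377  # assumed focal length in pixels, for illustration only


def disparity_to_depth(disp_px, focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """Convert one disparity in pixels to depth in metres.

    Non-positive disparities have no finite depth, so they map to infinity
    instead of raising a division error.
    """
    if disp_px <= 0:
        return math.inf
    return focal_px * baseline_m / disp_px


def summarize_disparities(disps):
    """Count the values that would break the conversion above."""
    values = list(disps)
    return {
        "count": len(values),
        "non_positive": sum(1 for d in values if d <= 0),
        "not_finite": sum(1 for d in values if not math.isfinite(d)),
    }
```

A large share of non-positive values in the predictions suggests the network output itself is degenerate, for example because the input images were not loaded correctly, rather than a problem in the metric code.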

mrharicot commented 6 years ago

Hi, you only need the --dataset flag when you train, not when you test, because you are testing on KITTI.

ehabhelmy82 commented 6 years ago

python3 monodepth_main.py --mode test --data_path /data/ehab/data_scene_flow/ --filenames_file ./utils/filenames/kitti_stereo_2015_test_files.txt --log_directory ~/CSd3/ --checkpoint_path ~/CSd3/CSmodel3/model-143600

python utils/evaluate_kitti.py --split kitti --predicted_disp_path ../CSd3/CSmodel3/disparities.npy --gt_path ../data/data_scene_flow

/home/ehab/monodepth-master/utils/evaluation_utils.py:62: RuntimeWarning: divide by zero encountered in divide
  pred_depth = width_to_focal[width] * 0.54 / pred_disp
/home/ehab/monodepth-master/utils/evaluation_utils.py:62: RuntimeWarning: overflow encountered in divide
  pred_depth = width_to_focal[width] * 0.54 / pred_disp

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
6.0278, 418.4342, 64.898, 1.888, 100.000, 0.009, 0.027, 0.057

Still wrong. Any idea? Thanks.

mrharicot commented 6 years ago

Hi, the test results are identical to your previous evaluation, when you used the --dataset cityscapes flag. Are you sure the new run overwrote the previous results?
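One way to answer that question with evidence is to fingerprint the output file before and after the test run. This is a minimal sketch using only the standard library; the function names are hypothetical, and the path passed in would be wherever the test step writes disparities.npy.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (modification time, SHA-256 hex digest) for the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large prediction files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getmtime(path), digest.hexdigest()


def describe_change(before, after):
    """Compare two fingerprints taken before and after a run."""
    if before[1] != after[1]:
        return "content changed"
    if before[0] != after[0]:
        return "rewritten with identical content"
    return "untouched"
```

Taking a fingerprint before the test command and another one after it shows whether the evaluation is reading fresh predictions or a stale file from an earlier run.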

ehabhelmy82 commented 6 years ago

yes

jahaniam commented 6 years ago

I remember once getting something similar. Whatever I did, the results were far off and did not change at all. The divide-by-zero warning is important! I think the cause in my case was that I was pointing it at files that didn't exist (a typo in my paths). Make sure you are giving it a correct path!
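A quick way to rule out this kind of path problem is to check every entry of the filenames list against the data directory before running anything. The sketch below assumes each line of the list holds one or more whitespace-separated paths relative to the data directory; the function name `find_missing_files` is hypothetical.

```python
import os


def find_missing_files(data_path, filenames_file):
    """Return (line number, full path) for every listed file that does not exist.

    Assumes each line contains whitespace-separated paths relative to data_path.
    """
    missing = []
    with open(filenames_file) as fh:
        for line_no, line in enumerate(fh, start=1):
            for rel_path in line.split():
                full_path = os.path.join(data_path, rel_path)
                if not os.path.isfile(full_path):
                    missing.append((line_no, full_path))
    return missing
```

An empty result means every listed file is present. Any entries returned are worth fixing first, since a wrong directory or file extension can go unnoticed until the metrics come out wrong.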

mrharicot commented 6 years ago

@ehabhelmy82 Can you share a couple of the generated disparity map images here?

abhiagwl4262 commented 6 years ago

How do I evaluate performance on the Cityscapes test data? Cityscapes disparity ranges from 0 to 126, while the monodepth output range is different.

mrharicot commented 6 years ago

Cityscapes doesn't have good-quality depth data; they ran an off-the-shelf stereo algorithm. As far as I know, no single-frame depth estimation paper has reported numbers on this dataset.

abhiagwl4262 commented 6 years ago

I tried to train on Cityscapes data at a different resolution (512, 1024) without cropping the car hood. I started from the pretrained city2kitti_resnet model, which was trained at (256, 512). The results are absolute rubbish, even though training and validation loss were both decreasing. Any insights?

What I actually want is to compute disparity for images with resolution (480, 640). Any suggestions?

mrharicot commented 6 years ago

Hi, can you give more details on how the results were bad? What did the depth maps look like? Additionally, it doesn't make much sense to reuse the city2kitti model; I would use the cityscapes model instead.

abhiagwl4262 commented 6 years ago

I will share a few disparity maps with you. I am running a few more experiments and will follow up once training is done.


abhiagwl4262 commented 6 years ago

This is what I got when I trained on Cityscapes data at 1024x512 resolution from scratch (without using a pretrained model). I set the --dataset flag to cityscapes for this. The batch size was 1, and I chose resnet50 as the encoder. I am not sure why I cannot reproduce the results.

[Images: hanover_000000_008017_leftimg8bit_disp, hanover_000000_014713_leftimg8bit_disp, hanover_000000_019282_leftimg8bit_disp]

mrharicot commented 6 years ago

Hi, the network seems to be unable to learn. 1024x512 is really large; why are you trying such a high resolution? I would first try lowering the learning rate to see whether you can achieve stable training to begin with. If you want to use the pretrained model, I would move to progressively higher resolutions rather than going directly to 1024x512.
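As a rough illustration of the progressive-resolution suggestion, the sketch below generates intermediate (height, width) pairs between the pretrained resolution and the target. The function names are hypothetical, and rounding to a multiple of 64 is an assumption made so that repeated downsampling in an encoder-decoder network divides evenly; it is not a documented requirement of the repository.

```python
def round_to_multiple(value, multiple):
    """Round value to the nearest positive multiple of `multiple`."""
    return max(multiple, int(round(value / multiple)) * multiple)


def resolution_schedule(start=(256, 512), target=(512, 1024), steps=3, multiple=64):
    """Return `steps` (height, width) pairs going linearly from start to target.

    Both endpoints are included. The multiple of 64 is an assumed constraint.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    schedule = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the start resolution, 1.0 at the target
        height = start[0] + t * (target[0] - start[0])
        width = start[1] + t * (target[1] - start[1])
        schedule.append((round_to_multiple(height, multiple),
                         round_to_multiple(width, multiple)))
    return schedule
```

Each stage would be fine-tuned from the checkpoint of the previous one, ideally with a reduced learning rate, so the network never has to adapt to a doubling of resolution in a single step.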