Closed YDayoub closed 6 months ago
Hi, indeed, your abs_rel is also very large. It seems that your network is underfitting. You might need to decrease the drop_path rate a bit. You can try `--lr 0.0001 5e-6 15 0.0001 1e-5 15`. This will reset the learning rates to their initial values, which helps the network jump out of local minima. Please evaluate all the epochs (usually >15 epochs) and choose the best one. Here you can find the versions of dependencies.
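For context, the six numbers passed to `--lr` look like two (initial LR, minimum LR, restart period) triples, one per optimizer, driving a cosine-annealing schedule with warm restarts; "reset the learning rates" then refers to the jump back to the initial LR at each restart. A minimal sketch of that schedule, assuming this reading of the flag (the exact mapping is my interpretation, not confirmed in this thread):

```python
import math

def cosine_restart_lr(epoch, base_lr, min_lr, t0):
    """Cosine-annealing LR with warm restarts of period t0 (epochs).
    At every multiple of t0 the LR jumps back to base_lr, which is
    presumably the 'reset' the comment above refers to."""
    t_cur = epoch % t0
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t_cur / t0))

# With --lr 0.0001 5e-6 15 ..., the LR starts at 1e-4, decays toward
# 5e-6 over 15 epochs, then restarts at 1e-4:
print(cosine_restart_lr(0, 1e-4, 5e-6, 15))   # 1e-4 at the start
print(cosine_restart_lr(15, 1e-4, 5e-6, 15))  # back to 1e-4 after the restart
```

Setting the restart period to 15 (instead of 31, which never restarts within a 30-epoch run) is what gives the mid-training LR kick.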
Thank you for your comment. I applied the suggestions you mentioned: I reduced drop_path to 0.2 and T0 to 15, yet the results are still far from the reported ones. Best result:

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0.106 | 0.798 | 4.562 | 0.181 | 0.890 | 0.964 | 0.983 |
I checked the requirements file for the 1.7.1 setup; I have the same versions, except for skimage and Pillow, which shouldn't affect the training.
I will try training with the 1.12 setup and report the results I get.
You can also try drop_path=0.3. Could you also double check that you are training lite-mono-8m, not lite-mono?
I am already trying a drop_path value of 0.3, and I'll report the results when it finishes. Regarding the model, I am training lite-mono-8m. I have double-checked and am already using the ImageNet weights you provided. If the model were wrong, loading the weights would fail with a shape mismatch.
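To illustrate the point about shape mismatches: a strict `load_state_dict` fails loudly when checkpoint tensor shapes differ from the model's, so a clean load is reasonable evidence that the right architecture is instantiated. A toy, framework-free sketch of that check (the parameter names and shapes below are made up for illustration, not taken from Lite-Mono):

```python
def find_shape_mismatches(model_shapes, ckpt_shapes):
    """Return parameter names whose shapes differ between the model and
    the checkpoint, mimicking the strict check load_state_dict performs."""
    return [name for name, shape in model_shapes.items()
            if name in ckpt_shapes and ckpt_shapes[name] != shape]

# Hypothetical example: a wider variant (e.g. an "8m" model) would have
# larger layers, so loading its weights into a smaller model surfaces
# mismatches immediately.
small_model = {"stem.weight": (48, 3, 3, 3)}
big_ckpt    = {"stem.weight": (64, 3, 3, 3)}
print(find_shape_mismatches(small_model, big_ckpt))  # ['stem.weight']
```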
Then it's really strange. Usually people can easily get a model with low abs_rel. Your abs_rel is worse than that of Lite-Mono.
I suspect it's something in the environment, but I cannot figure out what. If I train lite-mono, I would expect worse results than lite-mono-8m. I will check that too and report what I find.
Thank you for your work.
Could you share the bash command you used to train the lite-mono-8m model?
I am using the following command:

```shell
python train.py --log_dir TrainLogs --data_path $DATA --model_name $MODEL --num_epochs 30 \
    --batch_size 12 --mypretrain $PRETRAIN --model lite-mono-8m --drop_path 0.4 --save_frequency 1 \
    --lr 0.0001 5e-6 31 0.0001 1e-5 31
```
I increased drop_path as you suggested here.
However, the results are still far from the reported ones:

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0.105 | 0.839 | 4.596 | 0.182 | 0.893 | 0.964 | 0.982 |
I am using PyTorch 1.7.1 with CUDA 11.0.