Closed YDayoub closed 6 months ago
Hi, indeed, your abs_rel is also very large. It seems that your network is underfitting. You might need to decrease the drop_path rate a bit. You can try `--lr 0.0001 5e-6 15 0.0001 1e-5 15`. This will reset the learning rates to their initial values, which helps the network jump out of local minima. Please evaluate all the epochs (usually >15 epochs) and choose the best one. Here you can find the versions of dependencies.
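For context, the six numbers passed to `--lr` look like two (initial LR, minimum LR, restart period) triples, one per optimizer, driving a cosine-annealing schedule with warm restarts; "reset the learning rates" then refers to the jump back to the initial LR at each restart. A minimal sketch of that schedule, assuming this reading of the flag (the exact mapping is my interpretation, not confirmed in this thread):

```python
import math

def cosine_restart_lr(epoch, base_lr, min_lr, t0):
    """Cosine-annealing LR with warm restarts of period t0 (epochs).
    At every multiple of t0 the LR jumps back to base_lr, which is
    presumably the 'reset' the comment above refers to."""
    t_cur = epoch % t0
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t_cur / t0))

# With --lr 0.0001 5e-6 15 ..., the LR starts at 1e-4, decays toward
# 5e-6 over 15 epochs, then restarts at 1e-4:
print(cosine_restart_lr(0, 1e-4, 5e-6, 15))   # 1e-4 at the start
print(cosine_restart_lr(15, 1e-4, 5e-6, 15))  # back to 1e-4 after the restart
```

Setting the restart period to 15 (instead of 31, which never restarts within a 30-epoch run) is what gives the mid-training LR kick.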
Thank you for your comment. I applied the suggestions you mentioned: I reduced drop_path to 0.2 and T0 to 15, yet the results are still far from the reported ones. Best result:

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0.106 | 0.798 | 4.562 | 0.181 | 0.890 | 0.964 | 0.983 |
I checked the requirements file for the 1.7.1 setup; I have the same versions, except for skimage and Pillow, which shouldn't affect the training.
I will try training with the 1.12 setup and report the results I get.
You can also try drop_path=0.3. Could you also double check that you are training lite-mono-8m, not lite-mono?
I am already trying a drop_path value of 0.3, and I'll report the results when it finishes. Regarding the model, I am training lite-mono-8m. I have double-checked and am already using the ImageNet weights you provided. If the model were wrong, loading the weights would fail with a shape mismatch.
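To illustrate the point about shape mismatches: a strict `load_state_dict` fails loudly when checkpoint tensor shapes differ from the model's, so a clean load is reasonable evidence that the right architecture is instantiated. A toy, framework-free sketch of that check (the parameter names and shapes below are made up for illustration, not taken from Lite-Mono):

```python
def find_shape_mismatches(model_shapes, ckpt_shapes):
    """Return parameter names whose shapes differ between the model and
    the checkpoint, mimicking the strict check load_state_dict performs."""
    return [name for name, shape in model_shapes.items()
            if name in ckpt_shapes and ckpt_shapes[name] != shape]

# Hypothetical example: a wider variant (e.g. an "8m" model) would have
# larger layers, so loading its weights into a smaller model surfaces
# mismatches immediately.
small_model = {"stem.weight": (48, 3, 3, 3)}
big_ckpt    = {"stem.weight": (64, 3, 3, 3)}
print(find_shape_mismatches(small_model, big_ckpt))  # ['stem.weight']
```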
Then it's really strange. Usually people can easily get a model with low abs_rel. Your abs_rel is worse than that of Lite-Mono.
I suspect it's something in the environment, but I cannot figure out what. If I train lite-mono, I would expect worse results than lite-mono-8m. I will check that too and report what I find.
Thank you for your work.
Could you share the bash command you used to train the lite-mono-8m model?
I am using the following command:

```shell
python train.py --log_dir TrainLogs --data_path $DATA --model_name $MODEL --num_epochs 30 \
    --batch_size 12 --mypretrain $PRETRAIN --model lite-mono-8m --drop_path 0.4 --save_frequency 1 \
    --lr 0.0001 5e-6 31 0.0001 1e-5 31
```
I increased drop_path as you suggested here.
However, the results are still far from the reported ones:

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0.105 | 0.839 | 4.596 | 0.182 | 0.893 | 0.964 | 0.982 |
I am using PyTorch 1.7.1 with CUDA 11.0.