hisfog / SfMNeXt-Impl

[AAAI 2024] Official implementation of "SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation", and more.
MIT License

Finetune the ConvNeXt-L on KITTI #26

Open FangjunWang opened 7 months ago

FangjunWang commented 7 months ago

Hello and nice work! My question is how to finetune the model on KITTI? I tried with the script ./finetune/train_ft_SQLdepth.py but cannot get good enough results. Only abs_rel 0.0494 and rmse 2.182.

hisfog commented 7 months ago

Did you load a self-supervised (SSL) pre-trained model, and what are your SSL scores?

FangjunWang commented 7 months ago

Thank you for your reply! Yes, I loaded a pre-trained model that was trained with the script train.py for 20 epochs. What is an SSL score? The training loss is around 0.3~0.4, and the validation silog is 7.467.

hisfog commented 7 months ago

I mean, what are your SSL model's metrics (AbsRel, RMSE, etc.)?

FangjunWang commented 7 months ago

hisfog commented 7 months ago


Em, AbsRel = ? I mean your SSL model's evaluation results on KITTI, not the SiLog loss.
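For readers unsure of the distinction drawn here: AbsRel and RMSE are evaluation metrics computed between predicted and ground-truth depth, whereas SILog is usually tracked as a loss. A minimal sketch of the standard definitions follows, written over plain Python lists of valid, positive depths to stay dependency-free. The function name `depth_metrics` is illustrative and not taken from the repository, and the x100 scaling of SILog follows a common KITTI convention that is assumed here rather than confirmed by the thread.

```python
import math

def depth_metrics(gt, pred):
    """Standard monocular depth metrics over paired, strictly positive depths."""
    n = len(gt)
    if n == 0 or n != len(pred):
        raise ValueError("gt and pred must be non-empty and of equal length")

    pairs = list(zip(gt, pred))
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)

    # Log-space differences are shared by rmse_log and SILog.
    log_diff = [math.log(p) - math.log(g) for g, p in pairs]
    rmse_log = math.sqrt(sum(d * d for d in log_diff) / n)

    # Threshold accuracies: fraction of pixels whose ratio is below 1.25^k.
    ratios = [max(g / p, p / g) for g, p in pairs]
    a1 = sum(r < 1.25 for r in ratios) / n
    a2 = sum(r < 1.25 ** 2 for r in ratios) / n
    a3 = sum(r < 1.25 ** 3 for r in ratios) / n

    # Scale-invariant log error; clamp at zero to guard against rounding.
    mean_d = sum(log_diff) / n
    variance = sum(d * d for d in log_diff) / n - mean_d ** 2
    silog = 100.0 * math.sqrt(max(variance, 0.0))

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse,
            "rmse_log": rmse_log, "a1": a1, "a2": a2, "a3": a3,
            "silog": silog}

if __name__ == "__main__":
    print(depth_metrics([10.0, 20.0], [11.0, 18.0]))
```

Because SILog discards a global scale factor, a low SILog value does not by itself imply a low AbsRel, which is why the two should not be compared directly.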

FangjunWang commented 7 months ago


The SSL model's evaluation results on KITTI are: abs_rel: 0.060, rmse: 2.642.

hisfog commented 7 months ago

That's interesting, you got better SSL scores but worse SSL+Sup scores.

hisfog commented 7 months ago

Can you provide your fine-tuning args? I think you should choose a much smaller learning_rate.

FangjunWang commented 7 months ago

--name cvnXt_075_1130 --root weights/inc_kitti_exps --load_weights_folder weights/convnext_large/cvnXt_075/models/weights_15 --epochs 5 --bs 8 --lr 1e-5 --wd 0.01 --div_factor 10 --final_div_factor 100 --validate_every 250 --dataset kitti --workers 8 --w_chamfer 0 --data_path datasets/KITTI/raw --gt_path datasets/KITTI/gts/train --filenames_file ./finetune/train_test_inputs/kitti_eigen_train_files_with_gt.txt --input_height 320 --input_width 1024 --min_depth 0.001 --max_depth 80 --do_random_rotate --degree 1.0 --data_path_eval datasets/KITTI/raw --gt_path_eval datasets/KITTI/gts/val --filenames_file_eval ./finetune/train_test_inputs/kitti_eigen_test_files_with_gt.txt --min_depth_eval 1e-3 --max_depth_eval 80 --do_kb_crop --garg_crop --same_lr

hisfog commented 7 months ago

--epochs 5 --bs 8 --lr 1e-5

I recommend --bs 16, and I think lr should be smaller, e.g. 1e-6 or 5e-6.
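The --div_factor and --final_div_factor flags in the arguments above resemble the parameters of PyTorch's OneCycleLR scheduler. Assuming the fine-tuning script uses such a one-cycle schedule with --lr as the peak rate (the thread does not confirm this), the sketch below shows the start, peak and end learning rates that each suggested --lr would imply. The function name `one_cycle_lr_bounds` is illustrative only.

```python
def one_cycle_lr_bounds(max_lr, div_factor, final_div_factor):
    """Return (initial_lr, peak_lr, final_lr) under the OneCycleLR convention.

    initial_lr = max_lr / div_factor
    final_lr   = initial_lr / final_div_factor
    """
    initial_lr = max_lr / div_factor
    final_lr = initial_lr / final_div_factor
    return initial_lr, max_lr, final_lr

if __name__ == "__main__":
    # Compare the original --lr with the two smaller values suggested above,
    # using --div_factor 10 and --final_div_factor 100 from the posted args.
    for lr in (1e-5, 5e-6, 1e-6):
        initial, peak, final = one_cycle_lr_bounds(lr, 10, 100)
        print(f"--lr {lr:g}: start {initial:g}, peak {peak:g}, end {final:g}")
```

Under this assumption, lowering --lr scales the whole schedule down proportionally, so the warm-up and final rates shrink along with the peak.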

FangjunWang commented 7 months ago

Thanks! I will try.

Lavreniuk commented 6 months ago

Hi @FangjunWang, I am very excited to reproduce the ConvNeXt results as well. However, I am currently stuck on the first stage (SSL training). I ran the training using the following command: python train.py ./args_files/hisfog/kitti/cvnXt_L_320x1024.txt

In cvnXt_L_320x1024.txt I changed only data_path, log_dir and batch_size=8 (instead of 16 as in the original; as I understand it, you made the same change). In other experiments I also tried a lower lr and removing the diff_lr argument, but saw no improvement. After that I calculated the score using the command: evaluate_depth_config.py args_files/hisfog/kitti/cvnXt_L_320x1024.txt, where I changed load_weights_folder to my weights path. I tried the weights from all epochs, and the best result is:

    abs_rel | sq_rel | rmse  | rmse_log | a1    | a2    | a3
    0.096   | 0.765  | 4.455 | 0.176    | 0.908 | 0.966 | 0.983

This is much worse than the original and yours. Could you please help me understand what I did wrong in the first stage, so I can fix it and then move on to stage 2 (fine-tuning)?

Should I download a pretrained PoseNet or other weights, or am I perhaps calculating the metrics in the wrong way? (I checked this on the downloaded ResNet model, and it reproduces the same score that @hisfog reports in the repo.) Could you please share your parameters, as well as brief instructions on what to do to reproduce the score? I would be very thankful for your help.

FangjunWang commented 6 months ago

Hello, my parameters are: --data_path datasets/KITTI/raw/ --log_dir weights/convnext_large --model_name cvnXt_075 --dataset kitti --eval_split eigen --backbone convnext_large --height 320 --width 1024 --batch_size 8 --num_epochs 20 --scheduler_step_size 10 --model_dim 32 --patch_size 32 --dim_out 64 --query_nums 64 --dec_channels 1024 512 256 128 --min_depth 0.001 --max_depth 80.0 --diff_lr --use_stereo --load_weights_folder weights/ConvNeXt_Large_SQLdepth --eval_mono --post_process --save_pred_disps

I did not change anything besides the parameters above. Hope this helps!

Lavreniuk commented 6 months ago

@FangjunWang, thank you for the quick response. I have the same parameters. Do you train using only this command: python train.py ./args_files/hisfog/kitti/cvnXt_L_320x1024.txt and test using this one: evaluate_depth_config.py args_files/hisfog/kitti/cvnXt_L_320x1024.txt?

Also, did you do anything else, such as downloading pretrained weights or a pretrained PoseNet before training?

FangjunWang commented 6 months ago

Yes, I trained and evaluated the model using the same commands. I only loaded the pretrained weights convnext_large_22k_1k_224.pth.

Lavreniuk commented 5 months ago

@FangjunWang, for this you should change the params to: --backbone convnext_large_in22ft1k. Did you do that, or did you manually change convnext_large to load convnext_large_22k_1k_224.pth?

FangjunWang commented 5 months ago

I changed networks/Unet.py like this:

    if backbone == "convnext_large":
        pretrained = False
        backbone_kwargs = {"checkpoint_path": "weights/convnext_large_22k_1k_224_filtered.pth"}
    encoder = create_model(
        backbone,
        features_only=True,
        out_indices=backbone_indices,
        in_chans=in_channels,
        pretrained=pretrained,
        **backbone_kwargs
    )
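The thread does not say how the "_filtered" checkpoint was produced from convnext_large_22k_1k_224.pth. One plausible reading, offered purely as an assumption, is that the weights were unwrapped from a top-level "model" key and the classifier head entries were dropped so that only backbone weights remain. The helper name `filter_state_dict` and the "head." prefix below are hypothetical.

```python
def filter_state_dict(checkpoint, drop_prefixes=("head.",)):
    """Return backbone-only weights from a checkpoint dictionary.

    Assumption: the checkpoint is either a flat state dict or wraps one
    under a "model" key, and unwanted entries share a known name prefix.
    """
    # Unwrap the nested state dict if the checkpoint stores it under "model".
    state_dict = checkpoint.get("model", checkpoint)
    prefixes = tuple(drop_prefixes)
    return {
        name: value
        for name, value in state_dict.items()
        if not name.startswith(prefixes)
    }

if __name__ == "__main__":
    # Toy stand-in for a real checkpoint; values would normally be tensors.
    toy = {"model": {"stem.0.weight": 1, "head.weight": 2, "head.bias": 3}}
    print(filter_state_dict(toy))
    # With PyTorch installed, the same helper could be applied as:
    #   ckpt = torch.load("convnext_large_22k_1k_224.pth", map_location="cpu")
    #   torch.save(filter_state_dict(ckpt), "convnext_large_22k_1k_224_filtered.pth")
```

Whether the resulting key names match what the encoder expects depends on the library version and on how the feature-extraction wrapper renames modules, so the key list is worth inspecting before training.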

Lavreniuk commented 5 months ago

@FangjunWang, thanks. could you pls write me an email to nick_93@ukr.net, so I could directly connect to you for other questions?

Lavreniuk commented 5 months ago

@FangjunWang, I have tried convnext_large_22k_1k_224 as you suggested. It gives slightly better results, but the situation is similar. For resnet50 I was able to mostly reproduce the original score:

    abs_rel | sq_rel | rmse  | rmse_log | a1    | a2    | a3
    0.084   | 0.646  | 3.972 | 0.163    | 0.923 | 0.969 | 0.983

But for ConvNeXt I found the following: it improves for the first 6-9 epochs, and after that it does not improve but gets worse and worse. Have you seen similar results, or do you get improvements at more or less every epoch?

    epoch | abs_rel | sq_rel | rmse  | rmse_log | a1    | a2    | a3
    1     | 0.091   | 0.704  | 4.197 | 0.173    | 0.916 | 0.966 | 0.982
    2     | 0.091   | 0.675  | 4.182 | 0.168    | 0.918 | 0.968 | 0.983
    3     | 0.088   | 0.701  | 4.279 | 0.167    | 0.923 | 0.969 | 0.983
    4     | 0.085   | 0.625  | 4.017 | 0.166    | 0.926 | 0.968 | 0.983
    5     | 0.084   | 0.664  | 4.079 | 0.165    | 0.928 | 0.969 | 0.983
    6     | 0.082   | 0.647  | 4.119 | 0.167    | 0.926 | 0.967 | 0.982
    7     | 0.087   | 0.745  | 4.389 | 0.170    | 0.921 | 0.967 | 0.982
    8     | 0.086   | 0.707  | 4.256 | 0.169    | 0.923 | 0.967 | 0.982
    9     | 0.088   | 0.748  | 4.397 | 0.173    | 0.920 | 0.966 | 0.981
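When scores fluctuate across epochs like this, a common workaround is to evaluate every saved checkpoint and keep the best one rather than the last. The sketch below applies that idea to the abs_rel and rmse values reported above for epochs 1 to 9; the helper name `best_epoch` is illustrative, and choosing a checkpoint on the test split is a shortcut that a held-out validation split would avoid.

```python
# Per-epoch scores copied from the ConvNeXt results reported above.
ABS_REL = {1: 0.091, 2: 0.091, 3: 0.088, 4: 0.085, 5: 0.084,
           6: 0.082, 7: 0.087, 8: 0.086, 9: 0.088}
RMSE = {1: 4.197, 2: 4.182, 3: 4.279, 4: 4.017, 5: 4.079,
        6: 4.119, 7: 4.389, 8: 4.256, 9: 4.397}

def best_epoch(scores):
    """Return (epoch, score) with the lowest score; ties go to the earliest epoch."""
    # Sorting by epoch first makes min() pick the earliest epoch on ties.
    return min(sorted(scores.items()), key=lambda item: item[1])

if __name__ == "__main__":
    print("best by abs_rel:", best_epoch(ABS_REL))
    print("best by rmse:", best_epoch(RMSE))
```

On these numbers the two metrics disagree: abs_rel is lowest at epoch 6 and rmse at epoch 4, so the selection criterion should be fixed before comparing runs.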

jerry-ryu commented 4 months ago

@Lavreniuk I have the same problem as you. I've tried the ImageNet-pretrained ConvNeXt model and the PoseNet provided by @hisfog (#14). Can you help me if you make any progress?

Here are my parameters:

--data_path /mnt/RG/dataset/kitti_data --log_dir /mnt/RG/SfMNeXt-Impl/boost --model_name cvnXt_high --dataset kitti --eval_split eigen --backbone convnext_large_in22ft1k --height 320 --width 1024 --batch_size 8 --num_epochs 20 --scheduler_step_size 10 --model_dim 32 --patch_size 32 --dim_out 64 --query_nums 64 --dec_channels 1024 512 256 128 --min_depth 0.001 --max_depth 80.0 --diff_lr --use_stereo --load_weights_folder /mnt/RG/SfMNeXt-Impl/boost/cvnXt_low/models/weights_0 --eval_mono --post_process --pretrained_pose --pose_net_path /mnt/RG/SfMNeXt-Impl/checkpoints/pose

Lavreniuk commented 4 months ago

Hi @jerry-ryu, I have not reproduced the results of the original repo, let alone the much better result that was mentioned. I think you should train without the pretrained PoseNet, but maybe I am wrong. From what I found in other issues, the situation is similar for ResNet and other models: it seems impossible to reproduce them. So I have switched my interest to another model.

hisfog commented 4 months ago

Apologies for the delayed response. To reproduce the results on KITTI, please DO NOT use the latest code release (I'm not sure what causes the issues above). Instead, please use the following version:

git checkout 6a1e997f97caef8de080bb2873f71cfbad9a8047

which is consistent with the implementation of paper SQLdepth, without any additional modifications.

jerry-ryu commented 4 months ago

@Lavreniuk @hisfog Thank you for your kind response, I will try again and let you know.

jerry-ryu commented 4 months ago

@hisfog Thank you so much, I was finally able to reproduce SQLdepth on resnet50 1024x320.

I will post my experimental results and argsfiles for those who want to train SQLdepth.

Depth metrics:

Paper: [image]

ResNet50 320x1024 (trained): [image]

ConvNeXt 192x640 (trained): [image]

ResNet50 320x1024


  1. Do not use the latest code release; instead run git checkout 6a1e997f97caef8de080bb2873f71cfbad9a8047

  2. args files
    --data_path /mnt2/RG/data
    --log_dir /mnt2/RG/SfMNeXt-Impl/sqldepth_log/
    --model_name resnet_320x1024
    --dataset kitti 
    --eval_split eigen
    --backbone resnet_lite
    --height 320 
    --width 1024
    --batch_size 10
    --num_epochs 25
    --scheduler_step_size 15
    --model_dim 32
    --patch_size 20
    --dim_out 128
    --query_nums 128
    --num_features 256
    --num_layers 50
    --min_depth 0.001
    --max_depth 80.0
    --load_weights_folder /mnt2/RG/SfMNeXt-Impl/sqldepth_log/resnet_320x1024/models/weights_24

ConvNeXt 192x640

(Due to lack of GPU capacity, 192x640 was used instead of 320x1024.)

  1. Do not use the latest code release; instead run git checkout 6a1e997f97caef8de080bb2873f71cfbad9a8047

  2. args files

    --data_path /mnt2/RG/data
    --log_dir /mnt2/RG/SfMNeXt-Impl/sqldepth_log/
    --model_name cvnXt_192x640
    --dataset kitti 
    --eval_split eigen 
    --backbone convnext_large
    --height 192 
    --width 640
    --batch_size 8
    --num_epochs 25
    --scheduler_step_size 15
    --model_dim 32
    --patch_size 16
    --dim_out 64
    --query_nums 64
    --dec_channels 1024 512 256 128
    --min_depth 0.001
    --max_depth 80.0
    --load_weights_folder /mnt2/RG/SfMNeXt-Impl/sqldepth_log/cvnXt_192x640/models/weights_24

Thank you again for your wonderful code, and congratulations on the paper's acceptance!

p.s. I don't think there's any special change between the commit you told me and the latest code, so if you have any ideas about what made the experimental results significantly different, I'd appreciate it if you could tell me.
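The args files in this post are plain lists of command-line flags, one per line. How train.py actually reads them is not shown in the thread, so the following is only a sketch of one way to load such a file with the standard library: the text is split into tokens and handed to argparse. The function name `parse_args_text` and the subset of flags declared here are illustrative; flags that are not declared are returned separately instead of raising an error.

```python
import argparse
import shlex

def parse_args_text(text):
    """Parse flag text (one flag per line or all on one line) with argparse."""
    parser = argparse.ArgumentParser()
    # Only a subset of the flags from the args files is declared here.
    parser.add_argument("--backbone", type=str)
    parser.add_argument("--height", type=int)
    parser.add_argument("--width", type=int)
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--dec_channels", type=int, nargs="+")
    parser.add_argument("--min_depth", type=float)
    parser.add_argument("--max_depth", type=float)
    # shlex.split treats newlines as whitespace, so both layouts work.
    tokens = shlex.split(text)
    return parser.parse_known_args(tokens)

if __name__ == "__main__":
    sample = """
    --dataset kitti
    --backbone convnext_large
    --height 192
    --width 640
    --batch_size 8
    --dec_channels 1024 512 256 128
    --min_depth 0.001
    --max_depth 80.0
    """
    args, unknown = parse_args_text(sample)
    print(args)
    print("undeclared flags:", unknown)
```

Parsing the file this way makes it easy to check that two runs really use the same height, width and batch size before comparing their metrics.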

NoelShin commented 3 months ago

Thank you @hisfog and @jerry-ryu for the kind responses and sharing the experiment settings.

Background: I was in the same situation where I couldn't get the similar results to the numbers reported in the paper when using the latest code. Now knowing this issue, I'm training with the suggested branch, but curious what caused the difference in my result.

I checked the differences between the latest commit and 6a1e997f97caef8de080bb2873f71cfbad9a8047, and the most notable difference I can find was the filename changes in splits/eigen_zhou/train_files.txt which can possibly affect the training. @hisfog, do you think this is the cause?
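To see exactly which entries changed in a split file between two commits, the two versions can be compared as sets of lines. The sketch below assumes each version of splits/eigen_zhou/train_files.txt has been saved to its own file; the helper name `diff_split_lists` and the file names in the trailing comment are hypothetical, and the sample entries are made up for illustration.

```python
def diff_split_lists(old_lines, new_lines):
    """Return (only_in_old, only_in_new) for two lists of split-file lines."""
    # Strip whitespace and ignore blank lines so formatting noise is not counted.
    old = {line.strip() for line in old_lines if line.strip()}
    new = {line.strip() for line in new_lines if line.strip()}
    return sorted(old - new), sorted(new - old)

if __name__ == "__main__":
    # Made-up entries; real split files list one training sample per line.
    old_split = ["drive_a 10 l", "drive_a 11 l", "drive_b 5 r"]
    new_split = ["drive_a 10 l", "drive_b 5 r", "drive_c 7 l"]
    removed, added = diff_split_lists(old_split, new_split)
    print("only in old:", removed)
    print("only in new:", added)
    # With two saved copies of the split file, the same call would be:
    #   with open("train_files_old.txt") as f_old, open("train_files_new.txt") as f_new:
    #       removed, added = diff_split_lists(f_old, f_new)
```

A set comparison ignores ordering, so it reports changed or renamed entries but not a reshuffled file; if ordering matters for training, a line-by-line diff would be the better tool.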

jerry-ryu commented 3 months ago

@NoelShin I looked it up after seeing your reply, and it seems quite reasonable. Thank you for finding it!!

XIAN-XIAN-X commented 3 months ago

Hello! I noticed that too. Do you know which paper the old split came from?

chaoying0115 commented 2 months ago

Hello, I tried to reproduce ResNet50 at 640x192, but the results I get are very different: [image]

--data_path /home/ccy/project/kitti_data/ --dataset kitti --eval_split eigen --height 192 --width 640 --batch_size 6 --num_epochs 25 --model_dim 64 --patch_size 16 --query_nums 120 --scheduler_step_size 15 --eval_mono --load_weights_folder /home/Process3/tmp/mdp/res50_models/weights_19 --post_process --min_depth 0.001 --max_depth 80.0 --ext jpg --model_name mdp2 --log_dir /home/ccy/tmp/

This is args_res50_kitti_192x640_eval.txt: --data_path /home/ccy/project/kitti_data/ --dataset kitti --eval_split eigen --height 192 --width 640 --batch_size 6 --model_dim 64 --patch_size 16 --query_nums 120 --eval_mono --load_weights_folder /home/ccy/tmp/mdp2/models/weights_8/ --post_process --min_depth 0.01 --max_depth 80.0 --save_pred_disps

The dataset I use is the kitti_data prepared in the same way as for monodepth2: [image]

I have been investigating for a long time and still do not know the exact cause. I look forward to your reply and guidance. Thank you.

lmz-sense commented 6 days ago

Hello, I tried to reproduce ResNet50 at 640x192, but the results I get are very different: [image]

--data_path /home/ccy/project/kitti_data/ --dataset kitti --eval_split eigen --height 192 --width 640 --batch_size 6 --num_epochs 25 --model_dim 64 --patch_size 16 --query_nums 120 --scheduler_step_size 15 --eval_mono --load_weights_folder /home/Process3/tmp/mdp/res50_models/weights_19 --post_process --min_depth 0.001 --max_depth 80.0 --ext jpg --model_name mdp2 --log_dir /home/ccy/tmp/

This is args_res50_kitti_192x640_eval.txt: --data_path /home/ccy/project/kitti_data/ --dataset kitti --eval_split eigen --height 192 --width 640 --batch_size 6 --model_dim 64 --patch_size 16 --query_nums 120 --eval_mono --load_weights_folder /home/ccy/tmp/mdp2/models/weights_8/ --post_process --min_depth 0.01 --max_depth 80.0 --save_pred_disps

The dataset I use is the kitti_data prepared in the same way as for monodepth2: [image]

I have been investigating for a long time and still do not know the exact cause. I look forward to your reply and guidance. Thank you.
