zsz-pro opened this issue 1 year ago
Is this an error map, or a depth map?
I ran test_simple_SQL_config.py without any changes. Isn't it supposed to produce a depth map?
I think you did not load the pre-trained weights; you should set --load_pretrained_model in your args_file. Sorry for that, I changed test_simple_SQL_config.py but did not make the corresponding modifications in args_files.
Yeah, thanks for reminding me! For the convenience of others who might need it, here are the key arguments in args_files for the 'kitti-resnet50-640*192' weights: --backbone resnet --num_features 256 --dim_out 64 --batch_size 16 --model_dim 32 --patch_size 16 --query_nums 64
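For reference, here is a sketch of what the full set of relevant args_file entries might look like when loading this checkpoint (the weights path is a placeholder, and I am assuming --load_pretrained_model is combined with a --load_weights_folder path, as elsewhere in this thread):

```
--load_pretrained_model
--load_weights_folder /path/to/kitti-resnet50-640x192
--backbone resnet
--num_features 256
--dim_out 64
--batch_size 16
--model_dim 32
--patch_size 16
--query_nums 64
```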
I have one more question to ask: What potential issues do you think might arise when using this depth estimation result for novel view synthesis? It seems that this adaptive binning approach is very friendly for NVS (novel view synthesis).
I trained for 20 epochs, and my evaluation results are quite different from the results obtained with the weights you provided (kitti-resnet50-640*192). Could you give me some advice?
Can you provide your training args (in your args_file)?
```
--data_path /mnt/kitti_data/raw/
--dataset kitti
--eval_split eigen
--height 192
--width 640
--batch_size 16
--model_dim 32
--patch_size 16
--query_nums 64
--eval_mono
--post_process
--min_depth 0.01
--max_depth 80.0
--save_pred_disps
--backbone resnet
--num_features 256
--dim_out 64
```
```python
encoder_layers = nn.modules.transformer.TransformerEncoderLayer(embedding_dim, num_heads, dim_feedforward=512)
```
I changed dim_feedforward from 1024 to 512 in Depth_Decoder_QueryTr, because the weights you provided (kitti-resnet50-640*192) are compatible with dim_feedforward=512.
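As a quick sanity check, here is a minimal sketch (not from the repo; embedding_dim and num_heads are assumed values) showing why dim_feedforward must match the checkpoint: it fixes the shape of the first feed-forward layer inside the encoder layer, and load_state_dict() fails on any mismatch.

```python
import torch.nn as nn

# Minimal sketch: dim_feedforward fixes the shape of linear1 inside
# TransformerEncoderLayer, so it must match the released checkpoint,
# otherwise load_state_dict() raises a size-mismatch error.
embedding_dim, num_heads = 32, 4  # assumed values (e.g. --model_dim 32)
layer = nn.modules.transformer.TransformerEncoderLayer(
    embedding_dim, num_heads, dim_feedforward=512)

# linear1 maps embedding_dim -> dim_feedforward; its weight has shape
# (dim_feedforward, embedding_dim). Compare this against the checkpoint.
print(layer.linear1.weight.shape)  # torch.Size([512, 32])
```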
I think the latest code seems to be for indoor scenes (maybe?), but I can reproduce the author's results on KITTI using an older version of the code. My git hash is 6a1e997f97caef8de080bb2873f71cfbad9a8047; you can switch to this version with `git checkout 6a1e997f97caef8de080bb2873f71cfbad9a8047`. Hope it helps.
I cannot reproduce the author's results like you did. Can you provide the details and args of your training? For example, did you just use the args_files\args_res50_kitti_192x640_train.txt provided by the author, or did you make some changes referring to args_files\hisfog\kitti\resnet_192x640.txt? And did you add --backbone resnet as in the paper, or did you use the default UNet? I think these may be the reasons I cannot get the right result like you. Thank you!
My args file is args_files\hisfog\kitti\resnet_320x1024.txt, and the backbone is --backbone resnet_lite. The args_files\args_res50_kitti_192x640_train.txt does not set --use_stereo, so I think it is for monocular training only.
Thank you very much! So you used the args file intended for testing to train, without changes (such as --use_stereo), right? I will have a try.
I've tried your suggestion, but it doesn't seem to be working. Here is my result after 20 epochs, and here is my args_file. I wonder if it's different from yours in any way? @seoAlexer @hisfog @zsz-pro
I used the code with git hash 6a1e997f97caef8de080bb2873f71cfbad9a8047. I did not set --diff_lr, and --min_depth is set to 0.001.
> I have one more question to ask: What potential issues do you think might arise when using this depth estimation result for novel view synthesis? It seems that this adaptive binning approach is very friendly for NVS (novel view synthesis).
Using a depth map and differentiable warping for NVS may fail to synthesize occluded areas. But I'm not entirely sure why you're using a depth map for NVS.
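To make the occlusion point concrete, here is a minimal sketch (not from this repo; all names are mine, and the intrinsics/pose conventions are assumed) of depth-based differentiable warping with grid_sample. Target pixels that are disoccluded relative to the source view simply have no valid source pixel to sample:

```python
import torch
import torch.nn.functional as F

def warp_to_novel_view(src_img, depth, K, K_inv, T):
    """Minimal sketch of depth-based inverse warping for view synthesis.

    src_img : (B, 3, H, W) source image
    depth   : (B, 1, H, W) predicted depth for the target (novel) view
    K, K_inv: (B, 3, 3) camera intrinsics and their inverse
    T       : (B, 3, 4) target-to-source rigid transform [R | t]
    """
    B, _, H, W = depth.shape

    # Homogeneous pixel grid of the target view, shape (B, 3, H*W)
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float()
    pix = pix.view(1, 3, -1).expand(B, -1, -1).to(depth.device)

    # Backproject target pixels to 3D, then move them into the source frame
    cam = depth.view(B, 1, -1) * (K_inv @ pix)            # (B, 3, H*W)
    ones = torch.ones(B, 1, H * W, device=depth.device)
    src_pix = K @ (T @ torch.cat([cam, ones], dim=1))     # (B, 3, H*W)

    # Perspective divide, then normalize to [-1, 1] for grid_sample
    xy = src_pix[:, :2] / (src_pix[:, 2:3] + 1e-7)
    x = 2 * xy[:, 0] / (W - 1) - 1
    y = 2 * xy[:, 1] / (H - 1) - 1
    grid = torch.stack([x, y], dim=-1).view(B, H, W, 2)

    # Target pixels occluded (or out of view) in the source image have no
    # valid sample -- warping alone cannot fill such disocclusions.
    return F.grid_sample(src_img, grid, padding_mode="zeros", align_corners=True)
```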
I cannot reproduce the results either. I tried the code with git hash 6a1e997f97caef8de080bb2873f71cfbad9a8047 using the same configuration, and my abs_rel is 0.108. I wonder if it has something to do with the environment. If so, could you please share more details about setting up the environment?
@indu1ge
1. Do not use --diff_lr unless you have loaded a well pre-trained pose_net.
2. Based on my experience, a min_depth of 0.001 might be better.
3. Additionally, the best results may not necessarily occur at 20 epochs; they could appear earlier, such as at epoch 15.
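On point 3, a small helper like the following can evaluate every saved epoch instead of only the last one. This is my own sketch: it assumes a monodepth2-style models/weights_&lt;N&gt; layout (as seen later in this thread) and an evaluation entry point named evaluate_depth.py, both of which may differ in this repo.

```python
import glob
import os
import subprocess

# Sketch: evaluate every saved checkpoint, since the best abs_rel may
# appear before the final epoch (e.g. around epoch 15).
log_dir = "/path/to/log_dir/models"  # placeholder path
for folder in sorted(glob.glob(os.path.join(log_dir, "weights_*")),
                     key=lambda p: int(p.rsplit("_", 1)[-1])):
    print("Evaluating", folder)
    subprocess.run(["python", "evaluate_depth.py",
                    "--load_weights_folder", folder,
                    "--eval_mono", "--eval_split", "eigen"],
                   check=True)
```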
I tried training with ResNet18 as the backbone for 20 epochs, with the following settings:

```
--data_path ./data/kitti_raw
--dataset kitti
--eval_split eigen
--height 192
--width 640
--batch_size 16
--num_epochs 25
--model_dim 32
--patch_size 16
--query_nums 120
--scheduler_step_size 15
--eval_mono
--post_process
--min_depth 0.001
--max_depth 80.0
--backbone resnet18_lite
```

But I didn't get a good result. This is the best result I have got. I don't know what went wrong.
@Shaw-Way The first SSL training, especially for monocular training only, may not yield optimal results, and this is normal, as PoseNet might not have converged yet. You can refer to the experimental setups of other successful replications, e.g. https://github.com/hisfog/SfMNeXt-Impl/issues/13#issuecomment-1754337890, https://github.com/hisfog/SfMNeXt-Impl/issues/26#issuecomment-1840013244
This result shows a significant gap from the metrics in your paper. Did you achieve those metrics directly through SSL training, or were there additional fine-tuning steps?
I only did supervised fine-tuning for the ConvNeXt-L model. Other results are produced by SSL training only.
Thanks for your reply. Do you think training for more epochs would be helpful, or is there any problem with my settings?
More epochs might not be helpful. As for settings, you can refer to https://github.com/hisfog/SfMNeXt-Impl/issues/13#issuecomment-1808457711, https://github.com/hisfog/SfMNeXt-Impl/issues/26#issuecomment-1840013244, and https://github.com/hisfog/SfMNeXt-Impl/issues/13#issuecomment-1752824019. Hope that can help you.
Hello, I would like to reproduce the SSL results of kitti-resnet50-1024*320. Based on args_files\hisfog\kitti\resnet_320x1024.txt, I did not set --diff_lr, and other parameters remain unchanged. I did not use any pre-trained weights; the resulting abs_rel is larger than 0.1, and the other metrics are not optimal either. As you mentioned above, it is normal for the first self-supervised training not to achieve the best result. I would like to know the training details after the first self-supervised run that lead to abs_rel = 0.082: after the first round of training is completed, is a second (or further) round of training needed? Does the second run need to use the weights from one epoch of the first run as pre-trained weights? Is it necessary to load pose.pth, encoder.pth, and depth.pth together, or only pose.pth with --diff_lr set?
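While waiting for an answer, here is a sketch of the kind of second-stage args I would try. This is entirely my guess, not a confirmed recipe: the weights folder is a placeholder, and I am assuming --load_pretrained_model together with --load_weights_folder restores pose.pth/encoder.pth/depth.pth from a chosen first-run epoch before enabling --diff_lr:

```
--load_pretrained_model
--load_weights_folder /path/to/first_run/models/weights_15
--diff_lr
--min_depth 0.001
```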
Hello! I have the same confusion as you. Have you solved it?
Hello! I have the same confusion as you. Have you solved it? @Shaw-Way
> I cannot reproduce the results either. I tried the code with git hash 6a1e997f97caef8de080bb2873f71cfbad9a8047 using the same configuration, and my abs_rel is 0.108.
Hello, I have met the same problem as you. I want to ask: did you use JPEG images to train the model, or '.png'? Looking through the code, the author may have used a PNG dataset to train and validate the model.
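One quick way to check is to see which extension actually exists on disk for a sample frame. This is a sketch assuming the standard KITTI raw layout under the --data_path used earlier in this thread; the drive name is just an example:

```python
import os

# Sketch: check whether the KITTI raw frames on disk are .jpg or .png,
# since a monodepth2-style loader typically expects one fixed extension.
# The data path and drive below are examples -- substitute your own.
sample = ("/mnt/kitti_data/raw/2011_09_26/"
          "2011_09_26_drive_0001_sync/image_02/data/0000000000")
for ext in (".jpg", ".png"):
    print(ext, os.path.exists(sample + ext))
```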
> My args file is args_files\hisfog\kitti\resnet_320x1024.txt, and the backbone is --backbone resnet_lite. Since args_files\args_res50_kitti_192x640_train.txt does not set --use_stereo, I think it is for monocular training only.
Hello, I have a question. When I don't load a pre-trained PoseNet and don't set --use_stereo, the results are very poor. What is the reason for this?

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0.444 | 4.749 | 12.046 | 0.586 | 0.303 | 0.560 | 0.767 |
```
--data_path /home/Clandy/data
--log_dir /home/Clandy/train_models/tree
--model_name res_099
--dataset kitti
--eval_split eigen
--backbone resnet
--height 192
--width 640
--batch_size 16
--num_epochs 25
--scheduler_step_size 15
--num_layers 50
--num_features 256
--model_dim 32
--patch_size 16
--dim_out 64
--query_nums 64
--min_depth 0.001
--max_depth 80.0
--eval_mono
--load_weights_folder /home/Clandy/train_models/tree/res_099/models/weights_24
--post_process
```
Nice work! I would appreciate your guidance on the following two questions:
1. This is the result of my testing with the latest code using the 'kitti-resnet50-640*192' weights. How do you perceive the errors introduced by shadows?
2. What potential issues do you think might arise when using this depth estimation result for novel view synthesis? It seems that this adaptive binning approach is very friendly for NVS (novel view synthesis).