kasteric opened this issue 1 month ago
Thanks for your interest in our project, and sorry for the delayed reply.
Sorry for the confusing results. Could you please attach the log for each experiment so that we can analyze them and help locate the issue?
I located the issue: the data preprocessing is inconsistent. The released checkpoints were trained with resize with padding, but the IMDLBenCo evaluation code I ran used resize without padding, so the training and test data distributions are not aligned. On my custom dataset, I found that resize without padding works better than resize with padding. Did you observe similar results?
Hi,
Anyway, if you use the demo_test_iml_vit.sh generated by the command benco init model_zoo, I believe the images will be resized with padding via the parameter if_padding.
I am not sure how you trained IML-ViT with resize without padding, since the original code only supports 1024x1024 input. In most cases we don't apply a traditional resize; we keep the raw resolution and pad the image to 1024x1024. Could you please share the details of your implementation so we can discuss it? Thank you very much.
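To make the difference between the two modes concrete, here is a minimal pure-Python sketch, assuming an image is a list of pixel rows. The helper names pad_to and resize_nearest are hypothetical, and zero-padding at the bottom/right plus nearest-neighbour interpolation are illustrative assumptions; they are not necessarily what IMDLBenCo's get_albu_transforms does internally.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values (illustrative assumption)


def pad_to(img: Image, size: int, fill: int = 0) -> Image:
    """Keep the raw resolution and pad bottom/right up to size x size."""
    h, w = len(img), len(img[0])
    if h > size or w > size:
        # Padding alone cannot shrink an image; how oversized inputs are
        # handled is outside the scope of this sketch.
        raise ValueError("image larger than target size")
    padded = [row + [fill] * (size - w) for row in img]
    padded += [[fill] * size for _ in range(size - h)]
    return padded


def resize_nearest(img: Image, size: int) -> Image:
    """Stretch to size x size with nearest-neighbour sampling (aspect ratio not kept)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


if __name__ == "__main__":
    tiny = [[1, 2, 3],
            [4, 5, 6]]  # a 2x3 "image"
    for row in pad_to(tiny, 4):
        print(row)  # original pixels untouched, zeros fill the rest
    for row in resize_nearest(tiny, 4):
        print(row)  # every pixel duplicated/stretched onto the 4x4 grid
```

With padding, the original pixels keep their values and positions and only the canvas grows; with resizing, every pixel is resampled, so the same content reaches the network at a different scale and aspect ratio.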
Oh, I was not using demo_test_iml_vit.sh for testing. Instead, I used demo_train_iml_vit.sh for evaluation, placing the evaluation code before the training code in train.py. In the generated demo_train_iml_vit.sh, I believe the configuration is:
base_dir="./output_dir_imlvit_orig"
mkdir -p ${base_dir}
CUDA_VISIBLE_DEVICES=1 \
torchrun \
--standalone \
--nnodes=1 \
--nproc_per_node=1 \
../train.py \
--model IML_ViT \
--edge_lambda 20 \
--vit_pretrain_path ../mae_pretrain_vit_base.pth \
--world_size 1 \
--batch_size 3 \
--data_path /<casia_v2> \
--epochs 200 \
--lr 1e-4 \
--image_size 1024 \
--if_resizing \
--min_lr 5e-7 \
--weight_decay 0.05 \
--edge_mask_width 7 \
--test_data_path /<casia_v1> \
--warmup_epochs 2 \
--output_dir ${base_dir}/ \
--log_dir ${base_dir}/ \
--accum_iter 8 \
--seed 42 \
--test_period 4 \
--resume /<resumed.pth>
where if_resizing is set to True, so the data transform is resize without padding, as in the code below:
self.post_transform = None
if is_padding:
    self.post_transform = get_albu_transforms(type_="pad", output_size=output_size)
if is_resizing:
    # Checked after padding, so "resize" overrides "pad" when both flags are True
    self.post_transform = get_albu_transforms(type_="resize", output_size=output_size)
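Because the second check overwrites the first, the chosen mode depends on flag order, and nothing warns when evaluation preprocessing differs from the checkpoint's training preprocessing. The sketch below is a hypothetical guard (select_post_transform_type and assert_same_preprocessing are not part of IMDLBenCo) that makes the mode explicit, rejects the ambiguous combination, and fails fast on a train/eval mismatch.

```python
from typing import Optional


def select_post_transform_type(is_padding: bool, is_resizing: bool) -> Optional[str]:
    """Return the type_ string to request, or None for no post-transform.

    Hypothetical helper mirroring the if_padding / if_resizing flags, but
    refusing the case where both are set instead of letting "resize" win.
    """
    if is_padding and is_resizing:
        raise ValueError("if_padding and if_resizing are mutually exclusive")
    if is_padding:
        return "pad"
    if is_resizing:
        return "resize"
    return None


def assert_same_preprocessing(train_type: Optional[str], eval_type: Optional[str]) -> None:
    """Raise if evaluation uses a different post-transform than training did."""
    if train_type != eval_type:
        raise RuntimeError(
            f"checkpoint trained with {train_type!r} but evaluating with {eval_type!r}"
        )


if __name__ == "__main__":
    eval_type = select_post_transform_type(is_padding=False, is_resizing=True)
    print(eval_type)  # resize
    try:
        assert_same_preprocessing("pad", eval_type)
    except RuntimeError as err:
        print(err)  # checkpoint trained with 'pad' but evaluating with 'resize'
```

The returned string would then be passed as type_ to get_albu_transforms; recording the training mode alongside the checkpoint (an assumption about workflow, not an existing IMDLBenCo feature) would let the check run automatically at evaluation time.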
After I manually set the get_albu_transforms type_ to "pad", the results became consistent. My conclusion is that this is not because "pad" is better than "resize", but because your checkpoints were trained in "pad" mode.
On my custom dataset, however, I found that "resize" mode yields results about 1-2% better.
Thank you for your feedback.
I see your point. Generally, a deep neural network learns a function fitted to its training distribution, so keeping the training distribution similar to the test distribution is essential. This matches what you observed: "I made the conclusion that it was not because 'pad' is better than 'resize', but your checkpoints were trained based on 'pad' mode."
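As a worked illustration of why the two modes produce different input distributions, the sketch below computes the per-axis scale each mode applies to image content, and how much of the 1024x1024 input is real content, for a hypothetical 384x256 (width x height) image. The image size is an assumption chosen for illustration; the 1024 target matches --image_size in the training script.

```python
from typing import Tuple


def scale_factors(h: int, w: int, size: int, mode: str) -> Tuple[float, float]:
    """Per-axis (vertical, horizontal) scale applied to the image content."""
    if mode == "pad":
        return (1.0, 1.0)  # raw pixels kept; only the canvas grows
    if mode == "resize":
        return (size / h, size / w)  # each axis stretched independently
    raise ValueError(f"unknown mode: {mode}")


def content_fraction(h: int, w: int, size: int, mode: str) -> float:
    """Fraction of the size x size input occupied by real image pixels."""
    if mode == "pad":
        return (h * w) / (size * size)
    if mode == "resize":
        return 1.0
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    h, w, size = 256, 384, 1024  # hypothetical image, 1024 target size
    for mode in ("pad", "resize"):
        sy, sx = scale_factors(h, w, size, mode)
        print(mode, "scale:", sy, round(sx, 3), "content:", content_fraction(h, w, size, mode))
```

For this image, "pad" keeps content at its native scale on under 10% of the canvas (256*384/1024^2 = 0.09375), while "resize" magnifies it 4x vertically and about 2.67x horizontally across the whole input. A checkpoint trained on one of these regimes sees tampering traces at scales and aspect ratios it never encountered when evaluated on the other.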
Further, there are many possible explanations for the performance on your custom dataset, such as:
Thanks again for your interest in our project. If the issue is resolved, please close it. You are also welcome to discuss any further concerns or problems you run into.
Hi, I found that for the same IML-ViT checkpoint, the inference results on CASIA v1 obtained through this IMDB-IML-ViT framework are much lower (~12%) than those computed with the original IML-ViT codebase (~70%).