melohux opened this issue 3 years ago
"The learning rate is set to 0.1 with batch size 256 and decays to 1e-5 following the cosine schedule. " This line in paper means that I set 0.1 for batch size 256, i.e.
lr = 0.1 * (batch_size / 256),
that is 0.4 for batch size 1024 (4*256).
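For reference, a minimal sketch of this linear scaling rule in plain Python (not taken from the repo):

```python
def scale_lr(batch_size, base_lr=0.1, base_batch_size=256):
    """Linear learning-rate scaling: lr = base_lr * (batch_size / base_batch_size)."""
    return base_lr * batch_size / base_batch_size

# Batch sizes discussed in this thread:
print(scale_lr(256))   # 0.1
print(scale_lr(512))   # 0.2
print(scale_lr(1024))  # 0.4
```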
Sorry for the unclear wording.
Since I am working on the next paper based on DDF, I have updated this repo several times. I will check the validation code. You can also verify it by using the released model parameters.
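For anyone who wants to run that check, here is a rough sketch of evaluating the released weights. It assumes that importing the repo registers `ddf_mul_resnet50` with timm (the model name comes from the training command in this thread); the checkpoint filename and preprocessing settings below are assumptions, not the repo's confirmed configuration:

```python
import torch
import timm
from torchvision import datasets, transforms

# Standard ImageNet eval preprocessing (assumed 224x224 ResNet settings).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
val_set = datasets.ImageFolder('<path_to_imagenet>/val', preprocess)
loader = torch.utils.data.DataLoader(val_set, batch_size=64, num_workers=6)

# Assumes the ddfnet repo has been imported so its models are registered with timm.
model = timm.create_model('ddf_mul_resnet50', pretrained=False)
ckpt = torch.load('checkpoint.pth.tar', map_location='cpu')  # hypothetical filename for the released weights
model.load_state_dict(ckpt.get('state_dict', ckpt))
model.eval().cuda()

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images.cuda()).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f'top-1 accuracy: {correct / total:.4f}')
```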
So were the experimental results in your paper obtained with batch size 256 or 1024? And does my training log match yours in terms of the loss value? In addition, it would be great if you could check the validation code. Thanks.
I use 1024 for R50 and 512 for R101.
And for R101, what is your data augmentation schedule? I found random erasing, auto-augment, and color jitter in your training code.
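For context, a sketch of how those three augmentations are typically wired together in a timm-based pipeline; the policy string and magnitudes below are common timm defaults used as placeholders, not the repo's confirmed settings. Note that in timm's standard training transform, color jitter is skipped when an auto-augment policy is enabled, which may explain seeing all three in the code but not all three taking effect:

```python
from timm.data import create_transform

# Training transform combining auto-augment, color jitter, and random erasing.
# All values below are assumed placeholders, not the repo's actual settings.
train_transform = create_transform(
    input_size=224,
    is_training=True,
    auto_augment='rand-m9-mstd0.5',  # assumed policy; if set, timm ignores color_jitter
    color_jitter=0.4,                # assumed strength
    re_prob=0.25,                    # random erasing probability (assumed)
    re_mode='pixel',
)
print(train_transform)
```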
Thanks for your excellent work. I met some problems when training your model following your instructions.
Your paper states that batch size 256 was used for all experimental results, but in your instructions
./distributed_train.sh 8 <path_to_imagenet> --model ddf_mul_resnet50 --lr 0.4 --warmup-epochs 5 --epochs 120 --sched cosine -b 128 -j 6 --amp --dist-bn reduce
it seems that this command launches training with a total batch size of 128 * 8 = 1024. When I follow this command, the training process seems to be correct, but the validation process has some problems. Does the training log match your training process? Do you have any idea about the problem in the testing part?
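For what it's worth, the effective (global) batch size of that command is the per-GPU batch times the number of GPUs, which lines up with the linear lr scaling discussed above; a trivial check:

```python
# -b 128 across 8 GPUs gives a global batch of 1024, matching --lr 0.4
per_gpu_batch, num_gpus = 128, 8
global_batch = per_gpu_batch * num_gpus
lr = 0.1 * global_batch / 256
print(global_batch, lr)  # 1024 0.4
```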