I use 4 P40 GPUs, so I changed the GPU-related configs as follows:
GPUS=0,1,2,3
NUM_GPUS=4
NUM_WORKERS=4
remove --fp16
batch_size: 128
lr_scale_factor: 128
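For reference, here is how I understand the learning rate to be scaled from these values; this is a minimal sketch, assuming a base LR of 0.1 and that `lr_scale_factor` divides the global batch size (both are my assumptions, not values read from the repo's config):

```python
# Hypothetical LR-scaling sketch; base_lr=0.1 and the formula
# scaled_lr = base_lr * global_batch / lr_scale_factor are assumptions,
# not values taken from this repo's config files.
base_lr = 0.1
batch_size_per_gpu = 128   # batch_size from my config
num_gpus = 4               # NUM_GPUS
lr_scale_factor = 128      # lr_scale_factor from my config

global_batch = batch_size_per_gpu * num_gpus          # 512
scaled_lr = base_lr * global_batch / lr_scale_factor  # 0.4
print(global_batch, scaled_lr)                        # 512 0.4
```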
The log is below:
PyTorch VERSION: 1.1.0
CUDA VERSION: 9.0.176
CUDNN VERSION: 7501
GPU TYPE: Tesla P40
Warning: if --fp16 is not used, static_loss_scale will be ignored.
Warning: if --fp16 is not used, static_loss_scale will be ignored.
Warning: if --fp16 is not used, static_loss_scale will be ignored.
Warning: if --fp16 is not used, static_loss_scale will be ignored.
=> creating aognet
=> Params (double-check): 12.373355M
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
=> ! Weight decay applied to FeatNorm parameters
Epoch: [0][0/2503] Time 7.958 (7.958) Speed 64.334 (64.334) Data 0.623 (0.623) Loss 6.9367394447 (6.9367) Prec@1 0.000 (0.000) Prec@5 0.391 (0.391) lr 0.000032
Epoch: [0][10/2503] Time 1.237 (1.862) Speed 414.005 (274.955) Data 0.001 (0.057) Loss 6.9408278465 (6.9327) Prec@1 0.000 (0.053) Prec@5 0.586 (0.550) lr 0.000352
Epoch: [0][20/2503] Time 1.235 (1.565) Speed 414.680 (327.186) Data 0.000 (0.030) Loss 6.9295492172 (6.9330) Prec@1 0.000 (0.074) Prec@5 0.391 (0.502) lr 0.000671
Epoch: [0][30/2503] Time 1.235 (1.461) Speed 414.436 (350.407) Data 0.001 (0.021) Loss 6.9239211082 (6.9295) Prec@1 0.195 (0.082) Prec@5 0.391 (0.498) lr 0.000991
Epoch: [0][40/2503] Time 1.233 (1.408) Speed 415.177 (363.574) Data 0.000 (0.016) Loss 6.9220929146 (6.9269) Prec@1 0.000 (0.091) Prec@5 0.586 (0.557) lr 0.001310
Epoch: [0][50/2503] Time 1.235 (1.376) Speed 414.457 (371.963) Data 0.000 (0.013) Loss 6.9215707779 (6.9268) Prec@1 0.000 (0.084) Prec@5 0.195 (0.532) lr 0.001630
Epoch: [0][60/2503] Time 1.261 (1.355) Speed 406.069 (377.859) Data 0.001 (0.011) Loss 6.9190740585 (6.9267) Prec@1 0.195 (0.090) Prec@5 0.586 (0.525) lr 0.001950
Epoch: [0][70/2503] Time 1.234 (1.340) Speed 414.789 (382.103) Data 0.000 (0.009) Loss 6.9427552223 (6.9263) Prec@1 0.000 (0.091) Prec@5 0.195 (0.506) lr 0.002269
Epoch: [0][80/2503] Time 1.264 (1.328) Speed 404.973 (385.508) Data 0.001 (0.008) Loss 6.9220204353 (6.9271) Prec@1 0.000 (0.084) Prec@5 0.195 (0.482) lr 0.002589
Epoch: [0][90/2503] Time 1.264 (1.319) Speed 404.948 (388.145) Data 0.001 (0.007) Loss 6.9311232567 (6.9266) Prec@1 0.195 (0.084) Prec@5 0.195 (0.489) lr 0.002909
Epoch: [0][100/2503] Time 1.262 (1.311) Speed 405.781 (390.479) Data 0.001 (0.007) Loss 6.9314498901 (6.9261) Prec@1 0.000 (0.091) Prec@5 0.391 (0.493) lr 0.003228
Epoch: [0][110/2503] Time 1.263 (1.305) Speed 405.252 (392.227) Data 0.000 (0.006) Loss 6.9336290359 (6.9266) Prec@1 0.195 (0.090) Prec@5 0.391 (0.489) lr 0.003548
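The lr column looks like a plain linear warmup; the small check below reproduces the logged values if I assume a target LR of 0.4 and a 5-epoch warmup (both inferred from the numbers above, not read from the config), so at iteration 110 the run is still deep inside warmup:

```python
# Rough check that the logged lr column is a linear warmup.
# target_lr=0.4 and warmup_epochs=5 are inferred from the log values,
# not read from the config, so treat them as assumptions.
target_lr = 0.4            # 0.1 * (128 * 4) / 128, see the sketch above
iters_per_epoch = 2503
warmup_epochs = 5
warmup_iters = warmup_epochs * iters_per_epoch

def warmup_lr(step):
    # linear ramp from ~0 up to target_lr over warmup_iters steps
    return target_lr * (step + 1) / warmup_iters

for step in (0, 10, 110):
    print(step, round(warmup_lr(step), 6))
# 0 3.2e-05, 10 0.000352, 110 0.003548 -- matching the log
```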
I also ran the BN version; it also does not converge after 20 epochs.