Hello, I am training your network on the ShanghaiTech Part B dataset and the iterations are very slow. I use 390 images as the training set and 10 images as the validation set. Because your code uses every training image 4 times, each epoch actually has 1560 samples, which is 780 iterations at batch size 2. After a day of training I have completed 31 epochs. I have not modified the parameters, except that the batch size is increased to 2. How long did it take you to train this network? I may be doing something wrong.
epoch 30, processed 46800 samples, lr 0.0000001000
Epoch: [30][0/780] Time 1.599 (1.599) Data 0.081 (0.081) Loss 15.9025 (15.9025)
Epoch: [30][30/780] Time 3.322 (3.261) Data 0.053 (0.060) Loss 10.3198 (44.3440)
Epoch: [30][60/780] Time 3.317 (3.288) Data 0.054 (0.060) Loss 41.0672 (43.5530)
Epoch: [30][90/780] Time 3.323 (3.298) Data 0.057 (0.060) Loss 142.7686 (42.1636)
Epoch: [30][120/780] Time 3.319 (3.302) Data 0.054 (0.060) Loss 14.8476 (52.9237)
Epoch: [30][150/780] Time 3.310 (3.305) Data 0.051 (0.060) Loss 6.7765 (49.7408)
Epoch: [30][180/780] Time 3.316 (3.307) Data 0.051 (0.060) Loss 24.7917 (48.1252)
Epoch: [30][210/780] Time 3.328 (3.308) Data 0.057 (0.060) Loss 43.2048 (45.7781)
Epoch: [30][240/780] Time 3.316 (3.309) Data 0.053 (0.060) Loss 9.2903 (46.9355)
Epoch: [30][270/780] Time 3.318 (3.310) Data 0.056 (0.060) Loss 32.4084 (47.8221)
Epoch: [30][300/780] Time 3.312 (3.311) Data 0.054 (0.060) Loss 10.6170 (48.4782)
Epoch: [30][330/780] Time 3.317 (3.312) Data 0.058 (0.060) Loss 80.3580 (49.2856)
Epoch: [30][360/780] Time 3.312 (3.312) Data 0.054 (0.060) Loss 42.1243 (47.9596)
Epoch: [30][390/780] Time 3.333 (3.312) Data 0.053 (0.060) Loss 10.7455 (48.1517)
Epoch: [30][420/780] Time 3.310 (3.313) Data 0.056 (0.060) Loss 51.4237 (51.7737)
Epoch: [30][450/780] Time 3.322 (3.313) Data 0.058 (0.060) Loss 3.4354 (51.9584)
Epoch: [30][480/780] Time 3.319 (3.313) Data 0.054 (0.060) Loss 52.5617 (50.9166)
Epoch: [30][510/780] Time 3.317 (3.314) Data 0.052 (0.060) Loss 65.9142 (49.7957)
Epoch: [30][540/780] Time 3.315 (3.314) Data 0.054 (0.060) Loss 11.0905 (50.6042)
Epoch: [30][570/780] Time 3.319 (3.314) Data 0.055 (0.060) Loss 37.1472 (50.6095)
Epoch: [30][600/780] Time 3.306 (3.314) Data 0.057 (0.060) Loss 33.8501 (50.7312)
Epoch: [30][630/780] Time 3.313 (3.314) Data 0.053 (0.060) Loss 8.8248 (50.7008)
Epoch: [30][660/780] Time 3.304 (3.315) Data 0.053 (0.060) Loss 3.7508 (51.2106)
Epoch: [30][690/780] Time 3.304 (3.315) Data 0.053 (0.060) Loss 6.0375 (51.1201)
Epoch: [30][720/780] Time 3.329 (3.315) Data 0.052 (0.060) Loss 106.9077 (50.8812)
Epoch: [30][750/780] Time 3.313 (3.315) Data 0.053 (0.060) Loss 73.4757 (50.8240)
begin test
MAE 51.675
best MAE 20.173
epoch 31, processed 48360 samples, lr 0.0000001000
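
To check that these numbers are consistent, here is a rough sketch of the arithmetic. The function name `estimate_training_time` is hypothetical and not part of the repository. It assumes the average of about 3.315 s per iteration from the log holds for every epoch, and it ignores the time spent on evaluation.

```python
import math


def estimate_training_time(num_images, repeats, batch_size, sec_per_iter, epochs):
    """Rough wall-clock estimate; hypothetical helper, not from the repository."""
    # Each training image is used `repeats` times per epoch.
    samples_per_epoch = num_images * repeats
    # Number of optimizer steps per epoch at the given batch size.
    iters_per_epoch = math.ceil(samples_per_epoch / batch_size)
    # Assumes a constant time per iteration (taken from the log average).
    sec_per_epoch = iters_per_epoch * sec_per_iter
    minutes_per_epoch = sec_per_epoch / 60
    total_days = sec_per_epoch * epochs / 86400
    return samples_per_epoch, iters_per_epoch, minutes_per_epoch, total_days


samples, iters, minutes_per_epoch, total_days = estimate_training_time(
    num_images=390, repeats=4, batch_size=2, sec_per_iter=3.315, epochs=400
)
print(f"{samples} samples/epoch, {iters} iterations/epoch")
print(f"about {minutes_per_epoch:.0f} minutes per epoch")
print(f"about {total_days:.1f} days for 400 epochs")
```

Under these assumptions one epoch takes roughly 43 minutes, so 31 epochs take a little under a day, which matches what I observed, and the full 400 epochs would need about 12 days.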
args.original_lr = 1e-7
args.lr = 1e-7
args.batch_size = 2
args.momentum = 0.95
args.decay = 5*1e-4
args.start_epoch = 0
args.epochs = 400
args.steps = [-1,1,100,150]
args.scales = [1,1,1,1]
args.workers = 4