liuzechun / MetaPruning

MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning. In ICCV 2019.
MIT License
351 stars · 74 forks

The speed of training mobilenetv2 PruningNet is slow. #8

Closed · dsfour closed this issue 5 years ago

dsfour commented 5 years ago

Hi, I'm trying to train the mobilenetv2 PruningNet from scratch on 4 V100 GPUs (batch_size=256). I find that training one batch takes about 3 seconds (probably because of the random cropping of the network structure during training). Is that normal? How much time did you spend training the mobilenetv2 PruningNet from scratch (64 epochs)?

Part of the training log:

```
Epoch: [0][0/5004] Time 3.857 (3.857) Data 0.000 (0.000) Loss 6.9178 (6.9178) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
Epoch: [0][1/5004] Time 3.421 (3.639) Data 0.000 (0.000) Loss 6.9392 (6.9285) Prec@1 0.000 (0.000) Prec@5 0.781 (0.391)
Epoch: [0][2/5004] Time 3.475 (3.584) Data 0.000 (0.000) Loss 6.9520 (6.9363) Prec@1 0.000 (0.000) Prec@5 0.391 (0.391)
Epoch: [0][3/5004] Time 3.235 (3.497) Data 0.000 (0.000) Loss 6.9477 (6.9392) Prec@1 0.000 (0.000) Prec@5 0.781 (0.488)
Epoch: [0][4/5004] Time 3.162 (3.430) Data 0.000 (0.000) Loss 6.9354 (6.9384) Prec@1 0.781 (0.156) Prec@5 0.781 (0.547)
Epoch: [0][5/5004] Time 3.129 (3.380) Data 0.000 (0.000) Loss 6.9591 (6.9419) Prec@1 0.391 (0.195) Prec@5 0.391 (0.521)
Epoch: [0][6/5004] Time 3.146 (3.347) Data 0.000 (0.000) Loss 6.9494 (6.9429) Prec@1 0.781 (0.279) Prec@5 0.781 (0.558)
Epoch: [0][7/5004] Time 3.138 (3.321) Data 0.000 (0.000) Loss 6.9903 (6.9489) Prec@1 0.000 (0.244) Prec@5 0.781 (0.586)
Epoch: [0][8/5004] Time 3.393 (3.329) Data 0.000 (0.000) Loss 6.9696 (6.9512) Prec@1 0.000 (0.217) Prec@5 0.000 (0.521)
Epoch: [0][9/5004] Time 3.495 (3.345) Data 0.000 (0.000) Loss 7.0030 (6.9563) Prec@1 0.000 (0.195) Prec@5 0.000 (0.469)
Epoch: [0][10/5004] Time 3.307 (3.342) Data 0.000 (0.000) Loss 7.0157 (6.9617) Prec@1 0.391 (0.213) Prec@5 0.781 (0.497)
Epoch: [0][11/5004] Time 3.254 (3.334) Data 0.000 (0.000) Loss 7.0124 (6.9660) Prec@1 0.000 (0.195) Prec@5 0.781 (0.521)
Epoch: [0][12/5004] Time 3.694 (3.362) Data 0.000 (0.000) Loss 7.0236 (6.9704) Prec@1 0.000 (0.180) Prec@5 1.172 (0.571)
Epoch: [0][13/5004] Time 3.186 (3.350) Data 0.000 (0.000) Loss 7.0330 (6.9749) Prec@1 0.000 (0.167) Prec@5 0.000 (0.530)
Epoch: [0][14/5004] Time 3.180 (3.338) Data 0.000 (0.000) Loss 7.0146 (6.9775) Prec@1 0.000 (0.156) Prec@5 0.781 (0.547)
Epoch: [0][15/5004] Time 3.272 (3.334) Data 0.000 (0.000) Loss 7.1130 (6.9860) Prec@1 0.000 (0.146) Prec@5 0.000 (0.513)
Epoch: [0][16/5004] Time 2.912 (3.309) Data 0.000 (0.000) Loss 7.0441 (6.9894) Prec@1 0.000 (0.138) Prec@5 0.781 (0.528)
Epoch: [0][17/5004] Time 3.199 (3.303) Data 0.000 (0.000) Loss 7.0701 (6.9939) Prec@1 0.000 (0.130) Prec@5 0.391 (0.521)
Epoch: [0][18/5004] Time 3.163 (3.296) Data 0.000 (0.000) Loss 7.1076 (6.9999) Prec@1 0.000 (0.123) Prec@5 0.000 (0.493)
Epoch: [0][19/5004] Time 3.197 (3.291) Data 0.000 (0.000) Loss 7.1321 (7.0065) Prec@1 0.000 (0.117) Prec@5 0.391 (0.488)
Epoch: [0][20/5004] Time 3.116 (3.283) Data 0.000 (0.000) Loss 7.0883 (7.0104) Prec@1 0.000 (0.112) Prec@5 0.391 (0.484)
Epoch: [0][21/5004] Time 3.464 (3.291) Data 0.000 (0.000) Loss 7.0444 (7.0119) Prec@1 0.000 (0.107) Prec@5 0.000 (0.462)
Epoch: [0][22/5004] Time 3.135 (3.284) Data 0.000 (0.000) Loss 7.0642 (7.0142) Prec@1 0.000 (0.102) Prec@5 0.391 (0.459)
Epoch: [0][23/5004] Time 3.392 (3.288) Data 0.000 (0.000) Loss 7.0659 (7.0163) Prec@1 0.000 (0.098) Prec@5 0.781 (0.472)
Epoch: [0][24/5004] Time 3.117 (3.282) Data 0.000 (0.000) Loss 7.0385 (7.0172) Prec@1 0.000 (0.094) Prec@5 0.391 (0.469)
Epoch: [0][25/5004] Time 3.271 (3.281) Data 0.000 (0.000) Loss 7.0659 (7.0191) Prec@1 0.000 (0.090) Prec@5 0.781 (0.481)
Epoch: [0][26/5004] Time 3.461 (3.288) Data 0.000 (0.000) Loss 7.0382 (7.0198) Prec@1 0.000 (0.087) Prec@5 0.391 (0.477)
Epoch: [0][27/5004] Time 2.958 (3.276) Data 0.000 (0.000) Loss 7.0603 (7.0213) Prec@1 0.000 (0.084) Prec@5 0.000 (0.460)
Epoch: [0][28/5004] Time 3.120 (3.271) Data 0.000 (0.000) Loss 7.1257 (7.0249) Prec@1 0.391 (0.094) Prec@5 0.391 (0.458)
Epoch: [0][29/5004] Time 3.212 (3.269) Data 0.000 (0.000) Loss 7.0864 (7.0269) Prec@1 0.000 (0.091) Prec@5 0.391 (0.456)
Epoch: [0][30/5004] Time 3.090 (3.263) Data 0.000 (0.000) Loss 7.1347 (7.0304) Prec@1 0.391 (0.101) Prec@5 0.391 (0.454)
Epoch: [0][31/5004] Time 2.839 (3.250) Data 0.000 (0.000) Loss 7.0732 (7.0317) Prec@1 0.000 (0.098) Prec@5 0.781 (0.464)
Epoch: [0][32/5004] Time 3.346 (3.253) Data 0.000 (0.000) Loss 7.1425 (7.0351) Prec@1 0.391 (0.107) Prec@5 0.391 (0.462)
Epoch: [0][33/5004] Time 3.508 (3.260) Data 0.000 (0.000) Loss 7.0733 (7.0362) Prec@1 0.000 (0.103) Prec@5 0.781 (0.471)
Epoch: [0][34/5004] Time 3.215 (3.259) Data 0.000 (0.000) Loss 7.1465 (7.0394) Prec@1 0.000 (0.100) Prec@5 0.000 (0.458)
Epoch: [0][35/5004] Time 3.071 (3.254) Data 0.000 (0.000) Loss 7.0800 (7.0405) Prec@1 0.781 (0.119) Prec@5 1.562 (0.488)
```
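For reference, at the logged ~3.3 s per iteration, 5004 iterations per epoch over 64 epochs would come to roughly 12 days. Part of the per-batch cost is inherent to PruningNet training: each iteration samples a random channel configuration, and a small meta network generates (and crops) the convolution weights for that configuration, on top of the convolution itself. Below is a minimal sketch of that per-batch step, with illustrative names rather than this repo's actual API:

```python
import random
import torch
import torch.nn as nn

class MetaConvBlock(nn.Module):
    """Sketch of one PruningNet block: a small FC 'meta' network maps the
    sampled channel widths to a full-size conv weight tensor, which is then
    cropped to the sampled widths before the convolution runs."""

    def __init__(self, max_in, max_out, k=3):
        super().__init__()
        self.max_in, self.max_out, self.k = max_in, max_out, k
        self.meta = nn.Sequential(
            nn.Linear(2, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, max_out * max_in * k * k),
        )

    def forward(self, x, in_ch, out_ch):
        # Encode the sampled widths and generate the full weight tensor.
        code = torch.tensor([in_ch / self.max_in, out_ch / self.max_out],
                            device=x.device)
        w = self.meta(code).view(self.max_out, self.max_in, self.k, self.k)
        w = w[:out_ch, :in_ch]  # crop to this batch's sampled widths
        return nn.functional.conv2d(x, w, padding=self.k // 2)

block = MetaConvBlock(max_in=32, max_out=64)
x = torch.randn(8, 32, 56, 56)
out_ch = random.choice([16, 32, 48, 64])  # a new random width every batch
y = block(x, in_ch=32, out_ch=out_ch)
print(y.shape)  # torch.Size([8, out_ch, 56, 56])
```

Because the weight generation and cropping run on every batch, per-iteration time is expected to be noticeably higher than a plain MobileNet V2 training step.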

liuzechun commented 5 years ago

Maybe the batch size is too small? With V100s you could try a batch size of 1024, I think. I spent about 3 days training the PruningNet for MobileNet-v2.
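If it helps, here is a hedged sketch of that suggestion (the model and dataset are placeholders, not this repo's objects): split one 1024-sample batch across all visible GPUs, so each of 4 GPUs still sees 256 samples, and give the loader enough workers that data loading keeps up at the larger batch size.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder stand-ins for the PruningNet and the ImageNet loader.
model = nn.Conv2d(3, 32, 3)
if torch.cuda.is_available():
    model = nn.DataParallel(model).cuda()  # replicate across all visible GPUs

dataset = TensorDataset(torch.randn(2048, 3, 32, 32),
                        torch.randint(0, 1000, (2048,)))
loader = DataLoader(dataset, batch_size=1024, shuffle=True,
                    num_workers=8, pin_memory=True)

for images, _ in loader:
    if torch.cuda.is_available():
        images = images.cuda(non_blocking=True)
    out = model(images)  # 1024 samples split as 256 per GPU on 4 GPUs
    break                # one step, just to show the flow
print(out.shape)         # torch.Size([1024, 32, 30, 30])
```

DistributedDataParallel with one process per GPU would generally scale better than DataParallel, but the idea is the same: keep the per-GPU batch size while raising the global one.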