soumith / imagenet-multiGPU.torch

An ImageNet example in Torch.
BSD 2-Clause "Simplified" License

Data loading time #24

Closed: Atcold closed this issue 8 years ago

Atcold commented 8 years ago

One question. This is what my log looks like:

=> Criterion
nn.ClassNLLCriterion
==> Converting model to CUDA
==> doing epoch on training data:
==> online epoch # 1
Epoch: [1][1/10000]     Time 6.079 Err 8.5379 Top1-%: 0.20 LR 1e-02 DataLoadingTime 67.312
Epoch: [1][2/10000]     Time 1.972 Err 8.5480 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.043
Epoch: [1][3/10000]     Time 1.968 Err 8.5506 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.054
Epoch: [1][4/10000]     Time 1.957 Err 8.5445 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.055
Epoch: [1][5/10000]     Time 1.979 Err 8.5556 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.040
Epoch: [1][6/10000]     Time 1.932 Err 8.5436 Top1-%: 0.20 LR 1e-02 DataLoadingTime 0.046
Epoch: [1][7/10000]     Time 1.973 Err 8.5321 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.041
Epoch: [1][8/10000]     Time 1.955 Err 8.5400 Top1-%: 0.10 LR 1e-02 DataLoadingTime 0.033
Epoch: [1][9/10000]     Time 1.937 Err 8.5451 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.033
Epoch: [1][10/10000]    Time 1.963 Err 8.5365 Top1-%: 0.10 LR 1e-02 DataLoadingTime 0.029
Epoch: [1][11/10000]    Time 2.195 Err 8.5423 Top1-%: 0.00 LR 1e-02 DataLoadingTime 21.177
Epoch: [1][12/10000]    Time 1.960 Err 8.5410 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.048
Epoch: [1][13/10000]    Time 1.939 Err 8.5555 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.036
Epoch: [1][14/10000]    Time 1.969 Err 8.5508 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.037
Epoch: [1][15/10000]    Time 1.976 Err 8.5580 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.046
Epoch: [1][16/10000]    Time 1.947 Err 8.5506 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.036
Epoch: [1][17/10000]    Time 2.032 Err 8.5355 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.034
Epoch: [1][18/10000]    Time 2.012 Err 8.5335 Top1-%: 0.10 LR 1e-02 DataLoadingTime 0.050
Epoch: [1][19/10000]    Time 1.990 Err 8.5225 Top1-%: 0.10 LR 1e-02 DataLoadingTime 0.043
Epoch: [1][20/10000]    Time 1.983 Err 8.5323 Top1-%: 0.10 LR 1e-02 DataLoadingTime 0.048
Epoch: [1][21/10000]    Time 2.193 Err 8.5370 Top1-%: 0.10 LR 1e-02 DataLoadingTime 24.326
Epoch: [1][22/10000]    Time 2.027 Err 8.5282 Top1-%: 0.20 LR 1e-02 DataLoadingTime 0.049
Epoch: [1][23/10000]    Time 1.904 Err 8.5333 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.041
Epoch: [1][24/10000]    Time 1.983 Err 8.5305 Top1-%: 0.10 LR 1e-02 DataLoadingTime 0.065
Epoch: [1][25/10000]    Time 1.954 Err 8.5322 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.056
Epoch: [1][26/10000]    Time 1.947 Err 8.5401 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.037
Epoch: [1][27/10000]    Time 1.964 Err 8.5405 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.043
Epoch: [1][28/10000]    Time 2.014 Err 8.5287 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.046
Epoch: [1][29/10000]    Time 1.997 Err 8.5278 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.050
Epoch: [1][30/10000]    Time 2.017 Err 8.5354 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.032
Epoch: [1][31/10000]    Time 2.215 Err 8.5288 Top1-%: 0.00 LR 1e-02 DataLoadingTime 25.476

Every 10 iterations there is a data loading stall of roughly 20 to 25 s (I'm using 10 donkeys, i.e. data-loading threads). Is this normal?
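To confirm that the stall period matches the number of donkeys, one can pull the `DataLoadingTime` column out of the log. The sketch below is a hypothetical helper, not part of the repo; it assumes the exact line format shown above, and the 1-second stall threshold is an arbitrary choice.

```python
import re
from typing import List, Tuple

# Matches log lines such as:
# Epoch: [1][11/10000]    Time 2.195 Err 8.5423 Top1-%: 0.00 LR 1e-02 DataLoadingTime 21.177
LINE_RE = re.compile(
    r"Epoch: \[(\d+)\]\[(\d+)/\d+\]\s+Time ([\d.]+).*DataLoadingTime ([\d.]+)"
)


def parse_log(text: str) -> List[Tuple[int, float, float]]:
    """Return (iteration, step_time, data_loading_time) for every matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((int(m.group(2)), float(m.group(3)), float(m.group(4))))
    return rows


def find_stalls(rows: List[Tuple[int, float, float]], threshold: float = 1.0) -> List[int]:
    """Iterations whose data loading time exceeds `threshold` seconds."""
    return [it for it, _, load in rows if load > threshold]
```

On the log above, `find_stalls` flags iterations 1, 11, 21 and 31, so the gap between stalls equals the number of donkeys.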

Atcold commented 8 years ago

Moreover, some of the processes are 'sleeping' and not using the CPU at all.

[Screenshot taken 2015-11-05 19:34:17]

karandwivedi42 commented 8 years ago

@Atcold did you solve this?

Atcold commented 8 years ago

Haha, yes. My data was being cached in swap. I have since bought more RAM, which fixed the problem, but I forgot to close the issue.

JiayunLi commented 7 years ago

I encountered a similar problem, but I still don't understand why the loading time jumps so much periodically. As I understand it, all those threads do the same job (loading images).
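One way to see why identical threads still produce periodic stalls is a toy timing model. This is a simplified sketch, not the actual Torch `threads` scheduling: it assumes every batch takes exactly `load_s` seconds to load, every training step takes `step_s` seconds, batches are consumed in order, and a worker starts its next batch as soon as the trainer takes its previous one. The values 40 s and 2 s used below are illustrative, not measured.

```python
from typing import List


def simulate_waits(n_workers: int, load_s: float, step_s: float, n_steps: int) -> List[float]:
    """Toy model of a pool of data-loading workers feeding one training loop.

    Returns, for each step, how long the trainer waits for its batch.
    Assumes fixed load and step times (a simplification of the real loader).
    """
    taken: List[float] = []  # time at which the trainer picked up batch i
    waits: List[float] = []
    trainer_free = 0.0       # time at which the trainer finished the previous step
    for i in range(n_steps):
        # The first n_workers batches all start loading at t = 0; later batch i
        # reuses the worker freed when batch i - n_workers was handed over.
        start = 0.0 if i < n_workers else taken[i - n_workers]
        ready = start + load_s
        take = max(ready, trainer_free)
        waits.append(take - trainer_free)
        taken.append(take)
        trainer_free = take + step_s
    return waits
```

With `simulate_waits(10, 40.0, 2.0, 30)` the trainer waits 40 s before step 0, then 20 s before steps 10 and 20, and not at all in between. The workers start in lockstep, so batches arrive in bursts of N; whenever loading one batch (L) takes longer than training on N batches (N·T), the trainer stalls for about L − N·T once per cycle. Under this model, the log above (T ≈ 2 s, stalls of 21 to 25 s, N = 10) implies each donkey needs roughly 41 to 45 s per batch, which is consistent with slow, swap-bound loading.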