victoresque / pytorch-template

PyTorch deep learning projects made easy.

Add time calculation for each epoch/batch in trainer #31

Closed: amjltc295 closed this issue 5 years ago

amjltc295 commented 6 years ago

I think it would be nice to know the training time. This could also be done by changing the logging configuration, but the delta time may be hard to find if there are too many batches.
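For illustration, the kind of timing I mean could look roughly like this (a sketch against a generic training loop, not the template's actual trainer; all names are placeholders):

    import time

    def train_epoch(model, loader, criterion, optimizer, log_step=100):
        # One epoch with per-batch and per-epoch timing (illustrative only)
        epoch_start = time.time()
        for batch_idx, (data, target) in enumerate(loader):
            batch_start = time.time()
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            batch_time = time.time() - batch_start  # delta for this batch only
            if batch_idx % log_step == 0:
                print('Train Epoch [{}/{}] Loss: {:.6f} Time: {:.2f}s'.format(
                    batch_idx * loader.batch_size, len(loader.dataset),
                    loss.item(), batch_time))
        return time.time() - epoch_start  # epoch_time, to log with the metrics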

SunQpark commented 6 years ago

Hi @amjltc295, I agree that printing the training time would be good, but there are some points that could be improved.

Train Epoch: 1 [45056/54000 (83%)] Loss: 0.523107 Time: 0.02s
    epoch          : 1
    epoch_time     : 3.8233487606048584
    loss           : 1.0713483376323052
    my_metric      : 0.642586723663522
    my_metric2     : 0.8484682095125786
    val_epoch_time : 2.062150239944458
    val_loss       : 0.2415582835674286
    val_my_metric  : 0.9347896852355072
    val_my_metric2 : 0.9883803215579711
Saving checkpoint: saved/Mnist_LeNet/1026_042054/checkpoint-epoch1.pth ...
Saving current best: model_best.pth ...
  1. Displaying the batch time is not intuitive
  2. The epoch time does not look good in the summary

Since it is displayed at each logging step and just says 'Time', I think the batch time could easily be mistaken for the step time.

What do you think about doing it in this format

Train Epoch: 1 [45056/54000 (83%) 3.2s/4s eta] Loss: 0.523107

and just skipping the epoch time?
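To be concrete, the 3.2s/4s eta part could be produced like this (a rough sketch; the total is just the elapsed time extrapolated over the remaining batches, and model / loader / criterion / optimizer are placeholders):

    import time

    epoch_start = time.time()
    for batch_idx, (data, target) in enumerate(loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        elapsed = time.time() - epoch_start
        eta_total = elapsed * len(loader) / (batch_idx + 1)  # projected epoch time
        print('Train Epoch: 1 [{}/{} ({:.0f}%) {:.1f}s/{:.0f}s eta] Loss: {:.6f}'.format(
            (batch_idx + 1) * loader.batch_size, len(loader.dataset),
            100. * (batch_idx + 1) / len(loader), elapsed, eta_total, loss.item()))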

I also think that using datetime.now() would be better for consistency (with base_trainer), but this is a minor point compared with the above.

amjltc295 commented 6 years ago

@SunQpark Sorry, I don't understand the meaning of 3.2s/4s eta. Could you explain it a bit? I think the purpose of printing the time is to let users estimate the total training time and check whether there is a bug that slows down the process. That's the reason I put in both the batch time and the epoch time. It wouldn't be hard to tell the batch time from the step time; or we could change the label to Batch time or move it inside the [ ].

SunQpark commented 5 years ago

Sorry for the late response, @amjltc295. Previously I thought that printing the time would be helpful for recognizing the progress of training and estimating when it will end. My comment meant displaying time passed / total time for that purpose, but now I understand that this is what you intended.

However, I'm still not sure that this feature is suitable for this project. Checking the training time for debugging makes sense, but I don't think we should check it for every batch. I think using torch.autograd.profiler before training is a more suitable solution. What do you think?
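For reference, the usage I have in mind is roughly this (a minimal sketch: profile a few representative batches once, up front, instead of timing every batch inside the trainer; model / loader / criterion / optimizer are placeholders):

    import itertools

    from torch.autograd import profiler

    # Profile a handful of batches once, before the real training run
    with profiler.profile() as prof:
        for data, target in itertools.islice(loader, 5):
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.zero_grad()

    # Per-operation breakdown, sorted by total CPU time
    print(prof.key_averages().table(sort_by='cpu_time_total'))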

amjltc295 commented 5 years ago

torch.autograd.profiler is a new thing for me. It may take some time for me to figure out how it works, but it seems to be a good solution.