shenweichen / DeepCTR-Torch

【PyTorch】Easy-to-use, Modular and Extendible package of deep-learning based CTR models.
https://deepctr-torch.readthedocs.io/en/latest/index.html
Apache License 2.0

Any way to log the step artifacts ? #252

Open · gajagajago opened this issue 2 years ago

gajagajago commented 2 years ago

Hello. I am using the DeepFM implementation and trying to log the batch time after each step. I want to do something like the code below and measure how long it takes to process each batch.

    # Should log batch time here
    from tensorflow.keras.callbacks import LambdaCallback

    # batchtime_log is an already-open file for the timing log
    batchtime_log_callback = LambdaCallback(
        on_batch_begin=lambda batch, logs: batchtime_log.write(str(batch)),
        on_batch_end=lambda batch, logs: batchtime_log.write(str(batch)))

    model.fit(
        train_model_input,
        train[target].values,
        callbacks=[batchtime_log_callback],
        batch_size=batch_size,
        epochs=num_epoch,
        verbose=2,
        validation_split=val_ratio)

The desired output would look like the example below, but it is okay if other artifacts are printed along with it; I can post-process. Is there any way to do this?

    xxx ms
    yyy ms
    ...
zanshuxun commented 2 years ago
1. Modify basemodel.py like this (the screenshots of the code change and its resulting output attached to the original comment are not reproduced here; see the sketch after this list).

2. Set verbose=1 in model.fit(); then you can calculate the time of each epoch from the tqdm log (each iteration is one batch of data). (A screenshot of the tqdm output was attached to the original comment.)
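Since the screenshots did not survive, here is a rough sketch of the kind of change step 1 describes: recording the wall-clock time around each pass through the per-batch training loop. The helper below is illustrative only; the name timed_train_step and its signature are not part of DeepCTR-Torch, and in practice the equivalent lines would sit inline in the batch loop of BaseModel.fit() in basemodel.py.

    import time

    def timed_train_step(model, optim, loss_func, x, y):
        """Run one optimization step and return the elapsed wall-clock time in ms.

        Illustrative sketch: in DeepCTR-Torch the equivalent lines would live
        inside the per-batch loop of BaseModel.fit() in basemodel.py.
        """
        start = time.time()

        y_pred = model(x).squeeze()
        optim.zero_grad()
        loss = loss_func(y_pred, y.squeeze())
        loss.backward()
        optim.step()

        return (time.time() - start) * 1000

    # inside the fit loop, something like:
    # for x_train, y_train in train_loader:
    #     elapsed = timed_train_step(model, optim, loss_func, x_train, y_train)
    #     print('%.3f ms' % elapsed)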

gajagajago commented 2 years ago

First of all, thanks for the reply. Just one thing to add: I think we should call torch.cuda.synchronize() before calling time.time() when distributed training is enabled. This way we can be sure that all streams on each CUDA device have completely finished before logging the time. Thanks for the reply once again!

Ref: https://pytorch.org/docs/stable/generated/torch.cuda.synchronize.html
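As a minimal sketch of that suggestion (again illustrative, not the project's code), the synchronization goes in front of every timestamp, since CUDA kernels are launched asynchronously and time.time() can otherwise fire before the GPU work has actually finished:

    import time

    import torch

    def cuda_time():
        # Block until all queued kernels on the current CUDA device finish,
        # then read the wall clock.
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        return time.time()

    start = cuda_time()
    # ... forward / backward / optimizer step for one batch ...
    print('%.3f ms' % ((cuda_time() - start) * 1000))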