tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
Apache License 2.0

TQDM Progress Bar does not work correctly when initial_epoch is nonzero #1748

Open relaxation82 opened 4 years ago

relaxation82 commented 4 years ago

System information

Describe the bug

The progress bar for the whole training run over multiple epochs never reaches 100% when training is restarted and the initial_epoch parameter is therefore nonzero.

Code to reproduce the issue

Essentially

model.fit(x=X, y=y, class_weight=None, batch_size=batchSize, verbose=0, callbacks=[tfa.callbacks.TQDMProgressBar()], validation_split=0.2, shuffle=True, epochs=epochCount, initial_epoch=initialEpoch)

where initialEpoch > 0
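The likely arithmetic behind the symptom can be sketched in plain Python (no TensorFlow needed). If the overall bar's total is sized by `epochs` while `fit()` only executes `epochs - initial_epoch` of them, the bar stalls short of 100%. The function name below is illustrative, not TQDMProgressBar's actual internals:

```python
def final_progress(epochs, initial_epoch):
    """Fraction the bar shows at the end of training if its total is
    `epochs` but only epochs - initial_epoch epochs actually run."""
    epochs_run = epochs - initial_epoch
    return epochs_run / epochs

print(final_progress(10, 0))  # 1.0 -> bar completes when starting from scratch
print(final_progress(10, 4))  # 0.6 -> resumed run stops at 60%
```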

Other info / logs

The issue should be clear from the information provided above.

shun-lin commented 4 years ago

Hi @relaxation82, thanks so much for reporting this! Yes, this is an edge case we had not considered! Will post a fix soon!

shun-lin commented 4 years ago

Hi,

I dove a bit deeper into the issue, and I think it is currently not possible for a callback to access initial_epoch (in the TensorFlow source, initial_epoch appears to be used only to resume from previous training data). So this cannot be fixed without one of two changes: either an update on the TF side that exposes initial_epoch to callbacks (for example, model.fit could populate self.params["initial_epoch"]), or asking the user to provide that parameter again when constructing the callback. I think the first approach (asking TF to update on their side) is the correct one, so I will open an issue on the TF side to see what their opinion is. Thank you so much!
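The second option mentioned above (the user re-supplying initial_epoch to the callback) could look roughly like this. This is a pure-Python sketch with hypothetical names, not TQDMProgressBar's real implementation; it only mirrors the relevant Keras callback hooks (`on_train_begin` receiving params, `on_epoch_end` per epoch):

```python
class ProgressTracker:
    """Minimal stand-in for a progress-bar callback that accepts
    initial_epoch from the user, since Keras does not expose it."""

    def __init__(self, initial_epoch=0):
        self.initial_epoch = initial_epoch
        self.total = None
        self.done = 0

    def on_train_begin(self, params):
        # params["epochs"] is the target epoch count passed to fit();
        # subtract initial_epoch so the bar total matches epochs executed.
        self.total = params["epochs"] - self.initial_epoch

    def on_epoch_end(self, epoch):
        self.done += 1

    @property
    def progress(self):
        return self.done / self.total

# Resuming at epoch 4 of 10: only 6 epochs run, yet the bar reaches 100%.
tracker = ProgressTracker(initial_epoch=4)
tracker.on_train_begin({"epochs": 10})
for epoch in range(4, 10):
    tracker.on_epoch_end(epoch)
print(tracker.progress)  # 1.0
```

The design choice here is simply to treat `epochs - initial_epoch` as the bar's total, which is what the eventual fix would need regardless of whether the value comes from the user or from `self.params`.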

shun-lin commented 4 years ago

I have raised an issue on TensorFlow's side, hopefully we will hear back soon :)