tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
Apache License 2.0

TQDM Progress Bar does not work correctly when initial_epoch is nonzero #1748

Open relaxation82 opened 4 years ago

relaxation82 commented 4 years ago

System information

Describe the bug

The progress bar for the whole training run over multiple epochs never reaches 100% when training is restarted and the initial_epoch parameter is therefore nonzero.

Code to reproduce the issue

Essentially

model.fit(x=X, y=y, class_weight=None, batch_size=batchSize, verbose=0, callbacks=[tfa.callbacks.TQDMProgressBar()], validation_split=0.2, shuffle=True, epochs=epochCount, initial_epoch=initialEpoch)

where initialEpoch > 0
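The likely arithmetic behind the symptom can be sketched in plain Python (no TensorFlow needed). If the overall bar's total is sized by `epochs` while `fit()` only executes `epochs - initial_epoch` of them, the bar stalls short of 100%. The function name below is illustrative, not TQDMProgressBar's actual internals:

```python
def final_progress(epochs, initial_epoch):
    """Fraction the bar shows at the end of training if its total is
    `epochs` but only epochs - initial_epoch epochs actually run."""
    epochs_run = epochs - initial_epoch
    return epochs_run / epochs

print(final_progress(10, 0))  # 1.0 -> bar completes when starting from scratch
print(final_progress(10, 4))  # 0.6 -> resumed run stops at 60%
```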

Other info / logs

The issue should be clear from the information provided above.

shun-lin commented 4 years ago

Hi @relaxation82, thanks so much for reporting this! Yes, this is an edge case we had not considered! Will post a fix soon!

shun-lin commented 4 years ago

Hi,

I dove a bit deeper into the issue, and I think it is currently not possible for a callback to access initial_epoch (in the TensorFlow source, initial_epoch appears to be used only to resume from previous training data). So this cannot be fixed without one of two changes: either an update on the TF side that exposes initial_epoch to callbacks (for example, model.fit could populate self.params["initial_epoch"]), or asking the user to provide that parameter again when constructing the callback. I think the first approach (asking TF to update on their side) is the correct one, so I will open an issue on the TF side to see what their opinion is. Thank you so much!
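The second option mentioned above (the user re-supplying initial_epoch to the callback) could look roughly like this. This is a pure-Python sketch with hypothetical names, not TQDMProgressBar's real implementation; it only mirrors the relevant Keras callback hooks (`on_train_begin` receiving params, `on_epoch_end` per epoch):

```python
class ProgressTracker:
    """Minimal stand-in for a progress-bar callback that accepts
    initial_epoch from the user, since Keras does not expose it."""

    def __init__(self, initial_epoch=0):
        self.initial_epoch = initial_epoch
        self.total = None
        self.done = 0

    def on_train_begin(self, params):
        # params["epochs"] is the target epoch count passed to fit();
        # subtract initial_epoch so the bar total matches epochs executed.
        self.total = params["epochs"] - self.initial_epoch

    def on_epoch_end(self, epoch):
        self.done += 1

    @property
    def progress(self):
        return self.done / self.total

# Resuming at epoch 4 of 10: only 6 epochs run, yet the bar reaches 100%.
tracker = ProgressTracker(initial_epoch=4)
tracker.on_train_begin({"epochs": 10})
for epoch in range(4, 10):
    tracker.on_epoch_end(epoch)
print(tracker.progress)  # 1.0
```

The design choice here is simply to treat `epochs - initial_epoch` as the bar's total, which is what the eventual fix would need regardless of whether the value comes from the user or from `self.params`.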

shun-lin commented 4 years ago

I have raised an issue on TensorFlow's side, hopefully we will hear back soon :)