Open Cristy94 opened 4 years ago
Hi @Cristy94, thanks for using this project and sharing your findings. I ran into this with the TF 2.0 release candidates, but it disappeared with the TF 2.0 final release, so I assumed it had been fixed in TF. After upgrading to TF 2.1, however, it became an issue again when training WDSR models (which use TensorFlow Addons), and removing @tf.function fixes it.
Why does that decorator make the train_step function get stuck?
I don't know yet, have to investigate it.
Is it safe to remove it?
Yes, it is safe, but it may reduce training speed. I haven't measured the difference yet.
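For anyone who wants to measure that difference, here is a rough sketch that times the same step with and without graph compilation. The model, loss, optimizer and the seconds_for helper are hypothetical stand-ins, not the code from this repository, and the numbers will depend heavily on model size and hardware.

```python
import time
import tensorflow as tf

# Hypothetical stand-ins: the repository's model, loss and optimizer differ.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

def train_step(x, y):
    """One gradient step, written without the decorator."""
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Wrapping explicitly has the same effect as decorating with @tf.function.
train_step_graph = tf.function(train_step)

def seconds_for(step, x, y, n=50):
    """Time n calls of step after one untimed warm-up call."""
    step(x, y)  # warm-up: creates variables and, for the graph version, traces
    start = time.perf_counter()
    for _ in range(n):
        step(x, y)
    return time.perf_counter() - start

x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
eager_s = seconds_for(train_step, x, y)        # eager first, so variables exist before tracing
graph_s = seconds_for(train_step_graph, x, y)
print(f"eager: {eager_s:.3f}s  graph: {graph_s:.3f}s")
```

Keeping the undecorated function and wrapping it separately makes it easy to switch between the two variants without editing the training code.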
Is it something wrong with the function that makes it incompatible with @tf.function?
Good question. Again, needs to be investigated.
I'll leave this ticket open until I have fixed these issues with WDSR and SRGAN+WDSR in master, and hopefully I'll then have answers to your questions.
Has this been fixed yet for TF2.1? Thanks.
I tried training on a custom dataset, but the train function always got stuck in train_step at the return statement. After spending 2 hours trying to understand why the function is called twice without ever returning and why it gets stuck, I realized it's because of the @tf.function decorator. As soon as I removed that decorator, the training worked as expected.
Why does that decorator make the train_step function get stuck? Is it safe to remove it? Is it something wrong with the function that makes it incompatible with @tf.function?
https://github.com/krasserm/super-resolution/blob/602a490ec62045823e37c475229e3bc42c8d850c/train.py#L74-L86
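One thing that may be worth ruling out: part of the "called more than once" behaviour could simply be tf.function tracing, where the Python body runs to build a graph rather than on every call. That would not by itself explain the hang. The sketch below uses a hypothetical toy function (square, not the repository's train_step) to count how often the Python body actually executes. It also shows tf.config.run_functions_eagerly, a switch that disables graph execution for debugging without deleting the decorator. That name assumes TF 2.3 or newer; on older 2.x releases such as 2.1 it is tf.config.experimental_run_functions_eagerly.

```python
import tensorflow as tf

python_runs = 0  # counts how often the Python body itself executes

@tf.function
def square(x):
    global python_runs
    python_runs += 1  # plain Python side effect, not part of the traced graph
    return x * x

square(tf.constant(2.0))  # first call: the body runs once to trace a graph
square(tf.constant(3.0))  # same dtype and shape: the cached graph is reused
square(tf.constant(2))    # int32 instead of float32: traced again
runs_after_tracing = python_runs

# Debugging switch: run decorated functions eagerly, like ordinary Python.
tf.config.run_functions_eagerly(True)
square(tf.constant(4.0))  # body runs on every call now
square(tf.constant(5.0))
tf.config.run_functions_eagerly(False)  # restore graph execution
runs_after_eager = python_runs

print(runs_after_tracing, runs_after_eager)
```

With the switch enabled, breakpoints and print statements inside the function behave as in normal Python, which can help narrow down where a step stops returning.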