Farbdose opened this issue 1 year ago
@Farbdose Thank you for reporting this issue! Could you please provide access to the standalone code? You may share the colab gist as well if possible. Thank you!
@sushreebarsa Oh sorry, that slipped my mind. Done.
@SuryanarayanaY I was able to replicate the issue on Colab; please find the gist here. Thank you!
Hi @Farbdose,
This might be due to the note below from the tfa.optimizers.MultiOptimizer API:
> Note: Currently, tfa.optimizers.MultiOptimizer does not support callbacks that modify optimizers.
Setting the policy to mixed_float16 automatically applies loss scaling, which is what causes the error. If you switch to a policy that doesn't apply loss scaling, there is no error. For example, I tried setting the policy to 'float64' and 'float32', and in both cases no error is found, as per the attached gist.
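Roughly what I mean (a minimal sketch; the toy model is just for illustration):

```python
import tensorflow as tf
import tensorflow_addons as tfa

# With 'float32' (or 'float64') there is no loss scaling, so MultiOptimizer
# works as-is:
tf.keras.mixed_precision.set_global_policy("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
optimizers_and_layers = [
    (tf.keras.optimizers.Adam(), model.layers[:1]),
    (tf.keras.optimizers.Adam(), model.layers[1:]),
]
model.compile(
    optimizer=tfa.optimizers.MultiOptimizer(optimizers_and_layers),
    loss="mse",
)

# With 'mixed_float16', compile() silently wraps the optimizer in a
# LossScaleOptimizer -- the "callback that modifies optimizers" case
# that the MultiOptimizer note warns about.
```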
Thank you!
@SuryanarayanaY Unfortunately I need mixed_float16 due to RAM constraints. I did some digging, and it looks like the init code for the iterations variable is never triggered. I found a hacky solution for TensorFlow 2.10.0 (which is what I actually need; for some reason the error persists in 2.12 even though the code in question was changed).
In 2.10, `iterations` is initialized via its getter, so I added

```python
trigger_iterations_init_to_bypass_issue17414 = self._optimizer.iterations
```

here: https://github.com/keras-team/keras/blob/v2.10.0/keras/mixed_precision/loss_scale_optimizer.py#L669
That works for now. I think the main problem is this: https://github.com/keras-team/keras/blob/master/keras/mixed_precision/loss_scale_optimizer.py#L645
I ran into the same problem today. Thanks to your hacky solution @Farbdose, I was able to fix it by triggering the initialization of `iterations` right after I initialized the LossScaleOptimizer:
```python
import tensorflow as tf
from tensorflow_addons.optimizers import MultiOptimizer

# `model` is defined as in my setup; the layer split is just an example.
optimizer_nerf = tf.keras.optimizers.Adam()
optimizer_feature = tf.keras.optimizers.Adam()
optimizers_and_layers = [
    (optimizer_nerf, model.layers[:2]),
    (optimizer_feature, model.layers[2:]),
]
optimizer = MultiOptimizer(optimizers_and_layers)
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(optimizer)
# Touching the property triggers the lazy initialization of `iterations`
# before graph execution, which works around the crash:
optimizer._optimizer.iterations
```
@Farbdose, thanks for your hacky solution. Let's see how it might help us.
@qlzh727 could you take a look?
Looking through the optimizer code, I don't see why calling the `iterations` property updates the underlying tf.Variable.
I see that you authored https://github.com/keras-team/keras/blob/master/keras/optimizers/optimizer.py#L96 -- do you have some context about how this initialization might be failing in this case?
@ianstenbit @qlzh727 Calling the `iterations` property is a 2.10-only (and 2.11, I think) solution, because in 2.10 the `iterations` initialization is handled in this getter: https://github.com/keras-team/keras/blob/b80dd12da9c0bc3f569eca3455e77762cf2ee8ef/keras/optimizers/optimizer_v2/optimizer_v2.py#L1136. I haven't figured out why it's not working in 2.12 yet. In 2.12, `iterations` is initialized here: https://github.com/keras-team/keras/blob/541177c71887172d11514cda24067f7ab8d8440e/keras/optimizers/optimizer.py#L93. I suspect that the constructor of OptimizerV2 is never actually called, though, based on this comment: https://github.com/keras-team/keras/blob/541177c71887172d11514cda24067f7ab8d8440e/keras/mixed_precision/loss_scale_optimizer.py#L645
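For anyone following along, the 2.10 getter is essentially a lazy-initialization pattern, roughly like this (my simplified sketch, not the actual Keras code):

```python
import tensorflow as tf

class OptimizerSketch:
    """Simplified illustration of the OptimizerV2-style lazy getter."""

    def __init__(self):
        self._iterations = None

    @property
    def iterations(self):
        # The tf.Variable is only created on first access to the property,
        # which is why reading `optimizer.iterations` once, eagerly,
        # "fixes" the missing initialization.
        if self._iterations is None:
            self._iterations = tf.Variable(
                0, dtype=tf.int64, trainable=False, name="iter"
            )
        return self._iterations
```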
Update: I had to update my colab with

```
!pip install tensorflow -U
!pip install tf-nightly tfds-nightly tfa-nightly
```

hopefully that didn't create a version mismatch...
Based on my analysis here https://colab.research.google.com/drive/17rvRgYM6T8MDD0kkkpL_herqcoOMVhYj?usp=sharing the optimizer has a properly set `_iterations` (debunking my assumption from above), but the MultiOptimizer hasn't. I'm struggling to find out where the actual code is coming from; running

```
!grep -rnw '/usr/local/lib/python3.8/dist-packages' -e 'def iterations' -A 10
```

inside colab finds the init code in /usr/local/lib/python3.8/dist-packages/keras/optimizers/optimizer_v2/optimizer_v2.py:1145, which shouldn't be there in 2.12 by my understanding. So I'm a bit lost here, as apparently my colab doesn't have the Keras version it claims to have...
So basically this still works in 2.12 (at least in the colab above), but I have no idea why...

```python
# `optimizers`, `model`, and `tfa` are defined as in the colab above.
optimizers_and_layers = [
    (optimizers[0], model.layers[:2]),
    (optimizers[1], model.layers[2:]),
]
optimizer = tfa.optimizers.MultiOptimizer(optimizers_and_layers)
# Reading the property once still triggers the lazy initialization:
optimizer.iterations
```
Update 2:
To create even more confusion: the original problem was that this line crashed because `iterations` wasn't initialized, even though that should happen through the getter:

```python
optimizer.iterations.assign_add(1, read_value=False)
```

I found out by chance that the error goes away if I access `iterations` by hand before that line...
Now I tried executing the above line manually to trigger the init, and it works: same line, just run eagerly in my own code. But if it runs inside the graph executor... boom.
So maybe this line running inside the graph executor is the actual problem?
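If that's right, it would line up with tf.function's restrictions on creating state during tracing. A toy version of the pattern (my own sketch, not the Keras code):

```python
import tensorflow as tf

class LazyCounter:
    def __init__(self):
        self._iterations = None

    @property
    def iterations(self):
        if self._iterations is None:  # variable is created lazily, on first access
            self._iterations = tf.Variable(0, dtype=tf.int64, trainable=False)
        return self._iterations

counter = LazyCounter()

@tf.function
def train_step():
    # If the first access to `iterations` happens in here, the variable is
    # created during tracing. tf.function only tolerates that on the very
    # first trace; any retrace raises the dreaded
    # "tf.function-decorated function tried to create variables on non-first call".
    counter.iterations.assign_add(1, read_value=False)

counter.iterations  # eager access first: the variable exists before any tracing
train_step()
```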
System information.
Describe the problem. I want to use mixed precision and MultiOptimizer at the same time.
Describe the current behavior. TensorFlow crashes with a graph execution error when using mixed precision with the MultiOptimizer.
Describe the expected behavior. No crash.
Contributing.
Standalone code to reproduce the issue.
https://colab.research.google.com/drive/1dk9SXd88aVwWHs7mshnX8sR8FVoEJOt-?usp=sharing
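A condensed repro along the lines of the colab (my own sketch; layer sizes and data are arbitrary):

```python
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    # Keep outputs float32 under mixed precision, as recommended:
    tf.keras.layers.Dense(1, dtype="float32"),
])

optimizers_and_layers = [
    (tf.keras.optimizers.Adam(), model.layers[:1]),
    (tf.keras.optimizers.Adam(), model.layers[1:]),
]
model.compile(
    optimizer=tfa.optimizers.MultiOptimizer(optimizers_and_layers),
    loss="mse",
)

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
# Expected to crash with the graph execution error about `iterations`
# reported above:
model.fit(x, y, epochs=1)
```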