This PR works around an issue where randomize_hyperparameters generated the same repeating values for model hyperparameters when the global seed was set. The issue only occurred when tf.function compilation was enabled.
The issue seems to be related to the following documented behaviour of TensorFlow:
Note that tf.function acts like a re-run of a program in this case. When the global seed is set but operation seeds are not set, the sequence of random numbers are the same for each tf.function.
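The documented behaviour quoted above can be sketched with two structurally identical functions. This is only an illustration of the quoted note, not the code touched by this PR; the function names f and g and the seed value are arbitrary choices made for the example.

```python
import tensorflow as tf

# Set only the global seed; no operation-level seeds are given below.
tf.random.set_seed(1234)


@tf.function
def f():
    # Two random ops without op seeds inside one tf.function.
    a = tf.random.uniform([1])
    b = tf.random.uniform([1])
    return a, b


@tf.function
def g():
    # Same structure as f, traced into its own separate graph.
    a = tf.random.uniform([1])
    b = tf.random.uniform([1])
    return a, b


f_a, f_b = f()
g_a, g_b = g()

# Per the quoted note, each tf.function behaves like a re-run of the
# program, so g is expected to start the same sequence that f started.
print("f:", float(f_a[0]), float(f_b[0]))
print("g:", float(g_a[0]), float(g_b[0]))
```

If the first values of f and g match, that is the "re-run of a program" effect: each tf.function restarts the sequence derived from the global seed.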
When the function being compiled has a dynamic conditional (i.e. tf.cond) and the branches contain randomization calls, TensorFlow internally seems to act like "... re-run of a program". This is likely related to the fact that both branches are traced when AutoGraph builds the conditional. This could potentially be a TensorFlow bug, but it requires more investigation.
This PR simply replaces the tf.Tensor condition expression (which AutoGraph converts to tf.cond) with a static Python expression. It also adds a unit test to catch the issue; the test fails on the previous version of the code.
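The difference between a tf.Tensor condition and a static Python condition can be sketched as follows. The functions sample_with_tensor_flag and sample_with_static_flag and the helper op_types are hypothetical names made up for this illustration; they are not the actual randomize_hyperparameters code, and the sketch does not attempt to reproduce the repeated-values bug itself.

```python
import tensorflow as tf


def op_types(concrete_fn):
    """Return the set of op types in the top-level graph of a traced function."""
    return {op.type for op in concrete_fn.graph.get_operations()}


@tf.function
def sample_with_tensor_flag(randomize):
    # `randomize` is a tf.Tensor here, so AutoGraph converts this `if`
    # into tf.cond and both branches become part of the traced graph.
    if randomize:
        return tf.random.uniform([2])
    else:
        return tf.zeros([2])


@tf.function
def sample_with_static_flag(randomize):
    # `randomize` is a plain Python bool here, so the `if` is resolved
    # while tracing and only the chosen branch ends up in the graph.
    if randomize:
        return tf.random.uniform([2])
    return tf.zeros([2])


tensor_graph_ops = op_types(
    sample_with_tensor_flag.get_concrete_function(tf.constant(True))
)
static_graph_ops = op_types(sample_with_static_flag.get_concrete_function(True))

COND_OPS = {"If", "StatelessIf"}
print("tensor flag uses a conditional op:", bool(COND_OPS & tensor_graph_ops))
print("static flag uses a conditional op:", bool(COND_OPS & static_graph_ops))
```

With the static flag, the random call sits directly in the function's graph instead of inside a conditional branch, which is the shape of the change described above.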
Fully backwards compatible: yes
PR checklist
[X] The quality checks are all passing
[X] The bug case / new feature is covered by tests
[ ] Any new features are well-documented (in docstrings or notebooks)
Related issue(s)/PRs: None