tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Bijectors Permute() and RealNVP() failing with tf.keras Model in TF2.0-nightly #355

Open lmartak opened 5 years ago

lmartak commented 5 years ago

When using the bijectors tfb.Permute and tfb.RealNVP to transform an input to an output in a Keras model (using either the forward() or inverse() transformation), one runs into multiple (possibly related) errors with TensorFlow 2.0.0-dev20190408 and TensorFlow Probability 0.7.0-dev.

To demonstrate: when trying to transform a tf.keras input to an output using one of these bijectors, or any chaining of them (testing both the forward() and inverse() transforms):

bijector = permute # no error here
bijector = realnvp
bijector = realnvp(permute)
bijector = permute(realnvp)

one gets the following 3 errors (respectively):

ValueError: Trying to share variable real_nvp_default_template/dense/kernel, but specified shape (1, 784) and found shape (392, 784).
NotImplementedError: Rightmost dimension must be known prior to graph execution.
ValueError: The last dimension of the inputs to `Dense` should be defined. Found `None`.
Full test code:

```python
import tensorflow as tf
import tensorflow_probability as tfp
import tensorflow_datasets as tfds

tf.config.gpu.set_per_process_memory_growth(True)

tfk = tf.keras
tfkl = tf.keras.layers
tfpl = tfp.layers
tfd = tfp.distributions
tfb = tfp.bijectors


class BijectorForward(tfkl.Layer):
    def __init__(self, bijector):
        super(BijectorForward, self).__init__()
        self.bijector = bijector

    def call(self, input):
        return self.bijector.forward(input)


class BijectorInverse(tfkl.Layer):
    def __init__(self, bijector):
        super(BijectorInverse, self).__init__()
        self.bijector = bijector

    def call(self, input):
        return self.bijector.inverse(input)


dim_z = 28**2
shape_x = (28, 28, 1)


def get_bijectors():
    permute = tfb.Permute(tf.concat([tf.range(dim_z//2, dim_z),
                                     tf.range(0, dim_z//2)], axis=0))
    additive_cf = tfb.real_nvp_default_template(
        [dim_z, dim_z//2], shift_only=True, activation=None)
    realnvp = tfb.RealNVP(
        num_masked=dim_z//2,
        shift_and_log_scale_fn=additive_cf,
        is_constant_jacobian=True)
    return permute, realnvp


def construct_keras_bijector_models(bijector):
    input_z = tfkl.Input(shape=(dim_z,))
    output_x = BijectorForward(bijector)(input_z)
    model_forward = tfk.models.Model(inputs=input_z, outputs=output_x)
    input_x = tfkl.Input(shape=shape_x)
    output_z = BijectorInverse(bijector)(input_x)
    model_inverse = tfk.models.Model(inputs=input_x, outputs=output_z)
    return model_forward, model_inverse


def test1():
    permute, realnvp = get_bijectors()
    bijector = permute
    construct_keras_bijector_models(bijector)


def test2():
    permute, realnvp = get_bijectors()
    bijector = realnvp
    construct_keras_bijector_models(bijector)


def test3():
    permute, realnvp = get_bijectors()
    bijector = realnvp(permute)
    construct_keras_bijector_models(bijector)


def test4():
    permute, realnvp = get_bijectors()
    bijector = permute(realnvp)
    construct_keras_bijector_models(bijector)


import traceback

for test in [test1, test2, test3, test4]:
    print('\n------{}------\n'.format(test.__name__))
    try:
        test()
    except Exception as e:
        print(e)
        # traceback.print_exc()
```
Corresponding errors:

```pytb
------test1------

[TensorFlow/CUDA device initialization logging only -- no error]

------test2------

WARNING: Logging before flag parsing goes to stderr.
W0408 17:58:43.255850 139912563205952 deprecation.py:323] From ~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow_probability/python/bijectors/real_nvp.py:291: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
W0408 17:58:43.256973 139912563205952 deprecation.py:506] From ~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1257: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Trying to share variable real_nvp_default_template/dense/kernel, but specified shape (1, 784) and found shape (392, 784).

originally defined at:
  File "bijectors-keras-errors.py", line 59, in test2
    permute, realnvp = get_bijectors()
  File "bijectors-keras-errors.py", line 36, in get_bijectors
    [dim_z, dim_z//2], shift_only=True, activation=None)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow_probability/python/bijectors/real_nvp.py", line 303, in real_nvp_default_template
    return tf.compat.v1.make_template("real_nvp_default_template", _fn)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 154, in make_template
    **kwargs)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 217, in make_template_internal
    create_graph_function=create_graph_function_)

------test3------

Rightmost dimension must be known prior to graph execution.

------test4------

The last dimension of the inputs to `Dense` should be defined. Found `None`.

originally defined at:
  File "bijectors-keras-errors.py", line 69, in test4
    permute, realnvp = get_bijectors()
  File "bijectors-keras-errors.py", line 36, in get_bijectors
    [dim_z, dim_z//2], shift_only=True, activation=None)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow_probability/python/bijectors/real_nvp.py", line 303, in real_nvp_default_template
    return tf.compat.v1.make_template("real_nvp_default_template", _fn)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 154, in make_template
    **kwargs)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 217, in make_template_internal
    create_graph_function=create_graph_function_)
```

So I'd like to end up with a tf.keras model that can be .fit() to optimize the parameters of real_nvp_default_template's MLP, while using the whole chain of bijectors as the transformation from the model's input to its output (or as part of some larger transformation comprising other trainable parameters).

Is this a bug or just a currently missing feature (as I'm on the TF2 nightly)? Am I assuming an unsupported use case here? Is there an obvious way to achieve what I need that I'm missing?

Thanks for any response and for all the great work! TFP rocks!

ppham27 commented 5 years ago

My commit: https://github.com/tensorflow/probability/commit/2e5bf5b41a18beb69cfea1caac970ec941ec7ab2 likely fixes this.

lmartak commented 5 years ago

Thanks @ppham27, this looks like progress, although it doesn't quite fix the whole thing yet.

Now all chainings result in the same error message about variable sharing:

Trying to share variable real_nvp_default_template/dense/kernel, but specified shape (1, 784) and found shape (392, 784).

Here is the output of the test code (presented above) on the latest builds:

```pytb
------test1------

[TensorFlow/CUDA device initialization logging only -- no error]

------test2------

WARNING: Logging before flag parsing goes to stderr.
W0426 15:58:47.645865 140650713732928 deprecation.py:323] From ~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow_probability/python/bijectors/real_nvp.py:293: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
W0426 15:58:47.646932 140650713732928 deprecation.py:506] From ~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Trying to share variable real_nvp_default_template/dense/kernel, but specified shape (1, 784) and found shape (392, 784).

originally defined at:
  File "bijectors-keras-errors.py", line 59, in test2
    permute, realnvp = get_bijectors()
  File "bijectors-keras-errors.py", line 36, in get_bijectors
    [dim_z, dim_z//2], shift_only=True, activation=None)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow_probability/python/bijectors/real_nvp.py", line 305, in real_nvp_default_template
    return tf.compat.v1.make_template("real_nvp_default_template", _fn)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 160, in make_template
    **kwargs)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 223, in make_template_internal
    create_graph_function=create_graph_function_)

------test3------

Trying to share variable real_nvp_default_template_1/dense/kernel, but specified shape (1, 784) and found shape (392, 784).

originally defined at:
  File "bijectors-keras-errors.py", line 64, in test3
    permute, realnvp = get_bijectors()
  File "bijectors-keras-errors.py", line 36, in get_bijectors
    [dim_z, dim_z//2], shift_only=True, activation=None)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow_probability/python/bijectors/real_nvp.py", line 305, in real_nvp_default_template
    return tf.compat.v1.make_template("real_nvp_default_template", _fn)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 160, in make_template
    **kwargs)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 223, in make_template_internal
    create_graph_function=create_graph_function_)

------test4------

Trying to share variable real_nvp_default_template_2/dense/kernel, but specified shape (1, 784) and found shape (392, 784).

originally defined at:
  File "bijectors-keras-errors.py", line 69, in test4
    permute, realnvp = get_bijectors()
  File "bijectors-keras-errors.py", line 36, in get_bijectors
    [dim_z, dim_z//2], shift_only=True, activation=None)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow_probability/python/bijectors/real_nvp.py", line 305, in real_nvp_default_template
    return tf.compat.v1.make_template("real_nvp_default_template", _fn)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 160, in make_template
    **kwargs)
  File "~/.env/tf2gpu/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 223, in make_template_internal
    create_graph_function=create_graph_function_)
```

TF: 2.0.0-dev20190426 TFP: 0.7.0-dev20190426

ppham27 commented 5 years ago

Try changing your code to shape_x = (784,).

You're reusing the same bijector for both the forward and inverse layers. A bijection's input and output should have the same shape.

Also worth noting that the RealNVP bijector seems to only work in graph mode. I needed to do something like model.call = tf.function(model.call) for it to work in eager mode.
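
For example, a minimal sketch of that workaround (assuming the bijector, dim_z, and construct_keras_bijector_models from the test script above; not tested beyond that setup):

```python
import tensorflow as tf

# Sketch: wrap the Keras model's call in a tf.function so the RealNVP
# bijector is traced as a graph function instead of running eagerly.
model_forward, model_inverse = construct_keras_bijector_models(bijector)
model_forward.call = tf.function(model_forward.call)

z = tf.random.normal(shape=[8, dim_z])
x = model_forward(z)  # now executed via the traced graph function
```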

lmartak commented 5 years ago

A bijection's input and output should have the same shape.

Thanks for pointing this out! This was indeed one of the problems, although, more precisely, a bijection's input and output should have the same number of dimensions (not necessarily the same shape). I can make different input/output shapes work as long as I use a tfb.Reshape bijector in the chain, as sketched below.
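
Roughly like this (sketch only, using the permute and realnvp bijectors from the test script; the full passing test follows below):

```python
# Sketch: add a Reshape bijector to the chain so the flat 784-dim event that
# Permute/RealNVP operate on maps to the model's (28, 28, 1) input/output.
reshape = tfb.Reshape(event_shape_out=(28, 28, 1), event_shape_in=(28**2,))
bijector = reshape(realnvp(permute))
```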

Also worth noting that the RealNVP bijector seems to only work in graph mode.

Thanks, this is a crucial piece. I understood that TFP-nightly was catching up with the TF2.0 preview; apparently it's not quite there yet. Back on the stable releases, everything seems to work as intended.

Passing test:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfk = tf.keras
tfkl = tf.keras.layers
tfb = tfp.bijectors

if tf.test.is_gpu_available():
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    session = tf.Session(config=config)
    tfk.backend.set_session(session)


class BijectorForward(tfkl.Layer):
    def __init__(self, bijector):
        super(BijectorForward, self).__init__()
        self.bijector = bijector

    def call(self, input):
        return self.bijector.forward(input)


class BijectorInverse(tfkl.Layer):
    def __init__(self, bijector):
        super(BijectorInverse, self).__init__()
        self.bijector = bijector

    def call(self, input):
        return self.bijector.inverse(input)


dim_z = 28**2
shape_x = (28, 28, 1)


def get_bijectors():
    permute = tfb.Permute(tf.concat([tf.range(dim_z//2, dim_z),
                                     tf.range(0, dim_z//2)], axis=0))
    additive_cf = tfb.real_nvp_default_template(
        [dim_z, dim_z//2], shift_only=True, activation=None)
    realnvp = tfb.RealNVP(
        num_masked=dim_z//2,
        shift_and_log_scale_fn=additive_cf,
        is_constant_jacobian=True)
    reshape = tfb.Reshape(event_shape_out=shape_x, event_shape_in=(dim_z,))
    return permute, realnvp, reshape


def construct_keras_bijector_models(bijector):
    input_z = tfkl.Input(shape=(dim_z,))
    output_x = BijectorForward(bijector)(input_z)
    model_forward = tfk.models.Model(inputs=input_z, outputs=output_x)
    input_x = tfkl.Input(shape=shape_x)
    output_z = BijectorInverse(bijector)(input_x)
    model_inverse = tfk.models.Model(inputs=input_x, outputs=output_z)
    return model_forward, model_inverse


def test1():
    permute, realnvp, reshape = get_bijectors()
    bijector = reshape(permute)
    construct_keras_bijector_models(bijector)


def test2():
    permute, realnvp, reshape = get_bijectors()
    bijector = reshape(realnvp)
    construct_keras_bijector_models(bijector)


def test3():
    permute, realnvp, reshape = get_bijectors()
    bijector = reshape(realnvp(permute))
    construct_keras_bijector_models(bijector)


def test4():
    permute, realnvp, reshape = get_bijectors()
    bijector = reshape(permute(realnvp))
    construct_keras_bijector_models(bijector)


for test in [test1, test2, test3, test4]:
    print('\n------{}------\n'.format(test.__name__))
    try:
        test()
    except Exception as e:
        print(e)
```

TF: 1.13.1 TFP: 0.6.0

Final question: can one expect TFP-nightly to become compatible with the TF2.0 pre-releases before the final stable TF2 release comes out?

An answer to this question would resolve the issue for me.

Thanks!

axch commented 5 years ago

We expect tfp-nightly to be compatible with tf2.0 pre-releases already. I'd like to close this, but if you find specific issues with tf2 compatibility, by all means please file them.

@ppham27 Want to file a separate issue about RealNVP not working in Eager mode? Given the model.call = tf.function(model.call) workaround, it may not be a breaking problem, but we should have it even so. Thanks!

lmartak commented 5 years ago

DISCLAIMER: This comment has grown bigger than I anticipated and might be separate issue material, but since it's related to this thread I'm posting it here first. Feel free to steer me with this wherever it belongs.

Thanks @axch for the clarification. My TF2.0 compatibility concern was twofold:

  1. as TF2.0 is eager by default, I was concerned about RealNVP (and any other Bijector that might behave the same way) not being eager-ready yet
  2. as TF2.0 is standardizing on tf.keras, I was wondering how tfp.bijectors.Bijectors fit together with tf.keras.layers.Layers when it comes to Keras trainability of a Bijector's parameters, or the tensor-history preservation Keras requires of forward()/inverse() bijective transformations.

For example, I found that tfp.bijectors.RealNVP with tfp.bijectors.real_nvp_default_template is currently not usable with tf.keras. After some hacking, I found that I would need a custom class RealNVP(tfkl.Layer, tfb.ConditionalBijector), but this multiple inheritance overloads both the __init__() and __call__() behaviors, so I'd need to invoke specific ones in specific instances while still keeping both for compatibility with tf.keras and tfp.bijectors respectively. I found this infeasible (please correct me if I'm wrong and this actually leads somewhere) and ended up unable to construct a parameterized tfb.Bijector within a tfkl.Layer such that the aforementioned restrictions of a tf.keras model would be met. My tf.keras model would compile, but had 0 trainable parameters (as reported by model.summary()) and subsequently failed on any attempt at inference (such as fit, train_on_batch, or predict).

This is the code I had when I gave up trying to fuse trainable Bijectors with Keras (it compiles, but with 0 trainable params):

```python
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

print(tf.__version__)  # 1.13.1

tfk = tf.keras
tfkl = tf.keras.layers
tfpl = tfp.layers
tfd = tfp.distributions
tfb = tfp.bijectors

from sklearn.datasets import make_moons

x, y = make_moons(n_samples=2**11, noise=.05)

data_dim = x.shape[-1]    # 2
data_shape = (data_dim,)  # (2,)

m_width = 7
m_depth = 3
n_blocks = 7

prior = tfd.MultivariateNormalDiag(
    loc=tf.zeros(data_dim), scale_diag=tf.ones(data_dim))

cf_layers = m_depth * [m_width]


class ShiftLogScaleMLP(tfkl.Layer):
    def __init__(self, output_dim, hidden_layers):
        super(ShiftLogScaleMLP, self).__init__()
        self.output_dim = output_dim
        self.hidden_layers = hidden_layers
        self.layers = []

    def build(self, input_shape):
        super(ShiftLogScaleMLP, self).build(input_shape)
        for units in self.hidden_layers:
            self.layers.append(tfkl.Dense(units, activation=tf.nn.leaky_relu))
        self.layers.append(tfkl.Dense(self.output_dim, activation=None))

    def call(self, input):
        x = input
        for layer in self.layers:
            x = layer(x)
        shift, log_scale = tf.split(x, 2, axis=-1)
        return shift, log_scale


class ForwardAffineCL(tfkl.Layer):
    def __init__(self, coupling_func):
        super(ForwardAffineCL, self).__init__()
        self.coupling_func = coupling_func

    def call(self, input):
        x0, x1 = tf.split(input, 2, axis=-1)
        shift, log_scale = self.coupling_func(x0)
        y1 = x1 * tf.exp(log_scale) + shift
        res = tf.concat([x0, y1], axis=-1)
        return res, log_scale


class InverseAffineCL(tfkl.Layer):
    def __init__(self, coupling_func):
        super(InverseAffineCL, self).__init__()
        self.coupling_func = coupling_func

    def call(self, input):
        y0, y1 = tf.split(input, 2, axis=-1)
        shift, log_scale = self.coupling_func(y0)
        x1 = (y1 - shift) * tf.exp(-log_scale)
        res = tf.concat([y0, x1], axis=-1)
        return res, log_scale


class RealNVP(tfb.Bijector):
    def __init__(self, forward_cl, inverse_cl,
                 validate_args=False, name="real_nvp"):
        super(RealNVP, self).__init__(
            is_constant_jacobian=True,
            validate_args=validate_args,
            forward_min_event_ndims=0,
            name=name)
        self.forward_cl = forward_cl
        self.inverse_cl = inverse_cl

    def _forward(self, x):
        x, _ = self.forward_cl(x)
        return x

    def _inverse(self, y):
        y, _ = self.inverse_cl(y)
        return y

    def _inverse_log_det_jacobian(self, y):
        _, log_scale = self.inverse_cl(y)
        return -log_scale

    def _forward_log_det_jacobian(self, x):
        _, log_scale = self.forward_cl(x)
        return log_scale


class BijectorLayer(tfkl.Layer):
    def __init__(self):
        super(BijectorLayer, self).__init__()

    def build(self, input_shape):
        super(BijectorLayer, self).build(input_shape)
        switch = tfb.Permute([1, 0])
        transform = tfb.Identity()
        cf = ShiftLogScaleMLP(data_dim, [5, 5, 5])
        forward_cl = ForwardAffineCL(cf)
        inverse_cl = InverseAffineCL(cf)
        coupling = RealNVP(forward_cl, inverse_cl)
        transform = coupling(transform)
        # transform = switch(transform)
        self.bijector = coupling  # transform

    def call(self, input):
        return self.bijector.forward(input)


def nll(x, z):
    ll = prior.log_prob(z)
    ll += transform.forward_log_det_jacobian(x, 2)
    ll = tf.reduce_mean(ll, axis=0)
    return -ll


input_x = tfkl.Input(shape=data_shape)
output_z = BijectorLayer()(input_x)
nvp_forward = tfk.models.Model(inputs=input_x, outputs=output_z)
nvp_forward.compile('adam', loss=nll, metrics=[])
nvp_forward.summary()
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #
# =================================================================
# input_17 (InputLayer)        (None, 2)                 0
# _________________________________________________________________
# bijector_layer_10 (BijectorL (None, 2)                 0
# =================================================================
# Total params: 0
# Trainable params: 0
# Non-trainable params: 0
# _________________________________________________________________

nvp_forward.fit(x, x)
# InvalidArgumentError: You must feed a value for placeholder
# tensor 'bijector_forward_5_target' with dtype float and shape [?,?]
# [[{{node bijector_forward_5_target}}]]
```

I ended up with a completely custom, from-scratch tf.keras implementation of RealNVP that uses only a single bijector, tfb.Permute([1, 0]), to swap the two dimensions between coupling blocks.

From what I observe, Bijector.forward()/inverse() operate on raw TF tensors and should only be applied within Layer.call() to preserve Keras' tensor-history chain. This largely defeats the power of chaining multiple bijectors in advance and applying the chained bijection in one go: if you want some of the bijectors in the chain to have Keras-trainable parameters, you have to break the chain and implement the parameterized bijections as Keras layers manually.

This is a bit discouraging, as my TF2.0 compatibility concern 2 (stated above) turns out to be justified: at the TF Dev Summit 2019, all these nice new features and standardizations were introduced as parts of one TF2.0 package, where one would expect interoperability between those parts to be native and intuitive.

To conclude, are there any plans (is there a public place to look at the TFP project design/development roadmap?) to introduce tf.keras compatibility for trainable tfb.Bijectors? I'd be happy to contribute comments on design, as well as code or reviews, once a specific roadmap is proposed. As I see it, more and more people will want to use TFP and its modules with tf.keras as it becomes the recommended API for TensorFlow, even for researchers.

Thanks for any clues here!

axch commented 5 years ago

@jburnim @jvdillon Any comment here?

ppham27 commented 5 years ago

This is definitely a separate issue. I'm pretty sure the right solution here is to have both bijectors and distributions inherit from tf.Module, so variables are autotracked. Currently, variables are created with make_template functions in the v1 style.
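
For illustration, a minimal sketch (AffineModule is a made-up example, not TFP code) of the variable autotracking that tf.Module provides:

```python
import tensorflow as tf

class AffineModule(tf.Module):
  """Toy module: variables assigned as attributes are autotracked."""

  def __init__(self, dim, name=None):
    super(AffineModule, self).__init__(name=name)
    self.shift = tf.Variable(tf.zeros(dim), name='shift')
    self.log_scale = tf.Variable(tf.zeros(dim), name='log_scale')

  def __call__(self, x):
    return x * tf.exp(self.log_scale) + self.shift

m = AffineModule(4)
print(len(m.trainable_variables))  # 2 -- discovered without make_template
```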

brianwa84 commented 5 years ago

Bijectors and distributions both extend tf.Module, but I don't think Keras picks up the variable dependencies properly from modules. You might have to explicitly tell Keras about bijector.trainable_variables. There was another issue about this related to Glow recently; you might look for the workaround there.
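
Something along these lines may work (untested sketch; TrainableBijectorLayer is a hypothetical name, and it assumes the bijector's variables are reachable via bijector.trainable_variables once the bijector has been called):

```python
import tensorflow as tf

class TrainableBijectorLayer(tf.keras.layers.Layer):
  """Hypothetical wrapper: apply a bijector and surface its variables to Keras."""

  def __init__(self, bijector, **kwargs):
    super(TrainableBijectorLayer, self).__init__(**kwargs)
    self._bijector = bijector

  def build(self, input_shape):
    # Call the bijector once so its variables get created, then keep them in a
    # layer attribute -- Keras tracks variables reachable from attributes, so
    # they show up in model.trainable_variables.
    self._bijector.forward(tf.zeros([1] + list(input_shape)[1:]))
    self._bijector_variables = list(self._bijector.trainable_variables)
    super(TrainableBijectorLayer, self).build(input_shape)

  def call(self, x):
    return self._bijector.forward(x)
```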

yenicelik commented 4 years ago

@brianwa84 How do I tell Keras explicitly about the bijector.trainable_variables?

giovp commented 4 years ago

For example, I found that tfp.bijectors.RealNVP with tfp.bijectors.real_nvp_default_template is currently not usable with tf.keras.

Just wanted to report that this is still an issue with TF 2.1 and TFP 0.9. I found a working implementation here: https://github.com/MokkeMeguru/glow-realnvp-tutorial/blob/master/tips/RealNVP_tutorial_en.ipynb

ppham27 commented 4 years ago

Bijectors and distributions both extend tf.Module, but I don't think Keras picks up the variable dependencies properly from modules. You might have to explicitly tell Keras about bijector.trainable_variables. there was another issue about this related to Glow recently, might look for the workaround there.

It's worse than that. Template inherits from Trackable and modules don't understand Trackable.

@brianwa84 How do I tell Keras explicitly about the bijector.trainable_variables?

What I have been doing is going through the private properties, grabbing the variables, and assigning them as an attribute: https://colab.research.google.com/drive/1kqE7e6RAbVZ_LpQu4Hf3YLzgZZu5Kunh

!pip install tensorflow==2.1.0
!pip install tensorflow_probability==0.9.0

import tensorflow as tf
import tensorflow_probability as tfp

from tensorflow.python.keras import backend
from tensorflow.python.keras.engine import base_layer_utils

class RealNvpLayer(tf.keras.layers.Layer):

  def __init__(self, hidden_units):
    super(RealNvpLayer, self).__init__()
    self._bijector = tfp.bijectors.RealNVP(
        fraction_masked=0.5,
        shift_and_log_scale_fn=tfp.bijectors.real_nvp_default_template(
            hidden_units))

  def build(self, input_shape):
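    # Run the bijector once on a graph placeholder so its template creates its
    # variables, then collect them (including those behind the private
    # _shift_and_log_scale_fn) onto a layer attribute so Keras tracks them.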
    with backend.get_graph().as_default():
      x = base_layer_utils.generate_placeholders_from_shape(input_shape)
      _ = self._bijector(x)
    self._bijector_variables = (
        list(self._bijector.variables) +
        list(self._bijector._shift_and_log_scale_fn.variables))
    super(RealNvpLayer, self).build(input_shape)

  def call(self, x):
    return self._bijector(x)

l = RealNvpLayer([10, 10])
print(l(tf.random.normal(shape=[2, 4])))
l.variables

Not great, I know. Hopefully someone knows a better way. I believe unifying the confusion between Layer, Module, and Trackable is something being worked on at least.