[BUG] We don't have tests for VAE #1694

Open miguelgfierro opened 2 years ago

miguelgfierro commented 2 years ago


FYI @anargyri @pradnyeshjoshi

miguelgfierro commented 2 years ago

The notebook has multiple errors:

>           raise error
E           papermill.exceptions.PapermillExecutionError:
E           ---------------------------------------------------------------------------
E           Exception encountered at "In [2]":
E           ---------------------------------------------------------------------------
E           TypeError                                 Traceback (most recent call last)
E           /tmp/ipykernel_27602/ in <module>
E                14 # temporary Path to save the optimal model's weights
E                15 tmp_dir = TemporaryDirectory()
E           ---> 16 WEIGHTS_PATH = os.path.join(tmp_dir, "svae_weights.hdf5")
E                17
E                18 SEED = 98765
E           /anaconda/envs/reco_gpu/lib/python3.7/ in join(a, *p)
E                78     will be discarded.  An empty last part will result in a path that
E                79     ends with a separator."""
E           ---> 80     a = os.fspath(a)
E                81     sep = _get_sep(a)
E                82     path = a
E           TypeError: expected str, bytes or os.PathLike object, not TemporaryDirectory

/anaconda/envs/reco_gpu/lib/python3.7/site-packages/papermill/ PapermillExecutionError

The weights are not downloaded. We need to rerun the notebook with the new version of TF.
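The TypeError in this first traceback comes from passing the TemporaryDirectory object itself to os.path.join, which only accepts str, bytes or os.PathLike values. The object exposes its filesystem path as the .name attribute. A minimal sketch of the corrected cell, reusing the notebook's variable names, could be:

```python
import os
from tempfile import TemporaryDirectory

# temporary Path to save the optimal model's weights
tmp_dir = TemporaryDirectory()
# Join against the directory's path string, not the TemporaryDirectory object
WEIGHTS_PATH = os.path.join(tmp_dir.name, "svae_weights.hdf5")

# Call tmp_dir.cleanup() once the weights are no longer needed
```

This only fixes the path construction; it does not address the missing weights mentioned above.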

miguelgfierro commented 2 years ago

Tensorflow version:

$ pip list | grep tensorflow
tensorflow                   2.7.1
tensorflow-estimator         2.7.0
tensorflow-io-gcs-filesystem 0.24.0
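When comparing environments like this, it can help to also record the versions from inside the notebook or test run, so the papermill output notebook carries them. A small stdlib-only sketch; the helper name package_versions is hypothetical, and importlib.metadata needs Python 3.8+ (on the Python 3.7 environment above, the importlib_metadata backport provides the same functions):

```python
from importlib import metadata


def package_versions(names=("tensorflow", "tensorflow-estimator", "keras")):
    """Return {distribution name: installed version or None if missing}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    print(package_versions())
```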

Execute the notebook via papermill:

$ git checkout miguel/missing
$ pytest tests/integration/examples/
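Outside pytest, the same run can be reproduced directly with papermill, injecting the MOVIELENS_DATA_SIZE and EPOCHS parameters that the integration test passes (see the test excerpt in the log). A sketch under assumptions: the notebook path is a guess based on the sibling notebooks listed in the test output, and the helper names are hypothetical.

```python
# Assumed location of the notebook; adjust to the actual repository layout
NOTEBOOK = "examples/02_model_collaborative_filtering/standard_vae_deep_dive.ipynb"


def build_parameters(size, epochs):
    """Parameters injected into the notebook's tagged parameters cell."""
    return dict(MOVIELENS_DATA_SIZE=size, EPOCHS=epochs)


def run_vae_notebook(size="1m", epochs=100, output_notebook="output.ipynb",
                     kernel_name="python3"):
    # Imported lazily so the helpers above can be used without papermill
    import papermill as pm
    from papermill.exceptions import PapermillExecutionError

    try:
        pm.execute_notebook(
            NOTEBOOK,
            output_notebook,
            parameters=build_parameters(size, epochs),
            kernel_name=kernel_name,
        )
    except PapermillExecutionError as err:
        # ename/evalue hold the exception type and message raised in the cell
        print(f"Notebook failed: {err.ename}: {err.evalue}")
        raise


if __name__ == "__main__":
    run_vae_notebook()
```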

test_standard_vae_deep_dive_integration
_________________________________________________________ test_standard_vae_deep_dive_integration[1m-100-expected_values0] __________________________________________________________

notebooks = {'als_deep_dive': '/home/hoaphumanoid/notebooks/recommenders/examples/02_model_collaborative_filtering/als_deep_dive.i.../home/hoaphumanoid/notebooks/recommenders/examples/02_model_collaborative_filtering/cornac_bivae_deep_dive.ipynb', ...}
output_notebook = 'output.ipynb', kernel_name = 'python3', size = '1m', epochs = 100
expected_values = {'eval_map_2': 0.138111, 'eval_map_4': 0.171624, 'eval_ndcg_2': 0.392379, 'eval_ndcg_4': 0.443328, ...}

        "size, epochs, expected_values",
    def test_standard_vae_deep_dive_integration(
        notebooks, output_notebook, kernel_name, size, epochs, expected_values
        notebook_path = notebooks["standard_vae_deep_dive"]
>           parameters=dict(MOVIELENS_DATA_SIZE=size, EPOCHS=epochs),

E           papermill.exceptions.PapermillExecutionError: 
E           ---------------------------------------------------------------------------
E           Exception encountered at "In [25]":
E           ---------------------------------------------------------------------------
E           TypeError                                 Traceback (most recent call last)
E           /tmp/ipykernel_21487/ in <module>
E                 4                              x_val_tr=val_data_tr,
E                 5                              x_val_te=val_data_te_ratings, # with the original ratings
E           ----> 6                              mapper=am_val
E                 7                              )
E                 8 print("Took {} seconds for training.".format(t))
E           ~/notebooks/recommenders/recommenders/models/vae/ in fit(self, x_train, x_valid, x_val_tr, x_val_te, mapper)
E               406                 verbose=self.verbose,
E               407                 callbacks=[metrics, history, self.reduce_lr],
E           --> 408                 validation_data=(x_valid, x_valid),
E               409             )
E               410 
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/keras/engine/ in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
E              2028         use_multiprocessing=use_multiprocessing,
E              2029         shuffle=shuffle,
E           -> 2030         initial_epoch=initial_epoch)
E              2031 
E              2032   @doc_controls.do_not_generate_docs
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/keras/utils/ in error_handler(*args, **kwargs)
E                65     except Exception as e:  # pylint: disable=broad-except
E                66       filtered_tb = _process_traceback_frames(e.__traceback__)
E           ---> 67       raise e.with_traceback(filtered_tb) from None
E                68     finally:
E                69       del filtered_tb
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/framework/ in autograph_handler(*args, **kwargs)
E              1127           except Exception as e:  # pylint:disable=broad-except
E              1128             if hasattr(e, "ag_error_metadata"):
E           -> 1129               raise e.ag_error_metadata.to_exception(e)
E              1130             else:
E              1131               raise
E           TypeError: in user code:
E               File "/anaconda/envs/reco_gpu/lib/python3.7/site-packages/keras/engine/", line 878, in train_function  *
E                   return step_function(self, iterator)
E               File "/anaconda/envs/reco_gpu/lib/python3.7/site-packages/keras/engine/", line 867, in step_function  **
E                   outputs =, args=(data,))
E               File "/anaconda/envs/reco_gpu/lib/python3.7/site-packages/keras/engine/", line 860, in run_step  **
E                   outputs = model.train_step(data)
E               File "/anaconda/envs/reco_gpu/lib/python3.7/site-packages/keras/engine/", line 810, in train_step
E                   y, y_pred, sample_weight, regularization_losses=self.losses)
E               File "/anaconda/envs/reco_gpu/lib/python3.7/site-packages/keras/engine/", line 240, in __call__
E                   total_loss_metric_value, sample_weight=batch_dim)
E               File "/anaconda/envs/reco_gpu/lib/python3.7/site-packages/keras/utils/", line 73, in decorated
E                   update_op = update_state_fn(*args, **kwargs)
E               File "/anaconda/envs/reco_gpu/lib/python3.7/site-packages/keras/", line 177, in update_state_fn
E                   return ag_update_state(*args, **kwargs)
E               File "/anaconda/envs/reco_gpu/lib/python3.7/site-packages/keras/", line 452, in update_state  **
E                   sample_weight, values)
E               File "/anaconda/envs/reco_gpu/lib/python3.7/site-packages/keras/engine/", line 256, in __array__
E                   f'You are passing {self}, an intermediate Keras symbolic input/output, '
TypeError: You are passing KerasTensor(type_spec=TensorSpec(shape=(), dtype=tf.float32, name=None), name='Placeholder:0', description="created by layer 'tf.cast_4'"), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. Keras Functional model construction only supports TF API calls that *do* support dispatching, such as `tf.math.add` or `tf.reshape`. Other APIs cannot be called directly on symbolic Kerasinputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer `call` and calling that layer on this symbolic input/output.
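The TF 2.7 message suggests its own workaround: put the operation on intermediate symbolic tensors inside a custom layer's call. The thread does not show the model code in recommenders/models/vae, so the sketch below assumes the offending operation is the KL-divergence term computed from the encoder's mean and log-variance outputs, which is the usual place VAE code touches intermediate tensors outside a layer. It targets tf.keras 2.x; the layer and function names, layer sizes, the fixed beta and the multinomial reconstruction loss are illustrative assumptions, not the library's implementation.

```python
import numpy as np
import tensorflow as tf


class SamplingWithKL(tf.keras.layers.Layer):
    """Hypothetical layer: reparameterised sampling plus a KL penalty.

    The KL term is computed inside call() and registered with add_loss, so
    no loss function has to close over intermediate symbolic tensors.
    """

    def __init__(self, beta=1.0, **kwargs):
        super().__init__(**kwargs)
        self.beta = beta  # fixed weight; annealing is out of scope here

    def call(self, inputs):
        z_mean, z_log_var = inputs
        # KL(q(z|x) || N(0, I)): sum over latent dims, mean over the batch
        kl = -0.5 * tf.reduce_sum(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1
        )
        self.add_loss(self.beta * tf.reduce_mean(kl))
        # Reparameterisation trick: z = mean + sigma * eps
        eps = tf.random.normal(tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps


def multinomial_nll(y_true, y_pred):
    """Reconstruction loss on logits; the KL term arrives via add_loss."""
    return -tf.reduce_sum(y_true * tf.nn.log_softmax(y_pred, axis=-1), axis=-1)


def build_vae(n_items, hidden_dim=600, latent_dim=70, beta=1.0):
    x = tf.keras.Input(shape=(n_items,))
    h = tf.keras.layers.Dense(hidden_dim, activation="tanh")(x)
    z_mean = tf.keras.layers.Dense(latent_dim)(h)
    z_log_var = tf.keras.layers.Dense(latent_dim)(h)
    z = SamplingWithKL(beta=beta)([z_mean, z_log_var])
    h_dec = tf.keras.layers.Dense(hidden_dim, activation="tanh")(z)
    logits = tf.keras.layers.Dense(n_items)(h_dec)
    model = tf.keras.Model(x, logits)
    model.compile(optimizer="adam", loss=multinomial_nll)
    return model


if __name__ == "__main__":
    # Tiny synthetic click matrix, only to show that fit() runs
    rng = np.random.default_rng(0)
    clicks = (rng.random((64, 50)) < 0.1).astype("float32")
    vae = build_vae(n_items=50, hidden_dim=32, latent_dim=8)
    history = vae.fit(clicks, clicks, batch_size=16, epochs=1, verbose=0)
    print(history.history["loss"])
```

Because the KL term is registered with add_loss inside the layer, Keras adds it to the compiled loss during fit, so the function passed to compile only needs y_true and y_pred. Whether this is the only construct that breaks in the notebook is untested.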
miguelgfierro commented 2 years ago

Trying with the original TF version the notebook was written with:

$ pip list | grep tensorflow
tensorflow                   2.2.0rc1
tensorflow-estimator         2.2.0
tensorflow-io-gcs-filesystem 0.24.0

$ pip list | grep Keras
Keras                        2.3.1
Keras-Applications           1.0.8
Keras-Preprocessing          1.1.2

Execute the notebook via papermill:

$ git checkout miguel/missing
$ pytest tests/integration/examples/

/anaconda/envs/reco_gpu/lib/python3.7/site-packages/papermill/ in execute_notebook
    raise_for_execution_errors(nb, output_path)
E           papermill.exceptions.PapermillExecutionError: 
E           ---------------------------------------------------------------------------
E           Exception encountered at "In [25]":
E           ---------------------------------------------------------------------------
E           TypeError                                 Traceback (most recent call last)
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/eager/ in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
E                59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
E           ---> 60                                         inputs, attrs, num_outputs)
E                61   except core._NotOkStatusException as e:
E           TypeError: An op outside of the function building code is being passed
E           a "Graph" tensor. It is possible to have Graph tensors
E           leak out of the function building context by including a
E           tf.init_scope in your function building code.
E           For example, the following function will fail:
E             @tf.function
E             def has_init_scope():
E               my_constant = tf.constant(1.)
E               with tf.init_scope():
E                 added = my_constant * 2
E           The graph tensor has name: dense_2/Identity:0
E           During handling of the above exception, another exception occurred:
E           _SymbolicException                        Traceback (most recent call last)
E           /tmp/ipykernel_3741/ in <module>
E                 4                              x_val_tr=val_data_tr,
E                 5                              x_val_te=val_data_te_ratings, # with the original ratings
E           ----> 6                              mapper=am_val
E                 7                              )
E                 8 print("Took {} seconds for training.".format(t))
E           ~/notebooks/recommenders/recommenders/models/vae/ in fit(self, x_train, x_valid, x_val_tr, x_val_te, mapper)
E               406                 verbose=self.verbose,
E               407                 callbacks=[metrics, history, self.reduce_lr],
E           --> 408                 validation_data=(x_valid, x_valid),
E               409             )
E               410 
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/util/ in new_func(*args, **kwargs)
E               322               'in a future version' if date is None else ('after %s' % date),
E               323               instructions)
E           --> 324       return func(*args, **kwargs)
E               325     return tf_decorator.make_decorator(
E               326         func, new_func, 'deprecated',
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/keras/engine/ in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
E              1412         use_multiprocessing=use_multiprocessing,
E              1413         shuffle=shuffle,
E           -> 1414         initial_epoch=initial_epoch)
E              1415 
E              1416   @deprecation.deprecated(
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/keras/engine/ in _method_wrapper(self, *args, **kwargs)
E                63   def _method_wrapper(self, *args, **kwargs):
E                64     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
E           ---> 65       return method(self, *args, **kwargs)
E                66 
E                67     # Running inside `run_distribute_coordinator` already.
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/keras/engine/ in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
E               781                 batch_size=batch_size):
E               782               callbacks.on_train_batch_begin(step)
E           --> 783               tmp_logs = train_function(iterator)
E               784               # Catch OutOfRangeError for Datasets of unknown size.
E               785               # This blocks until the batch has finished executing.
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/eager/ in __call__(self, *args, **kwds)
E               578         xla_context.Exit()
E               579     else:
E           --> 580       result = self._call(*args, **kwds)
E               581 
E               582     if tracing_count == self._get_tracing_count():
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/eager/ in _call(self, *args, **kwds)
E               642         # Lifting succeeded, so variables are initialized and we can run the
E               643         # stateless function.
E           --> 644         return self._stateless_fn(*args, **kwds)
E               645     else:
E               646       canon_args, canon_kwds = \
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/eager/ in __call__(self, *args, **kwargs)
E              2418     with self._lock:
E              2419       graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
E           -> 2420     return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
E              2421 
E              2422   @property
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/eager/ in _filtered_call(self, args, kwargs)
E              1663          if isinstance(t, (ops.Tensor,
E              1664                            resource_variable_ops.BaseResourceVariable))),
E           -> 1665         self.captured_inputs)
E              1666 
E              1667   def _call_flat(self, args, captured_inputs, cancellation_manager=None):
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/eager/ in _call_flat(self, args, captured_inputs, cancellation_manager)
E              1744       # No tape is watching; skip to running the function.
E              1745       return self._build_call_outputs(
E           -> 1746           ctx, args, cancellation_manager=cancellation_manager))
E              1747     forward_backward = self._select_forward_and_backward_functions(
E              1748         args,
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/eager/ in call(self, ctx, args, cancellation_manager)
E               596               inputs=args,
E               597               attrs=attrs,
E           --> 598               ctx=ctx)
E               599         else:
E               600           outputs = execute.execute_with_cancellation(
E           /anaconda/envs/reco_gpu/lib/python3.7/site-packages/tensorflow/python/eager/ in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
E                72       raise core._SymbolicException(
E                73           "Inputs to eager execution function cannot be Keras symbolic "
E           ---> 74           "tensors, but found {}".format(keras_symbolic_tensors))
E                75     raise e
E                76   # pylint: enable=protected-access
_SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor 'dense_2/Identity:0' shape=(None, 70) dtype=float32>, <tf.Tensor 'dense_1/Identity:0' shape=(None, 70) dtype=float32>]

/anaconda/envs/reco_gpu/lib/python3.7/site-packages/papermill/ PapermillExecutionError
miguelgfierro commented 2 years ago

From Andreas: one thing I notice in this code is that it uses methods from keras.backend in several places. These methods are not available in the new API, but they are in the old one.

So one thing to try is replacing these references with their equivalents in tf.compat.v1.keras.backend.

miguelgfierro commented 1 year ago

Related issue: