tensorflow / recommenders

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
Apache License 2.0

Tensorboard error when using quickstart model #94

Open tszumowski opened 4 years ago

tszumowski commented 4 years ago

I was able to get the Quickstart model running seamlessly.

However, I then tried to add a TensorBoard callback by following the TensorBoard Quickstart.

In the tfrs API docs, it says:

Note that this base class is a thin convenience wrapper for tf.keras.Model.

Since it is a Keras model, I followed this section: Using TensorBoard with Keras Model.fit()

This means I replaced this line in Quickstart:

model.fit(ratings.batch(4096), epochs=3)

with these lines:

import datetime  # needed for the timestamped log directory; place with the other imports

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

model.fit(ratings.batch(4096), epochs=3, callbacks=[tensorboard_callback])

(in other words, adding the callbacks= input argument)

When I run this, I get the following error:

2020-10-01 21:00:11.235419: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:223]  GpuTracer has collected 0 callback api events and 0 activity events. 
2020-10-01 21:00:11.543177: I tensorflow/core/profiler/rpc/client/save_profile.cc:176] Creating directory: logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11
2020-10-01 21:00:11.631104: I tensorflow/core/profiler/rpc/client/save_profile.cc:182] Dumped gzipped tool data for trace.json.gz to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.trace.json.gz
2020-10-01 21:00:11.733572: I tensorflow/core/profiler/rpc/client/save_profile.cc:176] Creating directory: logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11
2020-10-01 21:00:11.736341: I tensorflow/core/profiler/rpc/client/save_profile.cc:182] Dumped gzipped tool data for memory_profile.json.gz to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.memory_profile.json.gz
2020-10-01 21:00:11.737640: I tensorflow/python/profiler/internal/profiler_wrapper.cc:111] Creating directory: logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11
Dumped tool data for xplane.pb to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.xplane.pb
Dumped tool data for overview_page.pb to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.overview_page.pb
Dumped tool data for input_pipeline.pb to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.kernel_stats.pb

 2/25 [=>............................] - ETA: 9s - factorized_top_k: 7.5684e-04 - factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_5_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_10_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_50_categorical_accuracy: 3.6621e-04 - factorized_top_k/top_100_categorical_accuracy: 0.0034 - loss: 34084.4688 - regularization_loss: 0.0000e+00 - total_loss: 34084.4688
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.2553s vs `on_train_batch_end` time: 0.6103s). Check your callbacks.
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.2553s vs `on_train_batch_end` time: 0.6103s). Check your callbacks.
25/25 [==============================] - ETA: 0s - factorized_top_k: 0.0293 - factorized_top_k/top_1_categorical_accuracy: 6.0000e-05 - factorized_top_k/top_5_categorical_accuracy: 0.0014 - factorized_top_k/top_10_categorical_accuracy: 0.0047 - factorized_top_k/top_50_categorical_accuracy: 0.0424 - factorized_top_k/top_100_categorical_accuracy: 0.0978 - loss: 33915.1966 - regularization_loss: 0.0000e+00 - total_loss: 33915.1966
Traceback (most recent call last):
  File "tfrs_quickstart.py", line 100, in <module>
    model.fit(ratings.batch(4096), epochs=3, callbacks=[tensorboard_callback])
  File "/home/temp/.venv/tfrs/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/temp/.venv/tfrs/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1137, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/temp/.venv/tfrs/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 412, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/temp/.venv/tfrs/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 2182, in on_epoch_end
    self._log_weights(epoch)
  File "/home/temp/.venv/tfrs/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 2233, in _log_weights
    weight_name = weight.name.replace(':', '_')
AttributeError: 'TrackableWeightHandler' object has no attribute 'name'
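
For readers skimming the traceback: the last frame calls weight.name.replace(':', '_') on an object that has no name attribute. The sketch below reproduces that failure mode in plain Python and shows one defensive way to skip such objects. The classes NamedWeight and HandlerWithoutName and the function histogram_tags are hypothetical stand-ins made up for illustration; they are not TensorFlow code, and this is not a claim about how the problem was eventually resolved upstream.

```python
class NamedWeight:
    """Hypothetical stand-in for a variable exposing a `name` like 'embedding/embeddings:0'."""

    def __init__(self, name):
        self.name = name


class HandlerWithoutName:
    """Hypothetical stand-in for an object that, like the one in the traceback, has no `name`."""


def histogram_tags(weights):
    """Return TensorBoard-style tags, skipping entries that lack a `name` attribute."""
    tags = []
    for weight in weights:
        name = getattr(weight, "name", None)
        if name is None:
            # An unguarded `weight.name` here would raise AttributeError.
            continue
        tags.append(name.replace(":", "_"))
    return tags


weights = [NamedWeight("embedding/embeddings:0"), HandlerWithoutName()]
tags = histogram_tags(weights)

# The unguarded access fails the same way the traceback does.
try:
    weights[1].name.replace(":", "_")
    unguarded_failed = False
except AttributeError:
    unguarded_failed = True
```

The guard only illustrates why the error appears when weight histograms are logged; it does not patch the Keras callback itself.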

Should this work as-is with .fit()? Or do I need to make a custom implementation (as described at the top here)?

I did notice that tfrs.models.Model uses tf.GradientTape(), and the TensorBoard docs give different directions for training code that uses that method.

I also attached a .py version of the quickstart for reference (with the lines above added).

tfrs_quickstart.py.txt

tszumowski commented 4 years ago

Update

When I remove histogram_freq, everything works fine.

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

I don't currently need to view weight/activation histograms, so this works for me and I'm happy to close this ticket. If you'd prefer to keep the issue open to address the histogram case, that's fine too.
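
For anyone landing here later, this is roughly what the working setup looks like as two small helpers. It is a minimal sketch: the helper names make_log_dir and make_tensorboard_callback are made up for this example, and TensorFlow is imported lazily so the path helper can be tried without it. histogram_freq=0 is the Keras default and corresponds to leaving the argument out.

```python
import datetime


def make_log_dir(root="logs/fit", now=None):
    """Build a timestamped log directory such as 'logs/fit/20201001-210007'."""
    now = now or datetime.datetime.now()
    return root + "/" + now.strftime("%Y%m%d-%H%M%S")


def make_tensorboard_callback(log_dir, histogram_freq=0):
    """Create the TensorBoard callback; histogram_freq=0 disables weight histograms."""
    import tensorflow as tf  # lazy import: only needed when the callback is built

    return tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=histogram_freq)


example_dir = make_log_dir(now=datetime.datetime(2020, 10, 1, 21, 0, 7))
```

With these, the training call from the quickstart becomes model.fit(ratings.batch(4096), epochs=3, callbacks=[make_tensorboard_callback(make_log_dir())]).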

maciejkula commented 4 years ago

Thanks for the debugging! We should probably look into adding a battery of tests that include TensorBoard integration.

windmaple commented 3 years ago

TB is working fine now.