Open alykhantejani opened 3 months ago
Same here!
When I load a trained model from disk for incremental training, it fails when calling fit(train_dataset).
I load the model with model = tf.keras.models.load_model(FLAGS.model_dir)
The error log is:
Traceback (most recent call last):
File "/apdcephfs/dd_model/recommenders-addons-0.7.2/demo/dynamic_embedding/movielens-1m-keras/movielens-1m-keras.py", line 247, in <module>
app.run(main)
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/apdcephfs/dd_model/recommenders-addons-0.7.2/demo/dynamic_embedding/movielens-1m-keras/movielens-1m-keras.py", line 237, in main
train()
File "/apdcephfs/dd_model/recommenders-addons-0.7.2/demo/dynamic_embedding/movielens-1m-keras/movielens-1m-keras.py", line 147, in train
model.fit(dataset, epochs=FLAGS.epochs, steps_per_epoch=FLAGS.steps_per_epoch)
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node Adam/ResourceScatterAdd_3 defined at (most recent call last):
File "/apdcephfs/dd_model/recommenders-addons-0.7.2/demo/dynamic_embedding/movielens-1m-keras/movielens-1m-keras.py", line 247, in <module>
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/absl/app.py", line 308, in run
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
File "/apdcephfs/dd_model/recommenders-addons-0.7.2/demo/dynamic_embedding/movielens-1m-keras/movielens-1m-keras.py", line 237, in main
File "/apdcephfs/dd_model/recommenders-addons-0.7.2/demo/dynamic_embedding/movielens-1m-keras/movielens-1m-keras.py", line 147, in train
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/engine/training.py", line 1807, in fit
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/engine/training.py", line 1401, in train_function
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/engine/training.py", line 1384, in step_function
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/engine/training.py", line 1373, in run_step
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/engine/training.py", line 1154, in train_step
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py", line 544, in minimize
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py", line 1223, in apply_gradients
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py", line 652, in apply_gradients
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py", line 1253, in _internal_apply_gradients
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py", line 1345, in _distributed_apply_gradients_fn
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py", line 1342, in apply_grad_to_update_var
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py", line 241, in _update_step
File "/root/miniconda3/envs/py39tfra072/lib/python3.9/site-packages/keras/src/optimizers/adam.py", line 185, in update_step
indices[0] = 0 is not in [0, 0)
[[{{node Adam/ResourceScatterAdd_3}}]] [Op:__inference_train_function_3810]
2024-08-08 16:21:52.222686: W tensorflow/core/kernels/data/cache_dataset_ops.cc:858] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
2024-08-08 16:21:52.232673: W tensorflow/core/kernels/data/cache_dataset_ops.cc:858] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
Sorry, it is hard for TFRA to support the tf.keras.models.load_model API, because load_model creates trainable variable objects from TensorFlow itself, but the TFRA trainable wrapper is not part of the TF code.
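For anyone reading the traceback above: the message "indices[0] = 0 is not in [0, 0)" appears to mean that the variable Adam tried to scatter into had zero rows, so no index could be valid. That would be consistent with the wrapper not being rebuilt properly by load_model, though that link is an inference and not something confirmed in this thread. Below is a minimal pure-Python sketch of the bounds check; scatter_add here is a hypothetical stand-in for illustration, not the TensorFlow op itself.

```python
def scatter_add(rows, indices, updates):
    """Add updates[i] into rows[indices[i]].

    Toy stand-in for a scatter-add op: valid indices are [0, len(rows)),
    and anything outside that range raises with a TF-style message.
    """
    n = len(rows)
    for i, idx in enumerate(indices):
        if not 0 <= idx < n:
            raise IndexError(f"indices[{i}] = {idx} is not in [0, {n})")
        rows[idx] = [a + b for a, b in zip(rows[idx], updates[i])]
    return rows


# A target with two rows accepts index 0.
ok = scatter_add([[0.0, 0.0], [0.0, 0.0]], [0], [[1.0, 2.0]])

# A target with zero rows rejects every index, including 0.
try:
    scatter_add([], [0], [[1.0, 2.0]])
    message = None
except IndexError as exc:
    message = str(exc)

print(ok)
print(message)
```

If this reading is right, checking the shape of the embedding variable and its optimizer slots right after loading should show a leading dimension of 0.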
Hi,
I am training a model with dynamic embeddings (specifically HvdAllToAllEmbeddings). I am saving the model to disk with
de.keras.models.de_save_model
and it appears that my dynamic embedding variables are saved to disk. However, when restoring from this directory, it appears only the dense weights get restored. I am restoring with
model.load_weights(FLAGS.model_dir)
as shown here. Am I supposed to restore a KVCreator too?