foxik opened this issue 2 years ago (status: Open)
Adding @pedro-r-marques who wrote the code.
On second thought, I opened an issue in the TensorFlow repository https://github.com/tensorflow/tensorflow/issues/55475 to discuss the problem with tf.map_fn on RaggedTensors -- RaggedTensors are supported according to the documentation, so this is in fact a bug.
However, I think we could still discuss whether it would make sense to use .flat_values instead (maybe I am mistaken and it cannot be done easily; but I have implemented various models with RaggedTensors, and using .flat_values worked for me for loss computations in all of them).
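Purely for illustration, a minimal sketch of what I mean by the .flat_values approach (the tensors are made up, and I assume y_true and y_pred share the same row partitions, which is the usual situation in a loss computation):

```python
import tensorflow as tf

# Made-up y_true/y_pred with identical ragged row partitions.
y_true = tf.ragged.constant([[0.0, 1.0], [1.0]])
y_pred = tf.ragged.constant([[0.1, 0.8], [0.9]])

# Instead of mapping the loss over the ragged rows with tf.map_fn, compute
# it once on the flattened values; no RaggedTensorVariant is ever involved,
# so nothing variant-typed needs to be placed on the GPU.
loss = tf.keras.losses.BinaryCrossentropy()(y_true.flat_values, y_pred.flat_values)
```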
@gadagashwini, I was able to reproduce the issue in TF v2.7, v2.8 and nightly. Please find the gist of it here.
@foxik I see a similar issue here - https://github.com/tensorflow/tensorflow/issues/46635. According to @JXRiver: "According to @edloper, 'Basically, RaggedTensorVariant objects should never be copied to GPU, because we can't do anything useful with them there. But Placer isn't currently smart enough to figure that out (it just sees a Variant tensor, and doesn't know what kind of value it contains).' We have a project going on right now that hopefully will fix the issue."
@divyashreepathihalli Thanks for pointing it out -- I have closed my report in TensorFlow repository as a duplicate of it.
This also means we need to pursue action 2 and not use a map_fn on RaggedTensors in the loss calculations. I will see if I can come up with a fix.
I've just run into this issue in hosted Colab and its default GPU runtime. I developed and trained a baby model on CPU at home in one of Google's container images, switched to GPU in Colab to train a larger one, and kaboom. The model is built all around ragged tensors to avoid carrying and debugging masks in a mix of out-of-the-box and custom layers. The affected loss is losses.CategoricalCrossentropy; I temporarily commented out all regularization losses.
A question, if you don't mind: is this problem affecting only losses/gradients? I will write my own loss anyway; the one-hot xent is just a temporary stand-in for what I'm ultimately going to achieve. I could switch to dense tensors with masks in the loss alone: that's a small piece of code compared to the whole caboodle. But I use Keras niceties, like early stopping, LR scheduling and, most helpfully, ReduceLROnPlateau, and am unsure whether I could still use them in a custom training loop, should it come to that as a workaround. I would be very grateful for advice! :-)
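In case a custom loop does become necessary, the plateau logic itself seems easy enough to replicate by hand; a rough sketch (the factor/patience/min_lr values below are made up, not the Keras defaults):

```python
# Minimal plateau-based LR reduction for a custom training loop, mimicking
# the spirit of keras.callbacks.ReduceLROnPlateau. All numbers are
# illustrative, not the callback's defaults.
class PlateauLR:
    def __init__(self, lr=1e-3, factor=0.5, patience=3, min_lr=1e-6):
        self.lr, self.factor, self.patience, self.min_lr = lr, factor, patience, min_lr
        self.best = float("inf")  # best validation loss seen so far
        self.wait = 0             # epochs since the last improvement

    def update(self, val_loss):
        """Call once per epoch with the validation loss; returns the current LR."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr
```

Each epoch, the returned value would be pushed into the optimizer with something like `optimizer.learning_rate.assign(scheduler.update(val_loss))`.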
In the unlikely case it makes any difference, the loss sits in the middle of the model (autoencoder-style with a teacher-forcing time delay). I've added it to the functional model, and am tracing it with an explicit call, like
caxent_loss = losses.CategoricalCrossentropy(name='temp_caxent_loss_ly')(
    y_pred=teach_1h_pred_t,
    y_true=teach_1h_true_t)

train_time_model = keras.Model(...)
train_time_model.add_loss(caxent_loss)
train_time_model.compile(
    optimizer=keras.optimizers.Adam(...), ...)
train_time_model.fit(...)
Both backtraces point to the same place, likely the exact same one, so I'm copying the last few frames of only one:
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1222, in run_step
outputs = model.train_step(data)
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1027, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/usr/local/lib/python3.8/dist-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 526, in minimize
grads_and_vars = self.compute_gradients(loss, var_list, tape)
File "/usr/local/lib/python3.8/dist-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 259, in compute_gradients
grads = tape.gradient(loss, var_list)
Node: 'zeros_like_2'
2 root error(s) found.
(0) INTERNAL: No unary variant unary_op function found for op ZEROS_LIKE Variant type_name: RaggedTensorVariant for device type: GPU
[[{{node zeros_like_2}}]]
[[Func/train_time_model/tf.keras.metrics.categorical_crossentropy/map/while/body/_21/input/_357/_210]]
(1) INTERNAL: No unary variant unary_op function found for op ZEROS_LIKE Variant type_name: RaggedTensorVariant for device type: GPU
[[{{node zeros_like_2}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_9351]
A boring and probably unhelpful version list that I'm always printing anyway:
------------------ ----------------------------------------------------
TF Version........ 2.11.0
Keras version..... 2.11.0
Physical devices.. ['/physical_device:CPU:0', '/physical_device:GPU:0']
TF execute mode... EAGER
matplotlib........ 3.2.2
numpy............. 1.21.6
IPython kernel.... 7.9.0
Jupyter client.... 6.1.12
Debian version.... bullseye/sid
Linux version..... 5.10.147+ #1 SMP Sat Dec 10 16:00:40 UTC 2022
------------------ ----------------------------------------------------
@divyashreepathihalli, this issue is possibly marked as a duplicate incorrectly in the context of Keras. @foxik's issue in the TF repo was a duplicate of another one there, but the fix, according to your own quotation (https://github.com/keras-team/tf-keras/issues/638), looks easier on the Keras side. A full-blown, thorough handling of RaggedTensorVariants on GPU by TF seems like quite substantial work, and looks unlikely to come soon.
Any progress on this issue?
System information.
Describe the problem.
When some loss (tf.losses.SparseCategoricalCrossentropy, tf.losses.CategoricalCrossentropy, tf.losses.BinaryCrossentropy, or tf.losses.MeanSquaredError) is used on ragged tensors, the gradient computation on a GPU crashes with an INTERNAL error.

Describe the current behavior.
The code crashes on a GPU. It does not crash on a CPU, and it does not crash when tf.functions are executed eagerly.

Describe the expected behavior.
The code should not crash.
Standalone code to reproduce the issue.
A simple Colab reproducing the error is here: https://colab.research.google.com/drive/1OELAhvpQHhaz3sOYabf4SdBqKlQCjNjs?usp=sharing
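For reference, the core of the repro can be sketched roughly like this (shapes and values are illustrative); on a CPU it runs fine, while on a GPU the gradient computation inside the tf.function fails with the ZEROS_LIKE RaggedTensorVariant error:

```python
import tensorflow as tf

# Illustrative values; any ragged y_true/y_pred pair behaves the same.
w = tf.Variable(1.0)
y_true = tf.ragged.constant([[0.0, 1.0], [1.0]])
x = tf.ragged.constant([[0.1, 0.8], [0.9]])

@tf.function
def train_step():
    with tf.GradientTape() as tape:
        y_pred = x * w  # multiplying by a scalar keeps the ragged structure
        loss = tf.keras.losses.BinaryCrossentropy()(y_true, y_pred)
    return loss, tape.gradient(loss, w)

# On CPU this succeeds; on GPU the backward pass of the internal ragged
# map fails with "No unary variant unary_op function found for op
# ZEROS_LIKE ... RaggedTensorVariant ... for device type: GPU".
loss, grad = train_step()
```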
Source code / logs.
The problem is somehow connected to the usage of the ragged map here: https://github.com/keras-team/keras/blob/2db5acf3e3c5904b014cb409d3c514bef44f9640/keras/losses.py#L1408 . My guess is that a TensorArray of ragged tensors is created, and some operation for manipulating it on a GPU is missing.
When the loss is instead computed directly on the ragged values, e.g. with

loss=lambda yt, yp: tf.losses.BinaryCrossentropy()(yt.values, yp.values)

the problem does not appear and the computation works. Note that metrics with ragged tensors work fine; but they take a different approach: instead of a ragged map, they use flat_values, see https://github.com/keras-team/keras/blob/2db5acf3e3c5904b014cb409d3c514bef44f9640/keras/utils/metrics_utils.py#L800 .

Possible courses of action

Personally, I prefer option 2, because it fixes the problem at hand with a "simple" solution.