swamidass opened this issue 3 months ago
Hi @swamidass,
Thank you for the report. We currently do not have a clear roadmap for full ragged support. Part of the issue is that this is specific to TensorFlow.
Note that we do have partial support. Feeding ragged tensors through a tf.data.Dataset, a generator or a PyDataset should generally work. There is no ragged=True option in the keras.Input, but in many cases you don't need it.
For instance:
import tensorflow as tf
import keras
x1 = tf.ragged.constant([[1, 2], [3], [4, 5, 6]])
x2 = tf.ragged.constant([[7], [8, 9, 10], [11, 12]])
model1 = keras.Sequential([keras.layers.Concatenate(axis=0)])
print(model1([x1, x2]))
print(model1.predict([x1, x2]))
input1 = keras.Input(shape=(None,))
input2 = keras.Input(shape=(None,))
output = keras.layers.Concatenate(axis=0)([input1, input2])
model2 = keras.Model(inputs=[input1, input2], outputs=output)
print(model2([x1, x2]))
print(model2.predict([x1, x2]))
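And the tf.data.Dataset route mentioned above can look like this (a sketch added here, not part of the original comment; it right-pads the ragged examples inside the pipeline so the model sees uniform tensors):

import tensorflow as tf
import keras

x = tf.ragged.constant([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])
y = tf.constant([[1.0], [0.0], [1.0]])

# from_tensor_slices yields variable-length examples; padded_batch right-pads
# them with zeros to the fixed length given in padded_shapes.
ds = tf.data.Dataset.from_tensor_slices((x, y)).padded_batch(
    3, padded_shapes=([3], [1])
)

for batch_x, batch_y in ds:
    print(batch_x.shape)  # (3, 3): ragged rows padded with zeros

model = keras.Sequential([keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(ds, epochs=1, verbose=0)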
Given this, I'm curious to hear what roadblocks you're hitting.
Thanks!
@hertschuh, we can't specify ragged=True in the keras.Input,
but we still NEED this in some cases.
E.g. I have a custom layer called 'ToDense' which densifies any non-dense (ragged or sparse) tensor.
In a functional model, the input to this layer is a KerasTensor, and right now I have no way to determine whether it is ragged. So I don't know whether I should use the tf.sparse.to_dense function or the .to_tensor method.
@shkarupa-alex ,
You don't need to know whether it is ragged, sparse or dense at functional model creation time. You can simply do an isinstance check; it will take the correct branch at tracing time.
import tensorflow as tf
import keras

class ToDense(keras.Layer):
    def call(self, x):
        if isinstance(x, tf.RaggedTensor):
            return x.to_tensor()  # densifies ragged tensors
        else:
            return keras.ops.convert_to_tensor(x, sparse=False)  # this densifies sparse tensors
This will work as a workaround for now. Eventually, we want to be able to replace this with:
return keras.ops.convert_to_tensor(x, sparse=False, ragged=False)
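For reference, calling the layer eagerly shows the two branches being taken (a sketch added here, not part of the original comment; it assumes the TensorFlow backend and the ToDense class above):

layer = ToDense()
print(layer(tf.ragged.constant([[1, 2], [3]])))                    # ragged branch -> right-padded dense tensor
print(layer(tf.sparse.from_dense(tf.constant([[1, 0], [0, 2]]))))  # sparse branch -> densified tensor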
@hertschuh, you are wrong, I do need to know. Here is an example: https://colab.research.google.com/drive/1oplwYOYZEiytHlClcwt3ucWeJ5M-uA46#scrollTo=Q_5CCb17RgmI
As you can see, in a functional model the inputs are KerasTensors, which have a "sparse" flag but no "ragged" flag.
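For concreteness (a small sketch added for illustration, not part of the original comment; it assumes the TF backend):

import keras

x = keras.Input(shape=(None,), sparse=True)  # "sparse" can be declared on the Input
print(x.sparse)                              # True
# There is no analogous ragged=True argument on keras.Input in Keras 3.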
@shkarupa-alex ,
Just in general:
- You should never need to handle KerasTensors in a special way, and you should never need to test whether they're sparse, ragged, etc. Instead, you should use keras.ops so that KerasTensors are handled transparently.
- You cannot use tf operations on KerasTensors, so you need to limit tf operations to blocks like if isinstance(x, tf.RaggedTensor).
- The actual operations on KerasTensors are unimportant; nothing is actually calculated, only the shape (which is why you don't need to test whether they're sparse or ragged).
- You should never import from keras.src; this is internal only and is subject to change between versions. You should always import the public symbols: from keras import layers.

With all that, if you rewrite your code this way, it works:
import tensorflow as tf
import keras


class ToDense(keras.layers.Layer):
    """Layer that makes padding and masking of composite tensors effortless.

    The layer takes a RaggedTensor or a SparseTensor and converts it to a
    uniform tensor by right-padding it or filling in missing values.

    Arguments:
        pad_value: A value used to pad and fill in the missing values. Should be
            a meaningless value for the input data.
        mask: A Boolean value representing whether to mask the padded values.
            If true, no downstream Masking layer or Embedding layer with
            mask_zero=True should be added. Defaults to False.

    Input shape: Any Ragged or Sparse Tensor is accepted, but it requires the
        type of input to be specified via the Input or InputLayer from the
        Keras API.
    Output shape: The output is a uniform tensor having the same shape, in case
        of a ragged input, or the same dense shape, in case of a sparse input.
    """

    def __init__(self, pad_value, mask=False, **kwargs):
        super(ToDense, self).__init__(**kwargs)
        self.pad_value = pad_value
        self.mask = mask

    def call(self, inputs):
        print('call', isinstance(inputs, tf.RaggedTensor))
        print('call', type(inputs))
        if isinstance(inputs, tf.RaggedTensor):
            outputs = inputs.to_tensor(default_value=self.pad_value)
        elif isinstance(inputs, tf.SparseTensor):
            outputs = tf.sparse.to_dense(inputs, default_value=self.pad_value)
        else:
            outputs = inputs
        return outputs

    def compute_mask(self, inputs, mask=None):
        print('mask', isinstance(inputs, tf.RaggedTensor))
        print('mask', type(inputs))
        if not self.mask:
            return None
        if isinstance(inputs, tf.RaggedTensor):
            mask = tf.ones_like(inputs.flat_values, "bool")
            mask = inputs.with_flat_values(mask)
            mask = mask.to_tensor(False)
        elif isinstance(inputs, tf.SparseTensor):
            mask = tf.ones_like(inputs.values, "bool")
            mask = inputs.with_values(mask)
            mask = tf.sparse.to_dense(mask, default_value=False)
        else:
            mask = keras.ops.ones_like(inputs, "bool")
            mask = keras.ops.any(mask, axis=-1)
        return mask

    def compute_output_shape(self, input_shape):
        return input_shape

    def get_config(self):
        config = super(ToDense, self).get_config()
        config.update({"pad_value": self.pad_value, "mask": self.mask})
        return config
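To make this concrete, here is a small usage sketch (added here, not part of the original comment; it assumes the TF backend and continues from the class above, using a sparse functional input since keras.Input has no ragged option):

inputs = keras.Input(shape=(3,), sparse=True)
outputs = ToDense(pad_value=0.0)(inputs)
model = keras.Model(inputs, outputs)

x = tf.sparse.from_dense(tf.constant([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))
print(model(x))  # dense (2, 3) tensor; implicit entries become pad_value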
@hertschuh, all you're saying is good in theory. But in practice...

> you should never need to handle KerasTensors in a special way; you should never need to test if they're sparse, ragged etc.; instead, you should use keras.ops so that KerasTensors are handled transparently; you cannot use tf operations on KerasTensors, so you need to limit tf operations to blocks like if isinstance(x, tf.RaggedTensor)

But if I don't want to support other backends right now (usually because keras.ops misses required ops, e.g. https://github.com/keras-team/keras/issues/20046), it should be possible to write a layer for a single (TF) backend.
What should we all do if our models written and trained for Keras v2 used RaggedTensors?

> the actual operations on KerasTensors are unimportant, nothing is actually calculated, only the shape (which is why you don't need to test if they're sparse or ragged)

But we need to choose the right op...

> you should never import from keras.src, this is internal only and is subject to change between versions. You should always import the public symbols: from keras import layers.

As I wrote here https://github.com/keras-team/tf-keras/issues/202, the current public API is not perfect. I'm tired of importing some things from the public API and others from the private one. I just gave up and always import from keras.src in my libraries.
All I want to say in this issue, called "Roadmap for RaggedTensor", is: WE NEED RAGGED TENSORS.
I understand that it may be a long way, but right now it would be enough to bring back the ragged=True/False property on the Input layer. After that, people like me can continue to write and use our ugly, backend(TF)-dependent custom layers.
> it should be possible to write a layer for a single (TF) backend.

Yes, you can. In the code above, you can replace the keras.ops calls with the TF ones.
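For instance (a sketch added here, not part of the original comment; the helper name dense_mask is hypothetical), the dense branch of compute_mask above could be written with TF ops only:

import tensorflow as tf

def dense_mask(inputs):
    # TF-only equivalent of the keras.ops branch in compute_mask.
    mask = tf.ones_like(inputs, "bool")
    return tf.reduce_any(mask, axis=-1)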
> But we need to choose the right op...

KerasTensor is not the mechanism for choosing the right op. In my example above, KerasTensors are never used. And ragged=True doesn't actually do anything; only ragged=False does something: it tells Keras 2 to densify ragged tensors.
> What should we all do if our models written and trained for Keras v2 used RaggedTensors?

Taking a step back: why are you trying to migrate to Keras 3 (if you're not trying to support multiple backends)?
> Taking a step back: why are you trying to migrate to Keras 3 (if you're not trying to support multiple backends)?
There are 3 reasons:
Keras 3.0 will be the default Keras version. You may need to update your script to use Keras 3.0.
So, the pipeline that works fine in Keras 2:
I don't know how to do this in Keras 3. Is there a way I am missing?
There are several new errors that arise in this pipeline in Keras 3:
Keras v3 does not work with ragged inputs. What is the roadmap for including this feature?
Per: #18414