Open rodrigob opened 2 years ago
Yes, however, the ragged_tensor.numpy()
remove information on the existing ragged tensor. Depending on your use-case might, it cam make it more complicated to use. Because you would now have to manually parse the nested list.
For example, accessing flat_values
or row_limits
is not possible anymore on ragged_tensor.numpy()
, so keeping tf
allow things like:
ragged_tensor.flat_values.numpy()
ragged_tensor.row_limits().numpy()
The proper solution would be to have some custom numpy ragged tensor support, but would require more important engineering efforts.
Another option would be to have an argument in tfds.as_numpy() that indicates how to treat RaggedTensors (with default to "return as-is").
Without it I resorted to implementing my own versions of:
def _eager_dataset_iterator(ds: tf.data.Dataset) -> t.Iterator[NumpyElem]
def _elem_to_numpy_eager(tf_el: TensorflowElem) -> t.Union[NumpyElem, t.Iterable[NumpyElem]]
def tfds_as_numpy(dataset: tf.data.Dataset) -> tf.data.Dataset
where _elem_to_numpy_eager includes
elif isinstance(tf_el, tf.RaggedTensor):
return tf_el.numpy()
As long as tfds.core.dataset_utils does not change much, this will work fine for my use case; but feels like slightly abusing the library.
Also note you can also use ds.as_numpy_iterator()
which should directly returns tf.RaggedTensor
as numpy.
as_numpy_iterator
has some difference with tfds.as_numpy
though. Like not supporting None
, len(ds)
,... but it's been a while I haven't tried, so those might have been fixed.
Indeed that seems to fit better my use case. Then I suggest that tfds.as_numpy(...) docstring should mention something along the lines of "for eager mode only use cases, ds.as_numpy_iterator(...) might be a better fit." (specially for use cases with RaggedTensor)
Hello, I was recently bitten by tfds.as_numpy(...) handling of RaggedTensors,
however the ragged tensor documentation indicates
Ragged tensors can be converted to nested Python lists and NumPy arrays: [...] digits.numpy()
.It seems then that
_elem_to_numpy_eager(...)
would need updating.