scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License

New to/from_raggedtensor and to/from_nestedtensor functions #1466

Open: jpivarski opened this issue 2 years ago

jpivarski commented 2 years ago

Description of new feature

TensorFlow's RaggedTensor has been around for a while and PyTorch's NestedTensor is new, but both should be conversion targets for Awkward Arrays. After that, a tutorial should be written in the spirit of this one:

https://www.tensorflow.org/text/tutorials/text_generation

but using Awkward strings ↔ RaggedTensors instead of making np.array(["...", "...", ...], dtype=object). It's a great example of why that interface would be useful, particularly when paired with ak.str.* functions. (We can demonstrate the difference with a scaling test on some enormous dataset. What's an enormous text dataset? I wonder...)
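To illustrate why a flat buffer plus offsets scales better than an object array of Python strings, here is a rough NumPy-only sketch. It is not Awkward's or TensorFlow's API, and `encode_corpus` and `to_vocab_ids` are hypothetical helpers. Awkward's own string layout stores UTF-8 bytes. This sketch instead stores Unicode code points, so that one element is one character, as in a character-level model. The result is similar in shape to what `tf.strings.unicode_decode` returns as a RaggedTensor: one flat array of characters plus row boundaries.

```python
import numpy as np

def encode_corpus(texts):
    """Hypothetical helper: encode strings as (codes, offsets).

    Row i is codes[offsets[i]:offsets[i + 1]], one Unicode code point per element.
    """
    # One flat buffer of code points for the whole corpus (no per-string objects).
    codes = np.frombuffer("".join(texts).encode("utf-32-le"), dtype="<u4")
    # len(str) counts code points, so these lengths line up with `codes`.
    lengths = np.array([len(t) for t in texts], dtype=np.int64)
    offsets = np.concatenate([[0], np.cumsum(lengths)]).astype(np.int64)
    return codes, offsets

def to_vocab_ids(codes):
    """Hypothetical helper: map every character to a vocabulary ID in one call.

    The offsets are untouched, so the ragged (per-string) structure is preserved.
    """
    vocab, ids = np.unique(codes, return_inverse=True)
    return vocab, ids

codes, offsets = encode_corpus(["hello", "", "world!"])
vocab, ids = to_vocab_ids(codes)
```

Only the length computation loops over strings in Python. The vocabulary lookup runs as a single whole-array operation, which is the kind of difference the proposed scaling test would measure.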

I believe that this is the equivalent tutorial for PyTorch's NestedTensor:

https://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html

I don't know whether JAX has a ragged array interface, but if it does, let's support that, too. All of these converters are relatively easy to write, but we'd need TensorFlow, PyTorch, and JAX in at least one of the CI tests, and TensorFlow and JAX can be hard to install.
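The converters are mostly easy because the layouts already line up. An Awkward ListOffsetArray stores a flat `content` buffer plus `offsets`, where row i is `content[offsets[i]:offsets[i+1]]`. TensorFlow's `tf.RaggedTensor.from_row_splits(values, row_splits)` takes the same pair, provided the splits start at 0. PyTorch's `torch.nested.nested_tensor` takes a list of per-row tensors instead. The following is a minimal NumPy-only sketch of both directions. The helper names are hypothetical, not Awkward functions, and the sketch does not call TensorFlow or PyTorch.

```python
import numpy as np

def to_content_offsets(lists):
    """Hypothetical helper: flatten 1-D sequences into (content, offsets).

    This pair is the ListOffsetArray layout and also valid RaggedTensor
    (values, row_splits), since the offsets built here start at 0.
    """
    lengths = np.array([len(x) for x in lists], dtype=np.int64)
    offsets = np.concatenate([[0], np.cumsum(lengths)]).astype(np.int64)
    content = np.concatenate([np.asarray(x) for x in lists]) if lists else np.array([])
    return content, offsets

def from_content_offsets(content, offsets):
    """Hypothetical helper: split a flat buffer into one array per row.

    A list like this is the shape of input a nested-tensor constructor expects.
    Offsets need not start at 0 (e.g. after slicing), so slice relative to offsets[0].
    """
    if len(offsets) == 1:  # zero rows
        return []
    start = offsets[0]
    return np.split(content[start:offsets[-1]], offsets[1:-1] - start)

content, offsets = to_content_offsets([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
rows = from_content_offsets(content, offsets)
```

Passing the two buffers through, rather than rebuilding rows, is what lets such a conversion avoid per-element Python work. Any offsets that don't start at 0 would need to be shifted first for TensorFlow.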

jpivarski commented 1 year ago

FYI @ioanaif, @kpedro88 is interested in this feature.

jpivarski commented 5 months ago

Here's another instance of someone wanting this feature, on Stack Overflow.