🚀 Feature

Support slice operation on the "ragged" dimension of a nested tensor.

Motivation

In preprocessing we often want to operate over variable-width lists, such as token ids in the text domain or sparse features in the recommendation domain; one common operation is slicing each list (e.g. keeping only the first k elements). One way to do this today is Arrow's List type.
I was thinking nested tensor may also work well for this use case (especially when doing preprocessing after Tensor collate), but it looks like slice is not yet supported on the ragged dimension:
>>> import torch
>>> a, b, c = torch.arange(4), torch.arange(5) + 4, torch.arange(2) + 8
>>> id_list = torch.nested_tensor([a, b, c])
>>> id_list
nested_tensor([
  tensor([0, 1, 2, 3]),
  tensor([4, 5, 6, 7, 8]),
  tensor([8, 9])
])
>>> id_list[:, :3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Internal error: NestedTensorImpl doesn't support sizes. Please file an issue on https://github.com/pytorch/nestedtensor
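For now the only workaround I see is to unbind, slice each component in Python, and build a new nested tensor (a sketch assuming the `torch.nested` namespace; the per-component loop gives up the benefit of a single vectorized op):

```python
import torch

a, b, c = torch.arange(4), torch.arange(5) + 4, torch.arange(2) + 8
nt = torch.nested.nested_tensor([a, b, c])

# Slice each constituent tensor to its first k elements, then re-nest.
# This materializes a Python loop over components.
k = 3
sliced = torch.nested.nested_tensor([t[:k] for t in nt.unbind()])
```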
Wondering if there is any plan to support this? Thanks!