pytorch / nestedtensor

[Prototype] Tools for the concurrent manipulation of variably sized Tensors.
BSD 3-Clause "New" or "Revised" License

Support for nesting of nested tensors and view operation #439

Open · saitcakmak opened this issue 3 years ago

saitcakmak commented 3 years ago

🚀 Feature

This request packs two related features into one.

The first one is to support nesting of nested tensors, i.e., this operation should succeed:

list_of_nts = [nested_tensor_1, nested_tensor_2, ...]
nested_nested_tensor = nestedtensor.nested_tensor(list_of_nts)

This currently leads to RuntimeError: Currently only supporting a sequence of Tensors.

The second feature is to support a view operation over the nested structure:

list_of_tensors = [tensor_1, tensor_2, ..., tensor_6]
nested_tensor = nestedtensor.nested_tensor(list_of_tensors)
view_nt = nested_tensor.view(2, 3)

This should produce the same output as

nt1 = nestedtensor.nested_tensor(list_of_tensors[:3])
nt2 = nestedtensor.nested_tensor(list_of_tensors[3:])
equivalent_nt = nestedtensor.nested_tensor([nt1, nt2])

Motivation

I have a set-valued function that returns a different-sized output depending on the given input. For an n x m-dim input, it returns a k x m-dim tensor, where k < n depends on the particular input. I need to evaluate this function for batch_1 x batch_2 x n x m-dim input, which I do in a loop after collapsing the batch dimensions, i.e., input.view(-1, n, m). This produces a list of batch_1 * batch_2 different-sized outputs. Ideally, I would return a batch_1 x batch_2 x k x m-dim tensor, which doesn't work since k is not fixed. So, I instead want to put the list of batch_1 * batch_2 varying k x m-dim tensors into a nestedtensor and return a view of it that agrees with the original batch structure, i.e., batch_1 x batch_2 x varying k x m. These features would make this possible.
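For concreteness, here is a rough, hypothetical sketch of that workflow; set_valued_f and the concrete sizes are made up for illustration, and the final view call only indicates the proposed API, it does not work today:

import torch
import nestedtensor

def set_valued_f(x):
    # Hypothetical set-valued function: for an n x m input, return k x m rows,
    # where k < n depends on the input (an arbitrary placeholder rule is used here).
    k = int(x.abs().sum().item()) % (x.shape[0] - 1) + 1
    return x[:k]

batch_1, batch_2, n, m = 2, 3, 5, 4
inp = torch.randn(batch_1, batch_2, n, m)

# Collapse the batch dimensions and evaluate in a loop.
outputs = [set_valued_f(x) for x in inp.view(-1, n, m)]

# Today this only yields a flat nested tensor over the batch_1 * batch_2 outputs.
flat_nt = nestedtensor.nested_tensor(outputs)

# With the requested features, something like the line below would recover the
# batch_1 x batch_2 x varying k x m structure (proposed API, commented out):
# batched_nt = flat_nt.view(batch_1, batch_2)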

PS: I will also need to use autograd over each k x m-dim output after joining them with other tensors. Since autograd support seems like an existing concern, I'll skip that here.

Pitch

See above.

Alternatives

I can use a list of tensors and do the operations manually (in a loop), but that's not ideal.
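For reference, a rough sketch of this loop-based workaround; set_valued_f and the sizes are placeholders, and the final line stands in for whatever downstream operation is needed:

import torch

def set_valued_f(x):
    # Placeholder for the set-valued function described in the motivation above.
    k = int(x.abs().sum().item()) % (x.shape[0] - 1) + 1
    return x[:k]

batch_1, batch_2, n, m = 2, 3, 5, 4
inp = torch.randn(batch_1, batch_2, n, m)

# Keep the outputs as a nested Python list instead of a NestedTensor.
outputs = [[set_valued_f(inp[i, j]) for j in range(batch_2)] for i in range(batch_1)]

# Every downstream operation then needs its own explicit loop, e.g.:
row_sums = [[o.sum(dim=-1) for o in row] for row in outputs]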

Additional context

N/A

cpuhrsch commented 3 years ago

Hello @saitcakmak,

Thanks for writing the issue! For now higher levels of nesting (i.e. higher nested dimensions) are disabled to allow us to focus on writing out better and more performant kernels and simpler code. This is a constraint we can remove later on, but it is currently needed for ease of development and reflects our current priorities.

The view operation you have in mind would have to be given a valid nested size, so nested_tensor.view(NestedSize([[t.size() for t in tensors[:3]], [t.size() for t in tensors[3:]]])) if I understand you correctly. I think what you have in mind might be more easily achieved by a select operation, so perhaps torch.stack([nested_tensor[:3], nested_tensor[3:]]), but as you pointed out this needs multiple levels of nesting.

Aside from this, maybe NestedTensor could still be of use if you're looking to improve the evaluation performance of your model?

Thanks, Christian

saitcakmak commented 3 years ago

Hi @cpuhrsch,

Thanks for the response!

For now higher levels of nesting (i.e. higher nested dimensions) are disabled to allow us to focus on writing out better and more performant kernels and simpler code.

Yeah, that sounds very reasonable. These can be added to the bottom of a long TODO list :)

Aside from this, maybe NestedTensor could still be of use if you're looking to improve the evaluation performance of your model?

This could be interesting; I hadn't thought of that. The current implementation converts some inputs to Python lists, which surprisingly speeds things up and requires much less memory compared to a tensor-based implementation. It also doesn't support batched evaluation, because the shapes of the variables change depending on the input. NestedTensor could be helpful there. I'll think about it!