ezyang opened this issue 7 years ago
Is there something wrong with Scalar(0).toTensor().expand(the_size)?
Hmm, that does seem like it should work :)
The possible caveat with that is that some operations require contiguous tensors to work (BLAS, cuDNN), so an extra .clone() might be needed in some places.
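For illustration, here is a rough Python analogue of that suggestion (the shape `the_size` is made up): expanding a 0-dim zero tensor gives a zero-strided view, so no memory proportional to the target size is allocated, but the view is not contiguous, which is where the .clone() caveat comes in.

```python
import torch

the_size = (3, 4)                  # hypothetical target shape
z = torch.zeros(()).expand(*the_size)

print(z.stride())                  # (0, 0): every element aliases one storage cell
print(z.is_contiguous())           # False, so BLAS/cuDNN-style kernels may need a copy
print(z.clone().is_contiguous())   # True, but clone() materializes the zeros
```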
Also, for some functions it is useful to know that certain gradients are entirely zeros, so that we can skip some operations during the backward pass.
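As a minimal sketch of that point (a hypothetical backward, not PyTorch's actual implementation), if the incoming gradient is statically known to be all zeros, the whole computation can be short-circuited:

```python
import torch

def mul_backward(grad_out, x, y):
    # Hypothetical backward for z = x * y.  If the incoming gradient is known
    # to be all zeros (represented here by None), both input gradients are
    # all zeros too, so the two multiplies can be skipped outright.
    if grad_out is None:
        return None, None
    return grad_out * y, grad_out * x
```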
Also, 0-strided tensors don't work in in-place operations.
To elaborate on @gchanan's comment: if you have x += y
(as frequently occurs during gradient calculation), the semantics are different depending on whether x
is a 0-dim or an n-dim zero-filled tensor. In-place addition into a 0-dim tensor will be rejected unless y is also 0-dim, but an n-dim zero-filled tensor will work in all cases.
Is it actually rejected? I seem to remember it just gives you the wrong answer.
You're right, it's not rejected.
>>> x = torch.Tensor([0]).expand(2, 2)
>>> x += torch.Tensor([[1,2],[3,4]])
>>> x
10 10
10 10
[torch.FloatTensor of size 2x2]
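For context on that output: the expanded tensor has zero strides, so all four positions alias the single stored element, and the in-place add accumulates every addend into that one cell (0 + 1 + 2 + 3 + 4 = 10), which is why every entry reads back as 10. Newer PyTorch releases may detect the overlapping write and refuse it. A quick way to see the aliasing:

```python
import torch

x = torch.Tensor([0]).expand(2, 2)
print(x.stride())   # (0, 0): all four positions alias the single stored zero
```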
We need a way to efficiently represent a tensor with a concrete type and shape, but which is filled with all zeros, without having to actually materialize the tensor in question. The primary motivation for this is in backwards computation, when not all outputs have gradients. In this case, at the point we invoke the backwards, some of the grad_outputs do not exist; however, the backwards itself still needs these outputs as inputs. Today, you have two options:

1. Pass in undefined. The problem is that undefined tensors don't support any operations and don't carry type or shape information, so now the backwards code has to know how to deal with a combinatorial set of undefined/defined input permutations, and if it really does need the type/shape info, that information needs to be stored via a side channel.
2. Pass in a zero-filled tensor. This is morally the correct thing to do, mathematically speaking. The problem is that now you're materializing possibly giant zero-filled tensors which you are not actually going to use in any useful way.
Today, we take a hybrid approach in PyTorch:
It is not entirely clear what a good, simple implementation strategy for zero-filled tensors in ATen is, as it combinatorially increases the number of input combinations that implementations need to support.
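To illustrate the combinatorial blow-up (ZeroTensor and this add wrapper are made up for the sketch, not anything in ATen): once zero tensors are a distinct representation, even a simple binary op has to handle every mix of zero and dense inputs.

```python
import torch

class ZeroTensor:
    # Made-up stand-in for a never-materialized all-zeros tensor.
    def __init__(self, shape, dtype=torch.float32):
        self.shape, self.dtype = shape, dtype

def add(a, b):
    # Every binary kernel now has four cases to get right instead of one.
    if isinstance(a, ZeroTensor) and isinstance(b, ZeroTensor):
        return ZeroTensor(torch.broadcast_shapes(a.shape, b.shape))
    if isinstance(a, ZeroTensor):
        return b.clone()
    if isinstance(b, ZeroTensor):
        return a.clone()
    return torch.add(a, b)
```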
CC @zdevito @gchanan @colesbury