ezyang opened this issue 7 years ago
Is there something wrong with Scalar(0).toTensor().expand(the_size)?
Hmm, that does seem like it should work :)
The possible caveat with that is that some operations require contiguous tensors to work (BLAS, cuDNN), so an extra .clone() might be needed in some places.
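For illustration, here is a rough Python analogue of that suggestion (the shape `the_size` is made up): expanding a 0-dim zero tensor gives a zero-strided view, so no memory proportional to the target size is allocated, but the view is not contiguous, which is where the .clone() caveat comes in.

```python
import torch

the_size = (3, 4)                  # hypothetical target shape
z = torch.zeros(()).expand(*the_size)

print(z.stride())                  # (0, 0): every element aliases one storage cell
print(z.is_contiguous())           # False, so BLAS/cuDNN-style kernels may need a copy
print(z.clone().is_contiguous())   # True, but clone() materializes the zeros
```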
Also, for some functions it is useful to know that certain gradients are entirely zeros, so that we can skip some operations during the backward pass.
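As a minimal sketch of that point (a hypothetical backward, not PyTorch's actual implementation), if the incoming gradient is statically known to be all zeros, the whole computation can be short-circuited:

```python
import torch

def mul_backward(grad_out, x, y):
    # Hypothetical backward for z = x * y.  If the incoming gradient is known
    # to be all zeros (represented here by None), both input gradients are
    # all zeros too, so the two multiplies can be skipped outright.
    if grad_out is None:
        return None, None
    return grad_out * y, grad_out * x
```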
Also, 0-strided tensors don't work in in-place operations.
To elaborate on @gchanan's comment: if you have x += y
(as frequently occurs during gradient calculation), the semantics are different depending on whether x
is a 0-dim or an n-dim zero-filled tensor. In-place addition into a 0-dim tensor will be rejected unless y is also 0-dim, but an n-dim zero-filled tensor will work in all cases.
Is it actually rejected? I seem to remember it just gives you the wrong answer.
You're right, it's not rejected.
>>> x = torch.Tensor([0]).expand(2, 2)
>>> x += torch.Tensor([[1,2],[3,4]])
>>> x
10 10
10 10
[torch.FloatTensor of size 2x2]
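For context on that output: the expanded tensor has zero strides, so all four positions alias the single stored element, and the in-place add accumulates every addend into that one cell (0 + 1 + 2 + 3 + 4 = 10), which is why every entry reads back as 10. Newer PyTorch releases may detect the overlapping write and refuse it. A quick way to see the aliasing:

```python
import torch

x = torch.Tensor([0]).expand(2, 2)
print(x.stride())   # (0, 0): all four positions alias the single stored zero
```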
We need a way to efficiently represent a tensor with a concrete type and shape, but which is filled with all zeros, without having to actually materialize the tensor in question. The primary motivation for this is in backwards computation, when not all outputs have gradients. In this case, at the point we invoke the backwards, some of the grad_outputs do not exist; however, the backwards itself still needs these outputs as inputs. Today, you have two options:

1. Pass in undefined. The problem is that undefined tensors don't support any operations and don't carry type or shape information, so now the backwards code has to know how to deal with a combinatorial set of undefined/defined input permutations, and if it really does need the type/shape info, that information needs to be stored via a side channel.
2. Pass in a zero-filled tensor. This is morally the correct thing to do, mathematically speaking. The problem is that now you're materializing possibly giant zero-filled tensors which you are not actually going to use in any useful way.
Today, we take a hybrid approach in PyTorch:
It is not entirely clear what a good, simple implementation strategy for zero-filled tensors in ATen is, as it combinatorially increases the number of input combinations that implementations need to support.
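To illustrate the combinatorial blow-up (ZeroTensor and this add wrapper are made up for the sketch, not anything in ATen): once zero tensors are a distinct representation, even a simple binary op has to handle every mix of zero and dense inputs.

```python
import torch

class ZeroTensor:
    # Made-up stand-in for a never-materialized all-zeros tensor.
    def __init__(self, shape, dtype=torch.float32):
        self.shape, self.dtype = shape, dtype

def add(a, b):
    # Every binary kernel now has four cases to get right instead of one.
    if isinstance(a, ZeroTensor) and isinstance(b, ZeroTensor):
        return ZeroTensor(torch.broadcast_shapes(a.shape, b.shape))
    if isinstance(a, ZeroTensor):
        return b.clone()
    if isinstance(b, ZeroTensor):
        return a.clone()
    return torch.add(a, b)
```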
CC @zdevito @gchanan @colesbury