rwth-i6 / pytorch-to-returnn

Make PyTorch code runnable within RETURNN

SizeValue generic support for dynamic sizes #87

Closed vieting closed 2 years ago

vieting commented 2 years ago

For fixing #84, as demonstrated in #85, we need to save more information in SizeValue.
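
A rough sketch of what that could mean for SizeValue (only a sketch following this discussion, not the final implementation; the attributes are assumptions):

class SizeValue(int):
  """An int that additionally carries the RETURNN dim tag and its origin."""

  def __new__(cls, value, dim_tag=None, originated_tensor=None):
    res = super().__new__(cls, value)
    res.dim_tag = dim_tag  # RETURNN Dim; dim_tag.dimension is None means dynamic
    res.originated_tensor = originated_tensor  # the Tensor whose shape produced this size
    return res

  def as_tensor(self):
    # Would wrap this size as a scalar Tensor (e.g. backed by a RETURNN LengthLayer,
    # see below). Left out in this sketch.
    raise NotImplementedError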

vieting commented 2 years ago

I added the dim tags now. The corresponding layer is still missing, we need that as well, right?

albertz commented 2 years ago

I added the dim tags now. The corresponding layer is still missing, we need that as well, right?

What layer do you mean?

Or you mean the changes to reshape or more specifically the Unflatten module?

vieting commented 2 years ago

I added the dim tags now. The corresponding layer is still missing, we need that as well, right?

What layer do you mean?

I mean do we somehow need to keep track of the dim tag's origin?

albertz commented 2 years ago

I added the dim tags now. The corresponding layer is still missing, we need that as well, right?

What layer do you mean?

I mean do we somehow need to keep track of the dim tag's origin?

From our discussion, I remember two use cases:

But basically any other PyTorch function you could potentially call would also need to do that. It should be correct to always use LengthLayer basically. I wonder if there is a simple way to have this automatically everywhere. Maybe we should derive SizeValue from Tensor instead of int. But I'm not sure if this would cause other problems.

In any case, all the RETURNN layers already exist, and also all the needed functionality on RETURNN side. This only needs extensions here in pytorch-to-returnn.
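
For reference, getting a dynamic size on RETURNN side is just a LengthLayer; an illustrative net dict fragment (not necessarily what the converter will generate verbatim):

network = {
  # RETURNN LengthLayer: sequence lengths of "data" per batch entry, shape [B], dtype int32
  "input_len": {"class": "length", "from": "data"},
}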

To summarize:

vieting commented 2 years ago

I started adding arange with dynamic limit. Could you please comment on that? Not sure if it's going in the right direction...

albertz commented 2 years ago

test_arange fails:

  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/tests/test_layers.py", line 1232, in test_arange.<locals>.model_func
    line: return inputs + torch.reshape(arange, (1, -1))
    locals:
      inputs = <local> <Tensor name:? tensor:(Batch(1),5) returnn_data:'data' [B,F|F'feature:data'(5)] axes id>
      torch = <local> <module 'pytorch_to_returnn.torch' from '/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/__init__.py'>
      torch.reshape = <local> <function reshape at 0x7f6230fed670>
      arange = <local> <Tensor name:? tensor:(5,) returnn_data:'Range_output' [T|'Range_input_len'[]] axes id>
...
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/operator.py", line 98, in BinaryOperator.create_returnn_layer_dict
    line: inputs = _unify_tensor_axes_returnn_meta(*inputs)
    locals:
      inputs = <local> (<Tensor name:? tensor:(Batch(1),5) returnn_data:'data' [B,F|F'feature:data'(5)] axes id>, <Tensor name:? tensor:(1,5) returnn_data:'Cast_output' [F|'Unflatten_split_dims0'(1),T|'Range_input_len'[]] axes id>)
      _unify_tensor_axes_returnn_meta = <global> <function _unify_tensor_axes_returnn_meta at 0x7f6238203940>
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/operator.py", line 469, in _unify_tensor_axes_returnn_meta
    line: assert dim_tag.dimension == prev_dim_tag.dimension, f"not matching Data {x.returnn_data}"
    locals:
      dim_tag = <local> Dim{'Range_input_len'[]}
      dim_tag.dimension = <local> None
      prev_dim_tag = <local> Dim{F'feature:data'(5)}
      prev_dim_tag.dimension = <local> 5
AssertionError: not matching Data Data{'Cast_output', [F|'Unflatten_split_dims0'(1),T|'Range_input_len'[]]}

This looks not so much related to Range as to the broadcasting logic in _unify_tensor_axes_returnn_meta. I'm not sure whether we want to fix this here in this PR as well, or in a separate PR. A simpler test case for this would probably look like:

inputs  # shape [B,F]
_, n_in = inputs.shape
b = torch.zeros([n_in]).reshape((1, -1))  # [1,F]
return inputs + b  # should broadcast b

I think this would fail in the same way.

vieting commented 2 years ago

inputs  # shape [B,F]
_, n_in = inputs.shape
b = torch.zeros([n_in]).reshape((1, -1))  # [1,F]
return inputs + b  # should broadcast b

I think this would fail in the same way.

For me locally, this passes.

albertz commented 2 years ago

inputs  # shape [B,F]
_, n_in = inputs.shape
b = torch.zeros([n_in]).reshape((1, -1))  # [1,F]
return inputs + b  # should broadcast b

I think this would fail in the same way.

For me locally, this passes.

Ah yes, there is also another problem:

<Tensor name:? tensor:(1,5) returnn_data:'Cast_output' [F|'Unflatten_split_dims0'(1),T|'Range_input_len'[]] axes id>

I think the dim 1 in axis 0 is maybe already handled fine. In _unify_tensor_axes_returnn_meta, the docstring also says:

It also makes sure that broadcasting in the batch-dim axis works correctly by potentially removing such a batch broadcast dim.

But T|'Range_input_len'[] looks wrong. This is marked as a dynamic dim. But it should be static (right)?

vieting commented 2 years ago

Would you modify the test case here? test_arange_dyn fails with

tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'extern_data/placeholders/data/data' with dtype float and shape [?,5]

I think this is because for Range in make_output_tensor_from_returnn(), input_entries is all None because there are no tensors in it. We might need to pass limit.as_tensor() to Range.create_returnn_layer_dict(), but then I'm not sure how to infer the correct torch_shape in Range._get_output_shape_from_returnn().

albertz commented 2 years ago

Would you modify the test case here? test_arange_dyn fails with

tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'extern_data/placeholders/data/data' with dtype float and shape [?,5]

I think this is because for Range in make_output_tensor_from_returnn(), input_entries is all None because there are no tensors in it. We might need to pass limit.as_tensor() to Range.create_returnn_layer_dict(), but then I'm not sure how to infer the correct torch_shape in Range._get_output_shape_from_returnn().

Yes right, this is a problem.

I'm not really sure where to do the limit.as_tensor(). The test should not be modified. Corresponding PyTorch code works like that, so we should make sure that it will also work in our converter. This is for the test case and also for Fairseq.

Maybe we can extend the Naming logic for SizeValue. E.g. Naming._make_tensor.

vieting commented 2 years ago

Maybe we can extend the Naming logic for SizeValue. E.g. Naming._make_tensor.

So a SizeValue should be returned as is instead of getting None in Naming._make_tensor? Or should we directly involve Length there?

albertz commented 2 years ago

Maybe we can extend the Naming logic for SizeValue. E.g. Naming._make_tensor.

So a SizeValue should be returned as is instead of getting None in Naming._make_tensor? Or should we directly involve Length there?

Most other code would not expect SizeValue. I think it is easier if this directly uses size.as_tensor() and then returns a TensorEntry.

Edit: Also only when isinstance(limit, SizeValue) and limit.dim_tag.dimension is None.
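
So in Naming._make_tensor, roughly (a sketch of this suggestion, untested):

def _make_tensor(self, x):
  if isinstance(x, SizeValue) and x.dim_tag.dimension is None:
    x = x.as_tensor()  # dynamic size: wrap as a Tensor, so this returns a TensorEntry
  ...  # existing logic continues unchanged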

vieting commented 2 years ago

I think this is because for Range in make_output_tensor_from_returnn(), input_entries is all None because there are no tensors in it. We might need to pass limit.as_tensor() to Range.create_returnn_layer_dict(), but then I'm not sure how to infer the correct torch_shape in Range._get_output_shape_from_returnn().

Yes right, this is a problem.

I'm not really sure where to do the limit.as_tensor(). The test should not be modified. Corresponding PyTorch code works like that, so we should make sure that it will also work in our converter. This is for the test case and also for Fairseq.

Maybe we can extend the Naming logic for SizeValue. E.g. Naming._make_tensor.

Btw. what I meant was that we could pass limit.as_tensor() to the call of Range() from functional.arange(). So this would not change anything in the PyTorch code.
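
In code, sth like this (only a sketch; the actual Range call signature may differ):

def arange(limit, dtype=None):  # simplified to the arange(limit) form
  if isinstance(limit, SizeValue) and limit.dim_tag.dimension is None:
    limit = limit.as_tensor()  # dynamic limit: hand a Tensor to Range instead of an int
  return Range(dtype=dtype)(limit)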

albertz commented 2 years ago

Btw. what I meant was that we could pass limit.as_tensor() to the call of Range() from functional.arange(). So this would not change anything in the PyTorch code.

Ok, this would fix arange. But what if you call some module directly and pass a SizeValue? This might happen. (Not Range but any module.)

vieting commented 2 years ago

Most other code would not expect SizeValue. I think it is easier if this directly uses size.as_tensor() and then returns a TensorEntry.

I get a circular import again here, if I want to import SizeValue in naming.py to do the instance check.

vieting commented 2 years ago

Ok, this would fix arange. But what if you call some module directly and pass a SizeValue? This might happen.

Range is not an original PyTorch module, so I don't think this can happen.

albertz commented 2 years ago

Ok, this would fix arange. But what if you call some module directly and pass a SizeValue? This might happen.

Range is not an original PyTorch module, so I don't think this can happen.

Not for Range but for any other PyTorch module.

vieting commented 2 years ago

Not for Range but for any other PyTorch module.

Alright, so let's fix it here. Do you have ideas regarding the circular import?

I get a circular import again here, if I want to import SizeValue in naming.py to do the instance check.

albertz commented 2 years ago

Most other code would not expect SizeValue. I think it is easier if this directly uses size.as_tensor() and then returns a TensorEntry.

I get a circular import again here, if I want to import SizeValue in naming.py to do the instance check.

So, what's the problem? Why don't you just fix it?

In any case, you should avoid importing too many things at the top level of a module. I prefer to import things only locally, inside a function, when they are not really used anywhere else. And if something is only used for type annotations, then also no import at runtime (as I commented before).
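
I.e. sth like this (standard Python patterns; the module path here is a guess, not the actual one):

def _make_tensor(self, x):
  from pytorch_to_returnn.torch.tensor import SizeValue  # local import breaks the cycle
  ...

from typing import TYPE_CHECKING
if TYPE_CHECKING:
  # only for type annotations, never executed at runtime
  from pytorch_to_returnn.torch.tensor import SizeValue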

vieting commented 2 years ago

I don't know how to check whether it's static, or how to infer the PyTorch shape in this case...

albertz commented 2 years ago

I don't know how to check whether it's static, or how to infer the PyTorch shape in this case...

In what case? You mean in Range? It should be simple: When it is isinstance(limit, Tensor), then it is dynamic, otherwise it is isinstance(limit, int) (and never isinstance(limit, SizeValue)).
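
Spelled out, sth like (sketch):

# in Range: decide static vs. dynamic from the type of limit
if isinstance(limit, Tensor):
  dynamic = True  # e.g. a dynamic SizeValue that was converted via as_tensor()
else:
  assert isinstance(limit, int)  # a SizeValue should never reach this point
  dynamic = False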

vieting commented 2 years ago

I don't know how to check whether it's static, or how to infer the PyTorch shape in this case...

In what case? You mean in Range? It should be simple: When it is isinstance(limit, Tensor), then it is dynamic, otherwise it is isinstance(limit, int) (and never isinstance(limit, SizeValue)).

Yes, for Range. But no: even if it's a static dim, we end up with a tensor for limit, like in test_arange.

Also, I don't know how to set torch_shape correctly in _get_output_shape_from_returnn.

vieting commented 2 years ago

Tensor.__init__ uses self._numpy_buffer = numpy.zeros(shape, dtype=dtype) if numpy_array is None else numpy_array. I think this is wrong if we have dynamic dims, right? Is there a RETURNN layer which can create zeros of a given shape?

albertz commented 2 years ago

Tensor.__init__ uses self._numpy_buffer = numpy.zeros(shape, dtype=dtype) if numpy_array is None else numpy_array. I think this is wrong if we have dynamic dims, right? Is there a RETURNN layer which can create zeros of a given shape?

Why is this wrong? _numpy_buffer is about the static Torch buffer, nothing dynamic.

vieting commented 2 years ago

Why is this wrong? _numpy_buffer is about the static Torch buffer, nothing dynamic.

Ah no, sorry, but Constant.create_returnn_layer_dict needs to be fixed such that we don't just have a ConstantLayer with the numpy zeros array printed as value, right?

albertz commented 2 years ago

Btw, in make_output_tensor_from_returnn, in case all inputs are const, we evaluate via TF session.run and then also set numpy_array:

numpy_array = session.run(layer.output.placeholder, feed_dict=feed_dict)
numpy_array = numpy.transpose(numpy_array, [returnn_axis_from_torch_axis[i] for i in range(numpy_array.ndim)])

In case of scalars, we don't get a numpy.ndarray here but a numpy.float or numpy.int instead. We need to fix this because later in Tensor, we assume it is always a numpy.ndarray.

We can either fix this right here. Sth like:

if not isinstance(numpy_array, numpy.ndarray):
  numpy_array = numpy.array(numpy_array)

Or alternatively the same logic in Tensor.__init__.
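
I.e. sth like this guard in Tensor.__init__ (same logic as above):

if numpy_array is not None and not isinstance(numpy_array, numpy.ndarray):
  numpy_array = numpy.array(numpy_array)  # e.g. numpy.float32 scalar -> 0-dim ndarray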

albertz commented 2 years ago

Ah no, sorry, but Constant.create_returnn_layer_dict needs to be fixed such that we don't just have a ConstantLayer with the numpy zeros array printed as value, right?

How exactly do we get Constant here?

Edit: Ah, via zeros, which creates a new Tensor, which is not an output from another module call, thus we get into this code in Naming.prepare_tensor_as_input():

    elif not x.output_from_calls or x.is_const:
      # Assume this is a constant.
      x.is_const = True
      const_name = x.get_canonical_name(fallback="unnamed_const")
      tensor = x.tensor()
      if not x.returnn_data:
        x.returnn_data = Data(
          name=f"const:{const_name}", shape=tensor.shape, dtype=tensor.dtype.name,
          batch_dim_axis=None, time_dim_axis=None)
        x.returnn_axis_from_torch_axis = {i: i for i in range(len(tensor.shape))}
      parent_mod = x.get_canonical_parent_module()
      prefix = (parent_mod.get_canonical_name() + "_") if parent_mod else ""
      from pytorch_to_returnn.torch.nn.modules import Constant
      mod = Constant(value=tensor)
      self.modules[mod].canonical_name = prefix + const_name
      res = mod()
      res_tensor = self.tensors[res]
      assert isinstance(res_tensor, _tensor.TensorEntry)
      assert res_tensor.returnn_data.placeholder is not None
      x.returnn_data.placeholder = res_tensor.returnn_data.placeholder

Also, I don't exactly understand the problem. The test currently fails with:

FAIL: test_layers.test_full
----------------------------------------------------------------------
Traceback (most recent call last):
...
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/functional.py", line 41, in full
    line: return zeros(*size, dtype=dtype) + fill_value
    locals:
      zeros = <global> <function zeros at 0x7f9ab0f27af0>
      size = <local> (Batch(3),)
      dtype = <local> None
      fill_value = <local> 42
...
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/naming/call.py", line 121, in CallEntry.apply_call
    line: layer_dict = module.create_returnn_layer_dict(*inputs_args, **inputs_kwargs)
    locals:
      layer_dict = <not found>
      module = <local> <Constant>
      module.create_returnn_layer_dict = <local> <function Constant.create_returnn_layer_dict at 0x7f9a993c2550>
      inputs_args = <local> ()
      inputs_kwargs = <local> {}
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/module.py", line 104, in wrapped_func
    line: return obj(*args, **kwargs)
    locals:
      obj = <local> <bound method Constant.create_returnn_layer_dict of <Constant>>
      args = <local> ()
      kwargs = <local> {}
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/variable.py", line 72, in Constant.create_returnn_layer_dict
    line: assert isinstance(value, numpy.ndarray)
    locals:
      isinstance = <builtin> <built-in function isinstance>
      value = <local> 0.0
      numpy = <global> <module 'numpy' from '/home/runner/.local/lib/python3.8/site-packages/numpy/__init__.py'>
      numpy.ndarray = <global> <class 'numpy.ndarray'>
AssertionError: 

And we have the code Constant.create_returnn_layer_dict:

  def create_returnn_layer_dict(self, *inputs):  # ignore inputs
    tensor = self.value
    value = tensor.numpy()
    ...

Why is value not a Numpy array here? Is this related to my previous comment on make_output_tensor_from_returnn? It very much looks like the same symptom. But here this self.value tensor is created via:

def zeros(*size, out=None, dtype=None, layout=None, device=None, requires_grad=False):
  return Tensor(*size, dtype=dtype)

and the call zeros(*size, dtype=dtype) in full().

vieting commented 2 years ago

Why is value not a Numpy array here?

tensor.numpy() returns a numpy array (array([0., 0., 0.], dtype=float32)), but batch_axis=0, and after the if batch_axis is not None branch, value is the scalar 0.0.
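
Presumably because that branch does sth like this (a hypothetical reconstruction of the current code, just for illustration):

if batch_axis is not None:
  value = value[0]  # for a 1-d array this indexing yields a numpy scalar, not an ndarray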

vieting commented 2 years ago

Should zeros maybe call Constant directly? Then we could have a more generic layer definition there, which does not just print the numpy array in the net dict but makes use of the SizeValues passed for the shape.

Or maybe a different module, because that could not use the ConstantLayer, I guess.

albertz commented 2 years ago

Why is value not a Numpy array here?

tensor.numpy() returns a numpy array (array([0., 0., 0.], dtype=float32)), but batch_axis=0, and after the if batch_axis is not None branch, value is the scalar 0.0.

Ah, we probably need this there:

value = numpy.array(value[0])  # remove batch axis

Should zeros maybe call Constant directly? Then we could have a more generic layer definition there, which does not just print the numpy array in the net dict but makes use of the SizeValues passed for the shape.

Or maybe a different module, because that could not use the ConstantLayer, I guess.

Our current Constant module does not really work this way. It expects an existing Tensor and this is assumed to be constant and then wrapped via the module. Also, the current logic in prepare_tensor_as_input and Constant preserves the original tensor. I think some other code might depend on this behavior. I think I would try to avoid changing this now because this might cause again some other unrelated problems.

The ConstantLayer on RETURNN side has now become more generic. It accepts a generic shape option where you can pass dynamic dim tags. This requires that the value is the same across the whole tensor, but that should always be the case where this is currently used.

So we might change the current implementation of our Constant module to also handle more custom tensor inputs. But I think this is not needed at the moment for the cases we have here.

vieting commented 2 years ago

So we might change the current implementation of our Constant module to also handle more custom tensor inputs. But I think this is not needed at the moment for the cases we have here.

This is because it's the batch dim here, right? For other dynamic dims, that would not work, would it? But alright, if it's fine here, then this could be done later if needed.

albertz commented 2 years ago

So we might change the current implementation of our Constant module to also handle more custom tensor inputs. But I think this is not needed at the moment for the cases we have here.

This is because it's the batch dim here, right? For other dynamic dims, that would not work, would it? But alright, if it's fine here, then this could be done later if needed.

Right. Yes, for other dynamic dims, this does not work yet. But this would be simple to add via the shape option. We can do that later once we need it. Or now if you want. It should be sth like:

  def create_returnn_layer_dict(self, *inputs):  # ignore inputs
    tensor = self.value
    value = tensor.numpy()
    have_dynamic = False
    if any(d.dim_tag.dimension is None for d in tensor.shape):  # any dynamic
      have_dynamic = True
      # Make sure value is a scalar. ConstantLayer doesn't work otherwise.
      value = value.flatten()
      for i in range(1, value.shape[0]):
        numpy.testing.assert_equal(value[0], value[i])
      value = numpy.array(value[0])
    assert isinstance(value, numpy.ndarray)
    # Simplify representation in these simple cases.
    if not value.shape:  # scalar
      if value.dtype == "int32":
        value = int(value)
      elif value.dtype == "float32":
        value = float(value)
    if have_dynamic:
      naming = Naming.get_instance()
      call = naming.module_call_stack[-1]
      assert call.module.module is self
      # Add the dynamic dim tag sources as dependencies.
      call.inputs_flat = [naming.tensors[d.originated_tensor] for d in tensor.shape if d.dim_tag.dimension is None]
    return {
      "class": "constant", "value": value,
      "shape": [d.dim_tag if d.dim_tag.dimension is None else int(d) for d in tensor.shape]}
albertz commented 2 years ago

The recent changes now reveal further bugs:

  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/utils/rnn.py", line 12, in pack_padded_sequence
    line: for frame in range(lengths[0]):
    locals:
      frame = <not found>
      range = <builtin> <class 'range'>
      lengths = <local> <Tensor name:? tensor:(Batch(3),) returnn_data:'add_output' [B] axes id>
TypeError: 'Tensor' object cannot be interpreted as an integer

Such a for loop cannot work on dynamic tensors. pack_padded_sequence has to be reimplemented.

I'm not exactly sure about the easiest way. Currently we cannot really do loops.

If you wrap RangeFromLengthLayer directly, as a RangeFromLength module or a range_from_length function, then:

Or, simpler, use torch.arange:

def _batch_sizes_from_lengths(lengths: Tensor) -> Tensor:
  indices = torch.arange(torch.max(lengths))  # [T]
  indices_bc = indices.unsqueeze(0)  # [1,T]
  l_bc = lengths.unsqueeze(1)  # [B,1]
  mask = indices_bc < l_bc  # [B,T]
  batch_sizes = torch.sum(mask.int(), dim=0)  # [T]
  return batch_sizes
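
For example, lengths = [3, 1, 2] gives indices = [0, 1, 2], the mask [[1, 1, 1], [1, 0, 0], [1, 1, 0]], and thus batch_sizes = [3, 2, 1], i.e. how many sequences are still active at each time step, which is exactly what pack_padded_sequence needs.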

(Btw, in general, we could really wrap any more complex layer or subnet structure, using RecLayer, SubnetworkLayer etc explicitly.)

albertz commented 2 years ago

Btw, in Constant.make_output_tensor_from_returnn, we do:

entry.is_const = False  # this depends on the (dynamic) batch dim

I wonder if this causes maybe some of the problems here. I wonder what happens if we just remove this.

albertz commented 2 years ago

I might check out the branch later tonight and try some debugging and maybe some follow-up fixes myself. So make sure that you get my commits (pull from the branch).

albertz commented 2 years ago

I think this looks all fine now.