Closed vieting closed 2 years ago
I added the dim tags now. The corresponding layer is still missing, we need that as well, right?
I added the dim tags now. The corresponding layer is still missing, we need that as well, right?
What layer do you mean?
Or you mean the changes to reshape
or more specifically the Unflatten
module?
I added the dim tags now. The corresponding layer is still missing, we need that as well, right?
What layer do you mean?
I mean do we somehow need to keep track of the dim tag's origin?
I added the dim tags now. The corresponding layer is still missing, we need that as well, right?
What layer do you mean?
I mean do we somehow need to keep track of the dim tag's origin?
From our discussion, I remember two use cases:
torch.arange
over such SizeValue
. For that, we can use for example RangeInAxisLayer
or RangeFromLengthLayer
+ LengthLayer
. In both cases, despite the dim tag, we also need the original tensor ref (to get some layer, to get the dynamic size from).torch.randint
with SizeValue
as argument. For that, again LengthLayer
+ RandIntLayer
. For the LengthLayer
, we again need the tensor ref.But basically any other PyTorch function you could potentially call would also need to do that. It should be correct to always use LengthLayer
basically. I wonder if there is a simple way to have this automatically everywhere. Maybe we should derive SizeValue
from Tensor
instead of int
. But I'm not sure if this would cause other problems.
In any case, all the RETURNN layers already exist, and also all the needed functionality on RETURNN side. This only needs extensions here in pytorch-to-returnn.
To summarize:
Length(axis=...)(tensor)
.SizeValue
from Tensor
. Not sure... Or maybe try that out later in a separate branch/PR.I started adding arange
with dynamic limit. Could you please comment on that? Not sure if it's going in the right direction...
test_arange
fails:
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/tests/test_layers.py", line 1232, in test_arange.<locals>.model_func
line: return inputs + torch.reshape(arange, (1, -1))
locals:
inputs = <local> <Tensor name:? tensor:(Batch(1),5) returnn_data:'data' [B,F|F'feature:data'(5)] axes id>
torch = <local> <module 'pytorch_to_returnn.torch' from '/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/__init__.py'>
torch.reshape = <local> <function reshape at 0x7f6230fed670>
arange = <local> <Tensor name:? tensor:(5,) returnn_data:'Range_output' [T|'Range_input_len'[]] axes id>
...
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/operator.py", line 98, in BinaryOperator.create_returnn_layer_dict
line: inputs = _unify_tensor_axes_returnn_meta(*inputs)
locals:
inputs = <local> (<Tensor name:? tensor:(Batch(1),5) returnn_data:'data' [B,F|F'feature:data'(5)] axes id>, <Tensor name:? tensor:(1,5) returnn_data:'Cast_output' [F|'Unflatten_split_dims0'(1),T|'Range_input_len'[]] axes id>)
_unify_tensor_axes_returnn_meta = <global> <function _unify_tensor_axes_returnn_meta at 0x7f6238203940>
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/operator.py", line 469, in _unify_tensor_axes_returnn_meta
line: assert dim_tag.dimension == prev_dim_tag.dimension, f"not matching Data {x.returnn_data}"
locals:
dim_tag = <local> Dim{'Range_input_len'[]}
dim_tag.dimension = <local> None
prev_dim_tag = <local> Dim{F'feature:data'(5)}
prev_dim_tag.dimension = <local> 5
AssertionError: not matching Data Data{'Cast_output', [F|'Unflatten_split_dims0'(1),T|'Range_input_len'[]]}
This looks not so much related to Range
but to the broadcasting logic with _unify_tensor_axes_returnn_meta
. Not sure if we want to fix this here in this PR as well or in another separate PR. A simpler test case for this would probably look like:
inputs # shape [B,F]
_, n_in = inputs.shape
b = torch.zeros([n_in]).reshape((1, -1)) # [1,F]
return inputs + b # should broadcast b
I think this would fail in the same way.
inputs # shape [B,F] _, n_in = inputs.shape b = torch.zeros([n_in]).reshape((1, -1)) # [1,F] return inputs + b # should broadcast b I think this would fail in the same way.
For me locally, this passes.
inputs # shape [B,F] _, n_in = inputs.shape b = torch.zeros([n_in]).reshape((1, -1)) # [1,F] return inputs + b # should broadcast b I think this would fail in the same way.
For me locally, this passes.
Ah yes, there is also another problem:
<Tensor name:? tensor:(1,5) returnn_data:'Cast_output' [F|'Unflatten_split_dims0'(1),T|'Range_input_len'[]] axes id>
I think the dim 1 in axis 0 is maybe already handled fine. In _unify_tensor_axes_returnn_meta
, the docstring also says:
It also makes sure that broadcasting in the batch-dim axis works correctly by potentially removing such a batch broadcast dim.
But T|'Range_input_len'[]
looks wrong. This is marked as a dynamic dim. But it should be static (right)?
Would you modify the test case here? test_arange_dyn
fails with
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'extern_data/placeholders/data/data' with dtype float and shape [?,5]
I think this is because for Range
in make_output_tensor_from_returnn()
, input_entries
is all None
because there are no tensors in it. We might need to pass limit.as_tensor()
to Range.create_returnn_layer_dict()
, but then I'm not sure how to infer the correct torch_shape in Range._get_output_shape_from_returnn()
.
Would you modify the test case here?
test_arange_dyn
fails withtensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'extern_data/placeholders/data/data' with dtype float and shape [?,5]
I think this is because for
Range
inmake_output_tensor_from_returnn()
,input_entries
is allNone
because there are no tensors in it. We might need to passlimit.as_tensor()
toRange.create_returnn_layer_dict()
, but then I'm not sure how to infer the correct torch_shape inRange._get_output_shape_from_returnn()
.
Yes right, this is a problem.
I'm not really sure where to do the limit.as_tensor()
. The test should not be modified. Corresponding PyTorch code works like that, so we should make sure that it will also work in our converter. This is for the test case and also for Fairseq.
Maybe we can extend the Naming
logic for SizeValue
. E.g. Naming._make_tensor
.
Maybe we can extend the Naming logic for SizeValue. E.g. Naming._make_tensor.
So a SizeValue
should be returned as is instead of getting None
in Naming._make_tensor
? Or should we directly involve Length
there?
Maybe we can extend the Naming logic for SizeValue. E.g. Naming._make_tensor.
So a
SizeValue
should be returned as is instead of gettingNone
inNaming._make_tensor
? Or should we directly involveLength
there?
Most other code would not expect SizeValue
. I think it is easier if this directly uses size.as_tensor()
and then returns a TensorEntry
.
Edit: Also only when isinstance(limit, SizeValue) and limit.dim_tag.dimension is None
.
I think this is because for Range in make_output_tensor_from_returnn(), input_entries is all None because there are no tensors in it. We might need to pass limit.as_tensor() to Range.create_returnn_layer_dict(), but then I'm not sure how to infer the correct torch_shape in Range._get_output_shape_from_returnn().
Yes right, this is a problem.
I'm not really sure where to do the limit.as_tensor(). The test should not be modified. Corresponding PyTorch code works like that, so we should make sure that it will also work in our converter. This is for the test case and also for Fairseq.
Maybe we can extend the Naming logic for SizeValue. E.g. Naming._make_tensor.
Btw. what I meant was we could pass limit.as_tensor()
to the call of Range()
from functional.arange()
. So this would not change anything for the PyTorch Code.
Btw. what I meant was we could pass
limit.as_tensor()
to the call ofRange()
fromfunctional.arange()
. So this would not change anything for the PyTorch Code.
Ok, this would fix arange
. But what if you call some module directly and pass a SizeValue
? This might happen. (Not Range
but any module.)
Most other code would not expect SizeValue. I think it is easier if this directly uses size.as_tensor() and then returns a TensorEntry.
I get a circular import again here, if I want to import SizeValue
in naming.py
to do the instance check.
Ok, this would fix arange. But what if you call some module directly and pass a SizeValue? This might happen.
Range
is not an original PyTorch module, so I don't think this can happen.
Ok, this would fix arange. But what if you call some module directly and pass a SizeValue? This might happen.
Range
is not an original PyTorch module, so I don't think this can happen.
Not for Range
but for any other PyTorch module.
Not for Range but for any other PyTorch module.
Alright, so let's fix it here. Do you have ideas regarding the circular import?
I get a circular import again here, if I want to import SizeValue in naming.py to do the instance check.
Most other code would not expect SizeValue. I think it is easier if this directly uses size.as_tensor() and then returns a TensorEntry.
I get a circular import again here, if I want to import
SizeValue
innaming.py
to do the instance check.
So, what's the problem? Why don't you just fix it?
You anyway should avoid to import too many things in the top level of a module. I prefer to only import things locally inside a function when it is not really used anywhere else. And if only used for type annotations, then also no generic import (as I commented before).
I don't know how check whether it's static and how to infer the PyTorch shape in this case...
I don't know how check whether it's static and how to infer the PyTorch shape in this case...
In what case? You mean in Range
? It should be simple: When it is isinstance(limit, Tensor)
, then it is dynamic, otherwise it is isinstance(limit, int)
(and never isinstance(limit, SizeValue)
).
I don't know how check whether it's static and how to infer the PyTorch shape in this case...
In what case? You mean in Range? It should be simple: When it is isinstance(limit, Tensor), then it is dynamic, otherwise it is isinstance(limit, int) (and never isinstance(limit, SizeValue)).
Yes, for Range
. But no, also if it's a static dim, we end up with a tensor for limit
. Like in test_arange
.
Also, I don't know how to set torch_shape
correctly in _get_output_shape_from_returnn
.
Tensor.__init__
uses self._numpy_buffer = numpy.zeros(shape, dtype=dtype) if numpy_array is None else numpy_array
. I think this is wrong if we have dynamic dims, right? Is there a RETURNN layer which can create zeros of a given shape?
Tensor.__init__
usesself._numpy_buffer = numpy.zeros(shape, dtype=dtype) if numpy_array is None else numpy_array
. I think this is wrong if we have dynamic dims, right? Is there a RETURNN layer which can create zeros of a given shape?
Why is this wrong? _numpy_buffer
is about the static Torch buffer, nothing dynamic.
Why is this wrong? _numpy_buffer is about the static Torch buffer, nothing dynamic.
Ah no sorry, but Constant.create_returnn_layer_dict
needs to be fixed such that we don't just have a ConstantLayer
with th numpy zeros array printed as value
, right?
Btw, in make_output_tensor_from_returnn
, in case all inputs are const, we evaluate via TF session.run
and then also set numpy_array
:
numpy_array = session.run(layer.output.placeholder, feed_dict=feed_dict)
numpy_array = numpy.transpose(numpy_array, [returnn_axis_from_torch_axis[i] for i in range(numpy_array.ndim)])
In case of scalars, we don't get a numpy.ndarray
here but a numpy.float
or numpy.int
instead. We need to fix this because later in Tensor
, we assume it is always a numpy.ndarray
.
We can either fix this right here. Sth like:
if not isinstance(numpy_array, numpy.ndarray):
numpy_array = numpy.array(numpy_array)
Or alternatively the same logic in Tensor.__init__
.
Ah no sorry, but
Constant.create_returnn_layer_dict
needs to be fixed such that we don't just have aConstantLayer
with th numpy zeros array printed asvalue
, right?
How exactly do we get Constant
here?
Edit: Ah, via zeros
, which creates a new Tensor
, which is not an output from another module call, thus we get into this code in Naming.prepare_tensor_as_input()
:
elif not x.output_from_calls or x.is_const:
# Assume this is a constant.
x.is_const = True
const_name = x.get_canonical_name(fallback="unnamed_const")
tensor = x.tensor()
if not x.returnn_data:
x.returnn_data = Data(
name=f"const:{const_name}", shape=tensor.shape, dtype=tensor.dtype.name,
batch_dim_axis=None, time_dim_axis=None)
x.returnn_axis_from_torch_axis = {i: i for i in range(len(tensor.shape))}
parent_mod = x.get_canonical_parent_module()
prefix = (parent_mod.get_canonical_name() + "_") if parent_mod else ""
from pytorch_to_returnn.torch.nn.modules import Constant
mod = Constant(value=tensor)
self.modules[mod].canonical_name = prefix + const_name
res = mod()
res_tensor = self.tensors[res]
assert isinstance(res_tensor, _tensor.TensorEntry)
assert res_tensor.returnn_data.placeholder is not None
x.returnn_data.placeholder = res_tensor.returnn_data.placeholder
Also, I don't exactly understand the problem. The test currently fails with:
FAIL: test_layers.test_full
----------------------------------------------------------------------
Traceback (most recent call last):
...
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/functional.py", line 41, in full
line: return zeros(*size, dtype=dtype) + fill_value
locals:
zeros = <global> <function zeros at 0x7f9ab0f27af0>
size = <local> (Batch(3),)
dtype = <local> None
fill_value = <local> 42
...
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/naming/call.py", line 121, in CallEntry.apply_call
line: layer_dict = module.create_returnn_layer_dict(*inputs_args, **inputs_kwargs)
locals:
layer_dict = <not found>
module = <local> <Constant>
module.create_returnn_layer_dict = <local> <function Constant.create_returnn_layer_dict at 0x7f9a993c2550>
inputs_args = <local> ()
inputs_kwargs = <local> {}
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/module.py", line 104, in wrapped_func
line: return obj(*args, **kwargs)
locals:
obj = <local> <bound method Constant.create_returnn_layer_dict of <Constant>>
args = <local> ()
kwargs = <local> {}
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/variable.py", line 72, in Constant.create_returnn_layer_dict
line: assert isinstance(value, numpy.ndarray)
locals:
isinstance = <builtin> <built-in function isinstance>
value = <local> 0.0
numpy = <global> <module 'numpy' from '/home/runner/.local/lib/python3.8/site-packages/numpy/__init__.py'>
numpy.ndarray = <global> <class 'numpy.ndarray'>
AssertionError:
And we have the code Constant.create_returnn_layer_dict
:
def create_returnn_layer_dict(self, *inputs): # ignore inputs
tensor = self.value
value = tensor.numpy()
...
Why is value
not a Numpy array here? Is this related to my previous comment on make_output_tensor_from_returnn
? It very much looks like the same symptom. But here this self.value
tensor is created via:
def zeros(*size, out=None, dtype=None, layout=None, device=None, requires_grad=False):
return Tensor(*size, dtype=dtype)
and zeros(*size, dtype=dtype)
.
Why is value not a Numpy array here?
tensor.numpy()
returns a numpy array (array([0., 0., 0.], dtype=float32)
), but batch_axis=0
. And after the if batch_axis is not None
branch, value=0
.
Should zeros
maybe directly call Constant
? And then we could have a more generic layer definition there, where not just the numpy array is printed in the net dict but it could make use of the SizeValue
s passed for the shape.
Or maybe a different module because that could not use the ConstantLayer
I guess.
Why is value not a Numpy array here?
tensor.numpy()
returns a numpy array (array([0., 0., 0.], dtype=float32)
), butbatch_axis=0
. And after theif batch_axis is not None
branch,value=0
.
Ah, we probably need this there:
value = numpy.array(value[0]) # remove batch axis
Should
zeros
maybe directly callConstant
? And then we could have a more generic layer definition there, where not just the numpy array is printed in the net dict but it could make use of theSizeValue
s passed for the shape.Or maybe a different module because that could not use the
ConstantLayer
I guess.
Our current Constant
module does not really work this way. It expects an existing Tensor
and this is assumed to be constant and then wrapped via the module. Also, the current logic in prepare_tensor_as_input
and Constant
preserves the original tensor. I think some other code might depend on this behavior. I think I would try to avoid changing this now because this might cause again some other unrelated problems.
The ConstantLayer
on RETURNN side now has become more generic. It accepts a generic shape
option where you could pass dynamic dim tags. But this requires that the value
is the same in the whole tensor. But this should always be the case when this is used currently.
So we might change the current implementation of our Constant
module to also handle more custom tensor
inputs. But I think this is not needed at the moment for the cases we have here.
So we might change the current implementation of our Constant module to also handle more custom tensor inputs. But I think this is not needed at the moment for the cases we have here.
This is because it's the batch dim here, right? For other dynamic dims, that would not work, would it? But alright, if it's fine here, then this could be done later if needed.
So we might change the current implementation of our Constant module to also handle more custom tensor inputs. But I think this is not needed at the moment for the cases we have here.
This is because it's the batch dim here, right? For other dynamic dims, that would not work, would it? But alright, if it's fine here, then this could be done later if needed.
Right. Yes, for other dynamic dims, this does not work yet. But this would be simple to add via the shape
option. We can do that later once we need it. Or now if you want. It should be sth like:
def create_returnn_layer_dict(self, *inputs): # ignore inputs
tensor = self.value
value = tensor.numpy()
have_dynamic = False
if any(d.dim_tag.dimension is None for d in tensor.shape): # any dynamic
have_dynamic = True
# Make sure value is a scalar. ConstantLayer doesn't work otherwise.
value = value.flatten()
for i in range(1, value.shape[0]):
numpy.testing.assert_equal(value[0], value[i])
value = numpy.array(value[0])
assert isinstance(value, numpy.ndarray)
# Simplify representation in these simple cases.
if not value.shape: # scalar
if value.dtype == "int32":
value = int(value)
elif value.dtype == "float32":
value = float(value)
if have_dynamic:
naming = Naming.get_instance()
call = naming.module_call_stack[-1]
assert call.module.module is self
# Add the dynamic dim tag sources as dependencies.
call.inputs_flat = [naming.tensor[d.originated_tensor] for d in tensor.shape if d.dim_tag.dimension is None]
return {
"class": "constant", "value": value,
"shape": [d.dim_tag if d.dim_tag.dimension is None else int(d) for d in tensor.shape]}
The recent changes unveil further bugs now:
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/utils/rnn.py", line 12, in pack_padded_sequence
line: for frame in range(lengths[0]):
locals:
frame = <not found>
range = <builtin> <class 'range'>
lengths = <local> <Tensor name:? tensor:(Batch(3),) returnn_data:'add_output' [B] axes id>
TypeError: 'Tensor' object cannot be interpreted as an integer
Such for
loop cannot work on dynamic tensors. pack_padded_sequence
has to be reimplemented.
I'm not exactly sure about the easiest way. Currently we cannot really do loops.
If you wrap
Or simpler, use RangeFromLengthLayer
directly, as RangeFromLength
module, or range_from_length
function, then:torch.arange
:
def _batch_sizes_from_lengths(lengths: Tensor) -> Tensor:
indices = torch.arange(torch.max(lengths)) # [T]
indices_bc = indices.unsqueeze(0) # [1,T]
l_bc = lengths.unsqueeze(1) # [B,1]
mask = indices_bc < l_bc # [B,T]
batch_sizes = torch.sum(mask.int(), dim=0) # [T]
return batch_sizes
(Btw, in general, we could really wrap any more complex layer or subnet structure, using RecLayer
, SubnetworkLayer
etc explicitly.)
Btw, in Constant.make_output_tensor_from_returnn
, we do:
entry.is_const = False # this depends on the (dynamic) batch dim
I wonder if this causes maybe some of the problems here. I wonder what happens if we just remove this.
I might checkout the branch later tonight and try some debugging and maybe follow-up fixes myself. So make sure that you get my commits (pull from the branch).
I think this looks all fine now.
For fixing #84, as demonstrated in #85, we need to save more information in
SizeValue
.