Closed albertz closed 2 years ago
Would this replace the possibility to define out_type
(which you would do in RETURNN)? I didn't continue with the debugging yet, but I think I got some kind of related issue in my config right now. Cannot tell for sure, will report back once I had time to check.
I'm not sure yet whether this would at the same time remove the old way, or just be an additional way.
In any case, I don't think this would really change much in behavior, depending on what you did. So whatever problem you had, you might have it again now. But not sure about the details on your issue. You maybe should report a separate issue about it. But also not sure if this is even related here to returnn-common, or just RETURNN itself?
But also, in general, you never should use out_type
anyway. If you do, this sounds like you probably should do it in a different way.
The dimension tags would be used whenever you must specify an axis (for Reduce
, Dot
, etc), or a new shape (for Const
, Variable
, etc).
In general, I want to push the concept of named tensors (named array axes; i.e. dimension tags in RETURNN) much more. I want to deprecate or even not allow axis specified as integers. Axis specified as integers (index) can very easily become a problem because the index can easily change (because we optimize sth differently, then the order of axis will be different, or the user adds some other layers and doesn't notice, etc). So the order of axes should not matter at no point.
But also, in general, you never should use out_type anyway. If you do, this sounds like you probably should do it in a different way.
Without further investigation on my own config, how would this work with an Eval
layer here? Is returnn able to (or should be able to) implicitly solve the eval and handle the dimension?
In general, I want to push the concept of named tensors (named array axes; i.e. dimension tags in RETURNN) much more.
I like this idea, this would make things a lot more straight forward too, I think.
But also, in general, you never should use out_type anyway. If you do, this sounds like you probably should do it in a different way.
Without further investigation on my own config, how would this work with an
Eval
layer here? Is returnn able to (or should be able to) implicitly solve the eval and handle the dimension?
The EvalLayer
should never change the shape. EvalLayer
was never intended for this (although I know that some people abuse it for that). All reshaping logic should be handled by other layers. If you think sth is missing in RETURNN, open a separate issue in RETURNN about this.
But for returnn-common, we don't even need Eval
at all because any possible use case is much more elegant using directly the returnn-common features. The possible use cases was actually just convenience, to write sth like a * b + c * d
in a simpler way. Although that was already possible before in RETURNN using CombineLayer
with kind mul
and add
. But just a bit annoying to write a lot of separate layers for that. But now in returnn-common, you can simply directly write sth like a * b + c * d
. So no need for Eval
.
If sth is not clear about this, please open a separate issue, as this is all off-topic now from this issue here.
We should clarify this because this is very fundamental to how all our code will look like.
The main question is to decide on these two options:
axis
, axes
, new_dim
is involved, but also for out_dim
in Linear
, LSTM
, Conv
and some others. Edit we decided to do this.I think when we require them, we should also require them to be unique.
I tend to prefer having them mandatory. This might make some parts maybe a bit more verbose from the first glance. But this should resolve any possible ambiguity on axes, and make the code very explicit and clear.
We might however introduce some shorter aliases, or maybe restructure things a bit. For example:
Dim = DimensionTag
Edit: Done. Actually the other way around, it was renamed to Dim
, and we have an alias for DimensionTag
for older code (mostly RETURNN, not relevant for returnn_common).B
or Batch
or batch_dim
for Dim(kind=DimensionTag.Types.Batch)
. Edit We have a global batch_dim
now.F(dim, name=None, **kwargs)
for Dim(kind=DimensionTag.Types.Feature, dimension=dim, description=name, **kwargs)
. Edit We have FeatureDim(...)
now.S(name=None, **kwargs)
for Dim(kind=DimensionTag.Types.Spatial, description=name, **kwargs)
. Edit We have SpatialDim(...)
now.So in your config, it could look like:
input_dim = nn.FeatureDim("mfcc", 40)
classes_dim = nn.FeatureDim("cart", 5000)
time_dim = nn.SpatialDim("time")
extern_data = {
"data": {"dim_tags": [nn.batch_dim, time_dim, input_dim]},
"classes": {"dim_tags": [nn.batch_dim, time_dim], "sparse_dim": classes_dim}
}
class Model(nn.Module):
def __init__(self, in_dim: nn.Dim, out_dim: nn.Dim):
super().__init__()
self.linear = nn.Linear(out_dim=out_dim, in_dim=in_dim)
def forward(self, input: nn.LayerRef) -> Layer:
return self.linear(input)
model = Model(in_dim=input_dim, out_dim=classes_dim)
...
(Just a draft.)
Not much feedback has been given here. So you agree?
We should maybe clarify some of the implications. For lots of code, not much changes, except the signature changes type int to DimensionTag.
When new dimensions are introduced, this looks a bit more verbose.
Would this imply that "F" or "T" are also not allowed anymore?
Mandatory is fine for me.
Would this imply that "F" or "T" are also not allowed anymore?
I do not have enough knowledge to understand the consequences of this.
Would this imply that "F" or "T" are also not allowed anymore?
I do not have enough knowledge to understand the consequences of this.
Sorry, maybe I was confusing here. This is a design decision we should take. We can either still allow "F"
and "T"
or not. Very related is also a similar question on RETURNN side: https://github.com/rwth-i6/returnn/issues/586
Or maybe we just should change the behavior (on RETURNN side or here in returnn-common): Whenever "F"
or "T"
is non-ambiguous (according to some sensible rules), then it is allowed, otherwise not. The rules could be:
Some aspects on DimensionTag
should maybe also be simplified directly on RETURNN side. Here is the corresponding issue: https://github.com/rwth-i6/returnn/issues/782
We have some simplifications on RETURNN side now, via https://github.com/rwth-i6/returnn/pull/822.
Specifically, the class was renamed to simply Dim
. And we have helpers SpatialDim(...)
and FeatureDim(...)
for the user.
We decided to make it mandatory. This is implemented now but still needs some follow-up fixes.
Also, we again need to think about changing or extending the way to get extern data into the network. Specifically, where and how the dim tags of extern data are defined. So related is #44 and maybe also #56.
I think the external data is a rather small problem here, because there it is clear they come from "outside" the network. I think the bigger difficulty is on the decision where to actually define the dimension tags that are needed within modules, and how to collect them to write them in a config.
For example the current code of the SelfAttention is problematic in that sense:
class SelfAttention(SelfAttentionBase):
"""
Classic self attention
"""
@nn.scoped
def __call__(self, source: nn.LayerRef, *, axis: nn.Dim) -> nn.Layer:
"""forward"""
expand_dim = nn.SpatialDim("self_att_expand_dim")
qkv = self.qkv(source)
qkv = nn.split_dims(
qkv, axis=self.qkv_dim_total, dims=(self.num_heads, self.qkv_dim_per_head), name="qkv_split_dims")
q, k, v = nn.split(
qkv, axis=self.qkv_dim_per_head,
out_dims=(self.key_dim_per_head, self.key_dim_per_head, self.value_dim_per_head),
name="qkv_split")
k = nn.reinterpret_data(k, set_dim_tags={axis: expand_dim}, name="k_new_dim")
v = nn.reinterpret_data(v, set_dim_tags={axis: expand_dim}, name="v_new_dim")
att = dot_attention(q, k, v, key_dim=self.key_dim_per_head, axis=expand_dim, att_dropout=self.att_dropout)
output = nn.merge_dims(
att, axes=(self.num_heads, self.value_dim_per_head), out_dim=self.value_dim_total, name="output")
return output
Here expand_dim
is a local variable that is not accessable from the outside. So the question is: is it good that it always has the same name for all instances of SelfAttention
you would create/apply? Especially when you would apply the same SelfAttention on two different inputs (axis would be a different Dim
) I am not sure if this will work at all then.
Then, the returnn_common functionality needs to be fully serializable. A random idea of me would be that you need to collect all locally created dims and add them either to the return value or a local variable (I guess the later makes more sense), and the nn.scoped
wrapper can then collect this properly in the sense that it adds maybe a unique name prefix for the dim in relation to the subnetwork name the __call__
result will be assigned to.
And then a context needs to be able to return all dim
instances that were created anywhere, so that this is properly serializable.
The serialization itself is of course another issue, especially when dim tags are passed among many modules, but I guess this is not a problem related to returnn_common
itself but more about the way in which we would use it in combination with Sisyphus. But as long as each dimension tag object instance has a unique name (if not you could use random strings, but this would look ugly has to be unique and static with respect to the module for proper hashing) and a correct __repr__
implementation this is easy to solve on the Sisyphus side.
I think the external data is a rather small problem here, because there it is clear they come from "outside" the network. I think the bigger difficulty is on the decision where to actually define the dimension tags that are needed within modules, and how to collect them to write them in a config.
I don't really see a problem there.
axis
, in_dim
, in_spatial_dim
, whatever).out_dim
, out_spatial_dim
, whatever).For example the current code of the SelfAttention is problematic in that sense:
class SelfAttention(SelfAttentionBase): """ Classic self attention """ @nn.scoped def __call__(self, source: nn.LayerRef, *, axis: nn.Dim) -> nn.Layer: """forward""" expand_dim = nn.SpatialDim("self_att_expand_dim") ...
Here
expand_dim
is a local variable that is not accessable from the outside. So the question is: is it good that it always has the same name for all instances ofSelfAttention
you would create/apply? Especially when you would apply the same SelfAttention on two different inputs (axis would be a differentDim
) I am not sure if this will work at all then.
expand_dim
here is only used inside the module. It is also never visible to the outside. Why should it matter? Why would it not work? Where exactly is the problem? Why is the name a problem?
Regarding the serialization, I think we discussed this already. This is easily doable. I think I would just handle this in Net.make_net_dict_raw
or so at the end. It would simply collect all Dim
instances, put them into some dict (dims
or so) which would have unique names (or any keys) for all the dim tags, and then we would replace the Dim
instance by some DimProxy
or so where the __repr__
would look like dims[key]
. And then next to (before) the network
, you would also put the dims
in the RETURNN config.
We cannot make use of Dim.__repr__
because we always have Dim(...) != Dim(...)
(different instances are different). We had some related discussion in https://github.com/rwth-i6/returnn/issues/860.
Beside network
and dims
, I think we might also need to put other options. E.g. some aspects here assume a recent RETURNN behavior_version
, at least 11. I think this should also be put there. I'm currently thinking about also handling extern_data
completely via returnn-common, such that you have the extern data dims directly available, so then we would also create that.
The second reply answers all the questions regarding serialization and unique names.
I see that I did not specify my first question good. I discussed this with Benedikt and I am not sure if this is even a problem. I thought about debugging, where you can have an error with a dimension, but if it is created with the same name at e.g. 50 locations you might not be able to pinpoint which dim exactly caused the error. But I guess in practice the error will always occur in conjunction with the layer where it is used, so I guess this is no problem at all. Sorry for the misleading question.
I think just experience when using this will tell us whether and where we can maybe improve sth, and what we could improve. Certain things are very simple to improve. E.g. we could somehow attach the context where a Dim
was created to the Dim
instance, if that really turns out to be a problem.
- All dims which are configurable and supposed to be used by the module for some output dimension would be passed as arguments as well (
out_dim
,out_spatial_dim
, whatever).
Also, we would need to think about how out_shape
should work.
Usually, you have some set of common dims (like the batch dim, but maybe more), that the module will not operate on.
We would also need to pass a set of these to each layer in some way. Or maybe we could also supply some explicit out_shape
to the module, and then infer it from there.
- All dims which are configurable and supposed to be used by the module for some output dimension would be passed as arguments as well (
out_dim
,out_spatial_dim
, whatever).Also, we would need to think about how
out_shape
should work. Usually, you have some set of common dims (like the batch dim, but maybe more), that the module will not operate on. We would also need to pass a set of these to each layer in some way. Or maybe we could also supply some explicitout_shape
to the module, and then infer it from there.
Ah yes, good point. Maybe for the __call__
, there could be another in_shape
argument and then the module can derive it and add such checks on intermediate outputs. Or maybe better, via #47, we anyway have the shape of any inputs already, so no need for an extra in_shape
. The user could have checked in advance that the inputs are as expected.
There could be an out_shape
argument for module __call__
. Although some modules might return multiple things, then this is a bit ambiguous. Also we cannot really make sure that all modules have this argument, so this would more be a convention than a strict rule. But this is ok.
Or, when we do #47, the user could explicitly do it afterwards like:
out = model(..., out_dim=feature_dim)
out.assert_shape({batch_dim, time_dim, feature_dim})
Maybe that is cleaner?
Or maybe better, via #47, we anyway have the shape of any inputs already, so no need for an extra
in_shape
. The user could have checked in advance that the inputs are as expected
That would be even better, yes. Then every module checking shapes (which ideally, every module should do I think), would have something like this on top:
common_dims = in.get_shape() - {in_dim}
out = ...
out.assert_shape(common_dims | {out_dim})
Or, when we do #47, the user could explicitly do it afterwards like:
out = model(..., out_dim=feature_dim) out.assert_shape({batch_dim, time_dim, feature_dim})
Maybe that is cleaner?
This wouldn't work for intermediate states right? (Also, when nesting modules, the outer module would also need to know the common dims to assert the output of the inner module). I think more complicated modules should check their output type themselves, too. Just like you would specify the return type of a function, this is very important documentation.
The out_shape
or assert_shape
is anyway mostly for the user. A user might not care what happens inside the model
, only that he gets some specific output in the end.
However, if you are the author of model
, or you want that any user can easily read the code of your model, then sure, you would also have out_shape
or assert_shape
usages inside the model, for any intermediate outputs, and maybe also the final output.
We should be careful to not make it too verbose or too complicated. When the code looks complex from a first glance, this is also bad. out_shape
might be a bit better w.r.t. this aspect because it avoids a separate call assert_shape
. But we anyway could support both.
Another aspect: Probably many modules will be lazy w.r.t. the input dim, just like it is standard in RETURNN and other frameworks. E.g. the user would specify out_dim
but not in_dim
. So in_dim
is only known on the first call to the module, and then automatically inferred from the input (in.in_dim
or so, which refers to the feature dim). I think enforcing to specify in_dim
at module creation would be too verbose in many cases, so that is why we would have to support such lazy in_dim
, and also using the feature dim logic.
So, I currently have this Linear
module implementation:
class Linear(nn.Module):
"""
Linear transformation.
"""
def __init__(self, out_dim: nn.Dim, *, in_dim: Optional[nn.Dim] = None):
super().__init__()
self.out_dim = out_dim
self.out_dim_inner = out_dim
self.in_dim = in_dim
self.weight = None # type: Optional[nn.Parameter]
self.bias = None # type: Optional[nn.Parameter]
if in_dim:
self._lazy_init(in_dim)
def _lazy_init(self, in_dim: nn.Dim):
if self.in_dim:
assert self.in_dim == in_dim
else:
self.in_dim = in_dim
if in_dim == self.out_dim:
self.out_dim_inner = self.out_dim.copy(same_as_self=False, description=f"{self}:out-dim-inner")
self.weight = nn.Parameter((self.in_dim, self.out_dim_inner))
self.bias = nn.Parameter((self.out_dim_inner,))
@nn.scoped
def __call__(self, source: nn.LayerRef) -> nn.Layer:
self._lazy_init(source.dim)
out = nn.dot(source, self.weight, reduce=self.in_dim) + self.bias
if self.out_dim_inner != self.out_dim:
out = nn.reinterpret_data(out, set_dim_tags={self.out_dim_inner: self.out_dim})
return out
You see that I needed to introduce this out_dim_inner
here. The problem occurs when we have in_dim == out_dim
. The dim tags in the parameter are not distinguishable then, and the dot
reduce=in_dim
is ambiguous.
I'm not really sure about the best way to solve this. I'm not really happy with this solution now.
@Zettelkasten any suggestions or ideas?
While further thinking about this, some thoughts and options:
dim_tags = (in_dim, out_dim)
with out_dim = in_dim
, we need some way to distinguish those dims. But this cannot be just the order.get_axis_from_description
should fail due to the ambiguity.in_dim == out_dim
, it does out_dim_ = out_dim.copy(same_as_self=False, ...)
and then using shape (in_dim, out_dim_)
.
(in_dim, out_dim)
. Or all other code must also have this special logic for this special case.LSTM
or others, when there is some matrix multiplication involved. It should be simple to always make use of the same logic for this case.reduce=in_dim
is searching for matching dims, we need some way that it uniquely finds the right axis.matching_priority
flag or so, set on dim tags. In get_axis_from_description
, when a dim tag matches multiple existing dim tags, but one of the matching tags has a higher matching priority, then it would still be allowed, and it would use that one. So in the example, in_dim == out_dim
(but in_dim is not out_dim
) but in_dim
would be a copy with higher matching priority. The code would maybe look like:
weight = nn.Parameter((in_dim.copy(matching_priority=1), out_dim))
Edit I implemented this now here: https://github.com/rwth-i6/returnn/pull/871
Another thing: I wonder whether passing out_spatial_dim
as argument is maybe un-intuitive. In many cases, it is a new dimension. So (also as the name implies) it is an output of the function or module. The current code often looks like this:
out_spatial_dim = nn.SpatialDim("whatever")
out = self.conv(..., out_spatial_dim=out_spatial_dim)
This is for dynamic sizes. For static dims, it is even more complicated, as you need to calculate the dim by hand (although maybe you could argue, it is a good thing to have it explicit).
Maybe the module should instead just create such a dim itself, and then return it?
One question is, how should it be returned? Just Tuple[nn.Layer, nn.Dim]
? Or namedtuple
? Or sth else?
For example, it would then look like:
out, out_spatial_dim = self.conv(...)
I think this is nicer.
The case where it is known beforehand (or expected to match some other existing dim) is maybe valid, but the user could easily still do that by calling declare_same_as
afterwards.
out_dim
is different though. out_dim
would stay an argument. This is because out_dim
is usually a feature dim and usually can be arbitrarily defined by the user, like Linear
or Conv
. Although, maybe for the cases where it is not arbitrary, like merge_dims
for example, maybe there it should be returned as well?
Or consider the ConformerConvSubsample.__call__
:
@nn.scoped
def __call__(self, inp: nn.LayerRef, *, in_spatial_dim: nn.Dim, out_spatial_dim: nn.Dim) -> nn.LayerRef:
"""forward"""
in_spatial_dims = [in_spatial_dim, inp.feature_dim]
in_dim = nn.FeatureDim("dummy-input-feature-dim", 1)
x = nn.expand_dim(inp, dim=in_dim)
for i, conv_layer in enumerate(self.conv_layers):
out_spatial_dims = [nn.SpatialDim(f"conv-{i}-1"), nn.SpatialDim(f"conv-{i}-2")]
x = conv_layer(x, in_dim=in_dim, in_spatial_dims=in_spatial_dims, out_spatial_dims=out_spatial_dims)
in_spatial_dims = out_spatial_dims
in_dim = conv_layer.out_dim
x = self.activation(x)
if self.pool_sizes and i < len(self.pool_sizes):
x = nn.pool(
x, in_dim=in_dim, in_spatial_dims=in_spatial_dims,
pool_size=self.pool_sizes[i], padding='same', mode='max')
if self.dropout:
x = nn.dropout(x, axis=in_dim, dropout=self.dropout)
out = nn.merge_dims(x, axes=in_spatial_dims, out_dim=out_spatial_dim)
return out
With the suggestion, it would look like:
@nn.scoped
def __call__(self, inp: nn.LayerRef, *, in_spatial_dim: nn.Dim) -> nn.LayerRef:
"""forward"""
in_spatial_dims = [in_spatial_dim, inp.feature_dim]
in_dim = nn.FeatureDim("dummy-input-feature-dim", 1)
x = nn.expand_dim(inp, dim=in_dim)
for i, conv_layer in enumerate(self.conv_layers):
x, in_spatial_dims = conv_layer(x, in_dim=in_dim, in_spatial_dims=in_spatial_dims)
in_dim = conv_layer.out_dim
x = self.activation(x)
if self.pool_sizes and i < len(self.pool_sizes):
x = nn.pool(
x, in_dim=in_dim, in_spatial_dims=in_spatial_dims,
pool_size=self.pool_sizes[i], padding='same', mode='max')
if self.dropout:
x = nn.dropout(x, axis=in_dim, dropout=self.dropout)
out, out_spatial_dim = nn.merge_dims(x, axes=in_spatial_dims)
return out, out_spatial_dim
@Zettelkasten opinions?
Edit This is implemented like this now.
The last suggestion was implemented now, i.e. namely that functions and modules return a potentially new dim tag, and the argument (e.g. out_spatial_dim
) is optional, and when None, it auto-creates some new dim.
Remaining is the issue with out_dim_inner
. There is still the suggested solution here: https://github.com/rwth-i6/returnn/pull/871
This also has some further discussion.
In any case, out_dim_inner
should be removed. The param should be enough.
Another external data point is Mesh TensorFlow, and the gpt-neo implementation. Examples:
dim_kv = mtf.Dimension("features_per_head", params["n_embd"] // params["n_head"])
mtfparams = mtf.transformer.attention.attention_params_simple(
x.mesh,
io_dim=dim_embd,
kv_dim=dim_kv,
heads_dim=dim_heads,
variable_dtype=variable_dtype
)
k = mtf.replace_dimensions(k, k.shape[1], memory_length_dim)
v = mtf.replace_dimensions(v, v.shape[1], memory_length_dim)
a = mtf_transformer.attention.attention(
q, k, v,
memory_length_dim=memory_length_dim,
key_dim=dim_kv,
value_dim=dim_kv,
bias=broadcasted_bias,
dropout_rate=attn_dropout_rate
)
def causal_linear_attention(q, k, v, eps = 1e-6):
batch_dim, seq_dim, head_dim, dim_out = (v.shape[0], v.shape[1], v.shape[2], v.shape[3])
q = mtf.rename_dimension(q, "features_per_head", "features_per_head_in")
k = mtf.rename_dimension(k, "features_per_head", "features_per_head_in")
dim_in = k.shape[-1]
q = mtf.softmax(q, dim_in)
k = mtf.exp(k)
cumulative_k = mtf.cumsum(k, seq_dim) + eps
D_inv = 1. / mtf.einsum([q, cumulative_k], output_shape=[batch_dim, seq_dim, head_dim])
context = mtf.einsum([k, v], output_shape=[batch_dim, seq_dim, head_dim, dim_in, dim_out])
cumulative_context = mtf.cumsum(context, seq_dim)
attn = mtf.einsum([q, cumulative_context, D_inv], output_shape=[batch_dim, seq_dim, head_dim, dim_out])
return attn
def einsum(xs, output_shape=None, reduced_dims=None, name=None):
...
Note that a mtf.Dimension
is very simple. It's just (def):
Dimension = collections.namedtuple("Dimension", ["name", "size"])
where name: str
and size: int
.
MTF also requires unique dimensions in a shape (see here).
(Edit I moved this to another separate issue: #97)
Some further thoughts:
out_shape
or assert_shape
is about checking the complete shape including all dims. The user or developer might use this as a kind of verification.
The API of any module or function might (should) however be flexible to any number of dimensions. Basically, for some input tensor, there are maybe one or more dims which are relevant (in_dim
, in_spatial_dims
, axis
, or whatever) and there could also be any other number of dims which will then be treated just as the batch dim.
This is maybe a separate aspect. The definition of module/function API is maybe not covered well yet. The usual syntax is via type annotations. There this new Python PEP is relevant, although we would want to differ in a few aspects:
One question is whether we should somehow be more explicit when there are multiple inputs, whether the output would include all dims from all inputs, or sth else. E.g. maybe it could look like:
func(a: Tensor[{A, B, *more_a}], b: Tensor[{A, B, *more_b}])
-> Tensor[{A, B, *more_a.union(more_b)}]
Or simpler:
func(a: Tensor[{A, B, Any}], b: Tensor[{A, B, Any}])
-> Tensor[{A, B, Any}]
With the assumption that the output would always be the union of the inputs.
Or more specific:
func(a: Tensor[{A, B, Any}], b: Tensor[{A, B, Any}])
-> Tensor[{A, B, Any(a, b)}]
Or even more simpler:
func(a: Tensor[{A, B}], b: Tensor[{A, B}])
-> Tensor[{A, B}]
With the assumption that there always can be additional dims.
Dim.match_priority
support (https://github.com/rwth-i6/returnn/pull/871) is merged now, due to the lack of any better solution for the ambiguous dim tags.
Now we need to adopt the code here accordingly (get rid of out_dim_inner
etc).
This is all implemented now.
Like batch dim, spatial dims (with dynamic lengths), or static dims (named, or also unnamed).
Dim
(earlierDimensionTag
) in RETURNN. Directly use that, or wrap it somehow?Should this (the batch dim) include beam information, or be separate from this?
Relevant for all layers which define some shape or dimension (e.g.
Const
,Variable
).Should this be enforced, i.e. no simple
int
allowed inn_out
or so but always aDim
object? And maybe better useout_dim
instead ofn_out
(consistent with https://github.com/rwth-i6/returnn/issues/597). Edit: It was decided to makenn.Dim
mandatory, and useout_dim
instead ofn_out
.Very related is this issue on RETURNN side on explicit dim tags: https://github.com/rwth-i6/returnn/issues/597
Related is also whether we want unique dim tags? (#48, https://github.com/rwth-i6/returnn/issues/632)
This issues covers multiple aspects:
Dim
. We inherit all its logic on equality etc. We also haveFeatureDim
,SpatialDim
from RETURNN.Dim
instances) are mandatory for any shape or sizeTensor.verify_out_shape
for easy verificationin_dim == out_dim
, square matrices, https://github.com/rwth-i6/returnn_common/issues/17#issuecomment-997312167, https://github.com/rwth-i6/returnn/pull/871out_spatial_dim
argument, the a new spatial dim gets returned. Seepool1d
for example.