Closed: Atticus1806 closed this issue 2 years ago
`_combine`, as the `_` prefix indicates, is an internal function, so you should not use it. So this is not a solution at all.

So, some random options right out of my head:

- Make `combine` public. Maybe also remove the unnecessary options like `with_bias` or `eval` etc.
- Make `combine` public under a name like `combine_multi_way_bc` or sth like that.
- `with allow_multi_way_broadcast(): ...` or so.

@Zettelkasten maybe some opinion?
I think we should make `combine` public. I just noticed that we have `compare` public, which is related.
Actually, I was thinking about it wrongly. The current behavior in returnn-common is (like the old RETURNN behavior) that it always automatically has `allow_broadcast_all_sources=True`. This is the default RETURNN behavior when `out_shape` is set, which is always the case here.

So, maybe we should explicitly always set `allow_broadcast_all_sources=False` by default? And then in an explicit `combine` function, the user can explicitly set `allow_broadcast_all_sources=True` when needed?
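For intuition, the broadcasting in question is the case where no single input already has the common shape, so *all* sources must be broadcast. A small NumPy sketch with the same dims as the test below (12 and 24); NumPy is used here purely for illustration, it is not what RETURNN does internally:

```python
import numpy as np

x = np.zeros(12)   # shape [A]
y = np.zeros(24)   # shape [B]

# Plain x - y fails: shapes (12,) and (24,) are not compatible.
# To combine them into shape [A, B], *both* inputs must be broadcast,
# which is what allow_broadcast_all_sources=True would permit.
out = x[:, None] - y[None, :]
print(out.shape)  # (12, 24)
```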
> Actually, I was thinking about it wrongly. The current behavior in returnn-common is (like the old RETURNN behavior) that it always automatically has `allow_broadcast_all_sources=True`. This is the default RETURNN behavior when `out_shape` is set, which is always the case here.
Wouldn't that mean the example above, `a - b`, should work as it is right now? For me it crashes right now in a model I am testing, both when using the `-` op and when not setting `allow_broadcast_all_sources` at all. Or am I misunderstanding something?
> So, maybe we should explicitly always set `allow_broadcast_all_sources=False` by default? And then in an explicit `combine` function, the user can explicitly set `allow_broadcast_all_sources=True` when needed?
Yes, I like that. I don't think broadcasting should work by default, because we want returnn_common to be explicit, and this would introduce implicit behavior.
So can you construct a test case example again?
```python
def test_op_broadcasting():
  # https://github.com/rwth-i6/returnn_common/issues/136
  class _Net(nn.Module):
    @nn.scoped
    def __call__(self, x: nn.Tensor, y: nn.Tensor):
      return x + y

  net = _Net()
  x_dim = nn.FeatureDim('x', 12)
  x = nn.get_extern_data(nn.Data('x', dim_tags=[x_dim]))
  y_dim = nn.FeatureDim('y', 24)
  y = nn.get_extern_data(nn.Data('y', dim_tags=[y_dim]))
  out = net(x, y)
  out.mark_as_default_output()

  config = nn.get_returnn_config().get_complete_py_code_str(net)
  from returnn_common.tests.returnn_helpers import dummy_run_net, config_net_dict_via_serialized
  config, net_dict = config_net_dict_via_serialized(config)
```
It fails with the following error:

```
WARNING:tensorflow:From /u/hilmes/experiments/tts_new_sis/recipe/returnn/tf/network.py:477: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
EXCEPTION
Traceback (most recent call last):
  File "/u/hilmes/experiments/tts_new_sis/recipe/returnn_common/nn/base.py", line 857, in _data_from_layer_dict.<locals>._add_layer
    line: out_data = layer_class.get_out_data_from_opts(**layer_desc)
    locals:
      out_data = <not found>
      layer_class = <local> <class 'returnn.tf.layers.basic.CombineLayer'>
      layer_class.get_out_data_from_opts = <local> <bound method CombineLayer.get_out_data_from_opts of <class 'returnn.tf.layers.basic.CombineLayer'>>
      layer_desc = <local> {'kind': 'add', '_network': <TFNetwork 'dummy_net' train>, '_name': 'add', 'sources': [<InternalLayer 'data:x' out_type=Data{[F|F'x'(12)]}>, <InternalLayer 'data:y' out_type=Data{[F|F'y'(24)]}>], 'name': 'add', 'network': <TFNetwork 'dummy_net' train>}, len = 6
  File "/u/hilmes/experiments/tts_new_sis/recipe/returnn/tf/layers/basic.py", line 7466, in CombineLayer.get_out_data_from_opts
    line: out_type_.update(
            Data.get_common_data(
              [s.output for s in sources], allow_broadcast_all_sources=allow_broadcast_all_sources,
              name="%s_output" % kwargs["name"]).get_kwargs())
    locals:
      out_type_ = <local> {}
      out_type_.update = <local> <built-in method update of dict object at 0x7f234b996680>
      Data = <global> <class 'returnn.tf.util.data.Data'>
      Data.get_common_data = <global> <bound method Data.get_common_data of <class 'returnn.tf.util.data.Data'>>
      s = <not found>
      s.output = <not found>
      sources = <local> [<InternalLayer 'data:x' out_type=Data{[F|F'x'(12)]}>, <InternalLayer 'data:y' out_type=Data{[F|F'y'(24)]}>]
      allow_broadcast_all_sources = <local> <class 'returnn.util.basic.NotSpecified'>
      name = <not found>
      kwargs = <local> {'kind': 'add', '_network': <TFNetwork 'dummy_net' train>, '_name': 'add', 'name': 'add'}
      get_kwargs = <not found>
  File "/u/hilmes/experiments/tts_new_sis/recipe/returnn/tf/util/data.py", line 5358, in Data.get_common_data
    line: validate_broadcast_all_sources(
            allow_broadcast_all_sources=allow_broadcast_all_sources, inputs=sources, common=common)
    locals:
      validate_broadcast_all_sources = <local> <function validate_broadcast_all_sources at 0x7f234c2e89d0>
      allow_broadcast_all_sources = <local> <class 'returnn.util.basic.NotSpecified'>
      inputs = <not found>
      sources = <local> [Data{'x', [F|F'x'(12)]}, Data{'y', [F|F'y'(24)]}]
      common = <local> Data{'add_output', [F'x'(12),F|F'y'(24)]}
  File "/u/hilmes/experiments/tts_new_sis/recipe/returnn/tf/util/basic.py", line 2345, in validate_broadcast_all_sources
    line: BehaviorVersion.require(version=4, condition=False, message=msg)
    locals:
      BehaviorVersion = <local> <class 'returnn.util.basic.BehaviorVersion'>
      BehaviorVersion.require = <local> <bound method BehaviorVersion.require of <class 'returnn.util.basic.BehaviorVersion'>>
      version = <not found>
      condition = <not found>
      message = <not found>
      msg = <local> "All inputs\n - Data{'x', [F|F'x'(12)]}\n - Data{'y', [F|F'y'(24)]}\nrequire broadcasting to \n Data{'add_output', [F'x'(12),F|F'y'(24)]}.\nThis must be explicitly allowed, e.g. by specifying out_shape.", len = 197
  File "/u/hilmes/experiments/tts_new_sis/recipe/returnn/util/basic.py", line 276, in BehaviorVersion.require
    line: raise BehaviorVersion.RequirementNotSatisfied(
            "%s (required since behavior_version >= %i)" % (message, version))
    locals:
      BehaviorVersion = <global> <class 'returnn.util.basic.BehaviorVersion'>
      BehaviorVersion.RequirementNotSatisfied = <global> <class 'returnn.util.basic.BehaviorVersion.RequirementNotSatisfied'>
      message = <local> "All inputs\n - Data{'x', [F|F'x'(12)]}\n - Data{'y', [F|F'y'(24)]}\nrequire broadcasting to \n Data{'add_output', [F'x'(12),F|F'y'(24)]}.\nThis must be explicitly allowed, e.g. by specifying out_shape.", len = 197
      version = <local> 4
RequirementNotSatisfied: All inputs
  - Data{'x', [F|F'x'(12)]}
  - Data{'y', [F|F'y'(24)]}
require broadcasting to
  Data{'add_output', [F'x'(12),F|F'y'(24)]}.
This must be explicitly allowed, e.g. by specifying out_shape. (required since behavior_version >= 4)
```
Ah, this is again `ReturnnConstructTemplateException`? At this point, `out_shape` is not set yet, because it uses `get_out_data_from_opts` to figure out what the `out_shape` is actually supposed to be.

So it means, in any case, we should explicitly set `allow_broadcast_all_sources`, either to False (the default), or to True by the user.
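To make the flag semantics concrete, here is a hypothetical, self-contained mimic of the check. The name follows `validate_broadcast_all_sources` from the traceback above, but this sketch is not RETURNN's actual implementation; dim tags are simplified to plain strings:

```python
from typing import Optional, Sequence, Tuple


def get_common_shape(shapes: Sequence[Tuple[str, ...]],
                     allow_broadcast_all_sources: Optional[bool] = None) -> Tuple[str, ...]:
  """
  Compute the union of dim tags over all inputs (order of first occurrence).
  If no single input already covers the common shape, *all* sources need
  broadcasting, and that must be explicitly allowed.
  """
  common = []
  for shape in shapes:
    for dim in shape:
      if dim not in common:
        common.append(dim)
  common_t = tuple(common)
  if not any(set(shape) == set(common_t) for shape in shapes):
    # Roughly mimics validate_broadcast_all_sources from the traceback above.
    if not allow_broadcast_all_sources:
      raise ValueError(
        "All inputs require broadcasting to %r. This must be explicitly allowed." % (common_t,))
  return common_t
```

With this, `get_common_shape([("A",), ("B",)])` raises, while passing `allow_broadcast_all_sources=True` returns `("A", "B")`; `get_common_shape([("A", "B"), ("B",)])` needs no flag, since the first input already has the common shape.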
(Btw, please write the tests just like the other tests, and run them as part of the test suite.)
> So it means, in any case, we should explicitly set `allow_broadcast_all_sources`, either to False (the default), or to True by the user.
I think setting it to False by default and then making `combine` public would be a good way to deal with it.
> (Btw, please write the tests just like the other tests, and run them as part of the test suite.)
Okay, yes, I should get used to testing it via the test env; you are right. Is there something else that could be shortened / simplified or should be changed in this test / future tests?
> I think setting it to False by default and then making `combine` public would be a good way to deal with it.
I did that now. A bit further extended: the default is `NotSpecified`, and in case it is not needed, it is omitted.
> Is there something else that could be shortened / simplified or should be changed in this test / future tests?
Well, first of all, it was incomplete. PyCharm would also have told you that `config, net_dict` were unused at the end.

But also, in this case, I shortened it further. We only really need the custom `nn.Module` if the model has params. And you should have `nn.reset_default_root_name_ctx()` at the beginning, just like in the other tests.

See `test_op_broadcasting` in my commit: https://github.com/rwth-i6/returnn_common/commit/98c3fc387a08733627e4fbd3f29ea708def33e64
With two tensors x, y of shape [A] and [B], doing `x - y` with broadcasting would produce a tensor of shape [A,B]. However, in RETURNN this behavior needs to be explicitly allowed through a flag since behavior version 4, and that flag is not enabled in the standard operators. I can see that this might be intentional, to avoid too-easy mistakes, but what would be the suggested way to do the broadcasting here? I could import `_combine` myself, but since it's not directly imported with `nn`, I hesitate because this does not seem right. Is there a better way?