rwth-i6 / returnn_common

Common building blocks for RETURNN configs, such as models, training concepts, etc.

Broadcasting with standard operators #136

Closed Atticus1806 closed 2 years ago

Atticus1806 commented 2 years ago

With two tensors x, y of shapes [A] and [B], doing x - y with broadcasting would produce a tensor of shape [A, B]. However, in RETURNN this behavior needs to be explicitly allowed through a flag since behavior version 4, and that flag is not enabled in the standard operators. I can see that this might be intentional, to avoid easy mistakes, but what would be the suggested way to do the broadcasting here?
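For intuition, the NumPy analogue of this "broadcast all sources" combine (not the returnn_common API, just an illustration of the intended result shape) looks like:

```python
import numpy as np

# Subtracting a [B]-shaped vector from an [A]-shaped vector, with
# explicit new axes, yields an [A, B]-shaped matrix. In NumPy this
# requires the explicit None-axes; plain x - y on [3] and [4] errors.
x = np.arange(3)               # shape [A] = [3]
y = np.arange(4)               # shape [B] = [4]
out = x[:, None] - y[None, :]  # shape [A, B] = [3, 4]
print(out.shape)               # (3, 4)
```

In returnn_common all dims are named dim tags rather than positional axes, so the question is how to opt in to this behavior rather than how to align axes.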

I could import _combine myself, but since it is not directly exposed through nn, I hesitate because this does not seem right. Is there a better way?

albertz commented 2 years ago

_combine, as the _ prefix indicates, is an internal function, so you should not use it. So this is not a solution at all.

So, some random options off the top of my head:

albertz commented 2 years ago

@Zettelkasten maybe some opinion?

albertz commented 2 years ago

I think we should make combine public. I just noticed that we have compare public, which is related.

albertz commented 2 years ago

Actually, I was thinking about this wrongly. The current behavior in returnn-common is (like the old RETURNN behavior) that it always implicitly has allow_broadcast_all_sources=True. This is the default RETURNN behavior when out_shape is set, which is always the case here.

So, we maybe should explicitly always set allow_broadcast_all_sources=False by default? And then in an explicit combine function, the user can explicitly set allow_broadcast_all_sources=True when needed?
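A minimal sketch of the proposed semantics, using plain Python objects instead of the real nn.Tensor (the Tensor class and combine signature here are illustrative, not the actual returnn_common implementation):

```python
class Tensor:
    """Toy stand-in for nn.Tensor: dims modeled as a tuple of names."""
    def __init__(self, dims):
        self.dims = tuple(dims)

def combine(a, b, *, allow_broadcast_all_sources=False):
    """Explicit combine: if no single source covers all output dims,
    broadcasting must be opted in explicitly by the user."""
    common = tuple(dict.fromkeys(a.dims + b.dims))  # union, order-preserving
    covered = set(a.dims) >= set(b.dims) or set(b.dims) >= set(a.dims)
    if not covered and not allow_broadcast_all_sources:
        raise ValueError(
            "all inputs require broadcasting; "
            "set allow_broadcast_all_sources=True explicitly")
    return Tensor(common)
```

Usage: combine(Tensor(["A"]), Tensor(["B"])) raises, while passing allow_broadcast_all_sources=True yields a tensor with dims ("A", "B"); combine(Tensor(["A", "B"]), Tensor(["B"])) needs no flag since one source already covers the output dims.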

Atticus1806 commented 2 years ago

Actually, I was thinking about this wrongly. The current behavior in returnn-common is (like the old RETURNN behavior) that it always implicitly has allow_broadcast_all_sources=True. This is the default RETURNN behavior when out_shape is set, which is always the case here.

Wouldn't that mean the example above, a - b, should work as it is right now? For me it crashes in a model I am testing, both when using the - operator and when not setting allow_broadcast_all_sources at all. Or am I misunderstanding something?

So, we maybe should explicitly always set allow_broadcast_all_sources=False by default? And then in an explicit combine function, the user can explicitly set allow_broadcast_all_sources=True when needed?

Yes, I like that. I don't think broadcasting should work by default, because we want returnn_common to be explicit, and this would introduce implicit behavior.

albertz commented 2 years ago

So can you construct a test case example again?

Atticus1806 commented 2 years ago
def test_op_broadcasting():
  # https://github.com/rwth-i6/returnn_common/issues/136
  class _Net(nn.Module):
    @nn.scoped
    def __call__(self, x: nn.Tensor, y: nn.Tensor):
      return x + y

  net = _Net()
  x_dim = nn.FeatureDim('x', 12)
  x = nn.get_extern_data(nn.Data('x', dim_tags=[x_dim]))
  y_dim = nn.FeatureDim('y', 24)
  y = nn.get_extern_data(nn.Data('y', dim_tags=[y_dim]))
  out = net(x, y)
  out.mark_as_default_output()
  config = nn.get_returnn_config().get_complete_py_code_str(net)
  from returnn_common.tests.returnn_helpers import dummy_run_net, config_net_dict_via_serialized
  config, net_dict = config_net_dict_via_serialized(config)

with error:

WARNING:tensorflow:From /u/hilmes/experiments/tts_new_sis/recipe/returnn/tf/network.py:477: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
EXCEPTION
Traceback (most recent call last):
  File "/u/hilmes/experiments/tts_new_sis/recipe/returnn_common/nn/base.py", line 857, in _data_from_layer_dict.<locals>._add_layer
    line: out_data = layer_class.get_out_data_from_opts(**layer_desc)
    locals:
      out_data = <not found>
      layer_class = <local> <class 'returnn.tf.layers.basic.CombineLayer'>
      layer_class.get_out_data_from_opts = <local> <bound method CombineLayer.get_out_data_from_opts of <class 'returnn.tf.layers.basic.CombineLayer'>>
      layer_desc = <local> {'kind': 'add', '_network': <TFNetwork 'dummy_net' train>, '_name': 'add', 'sources': [<InternalLayer 'data:x' out_type=Data{[F|F'x'(12)]}>, <InternalLayer 'data:y' out_type=Data{[F|F'
y'(24)]}>], 'name': 'add', 'network': <TFNetwork 'dummy_net' train>}, len = 6
  File "/u/hilmes/experiments/tts_new_sis/recipe/returnn/tf/layers/basic.py", line 7466, in CombineLayer.get_out_data_from_opts
    line: out_type_.update(
            Data.get_common_data(
              [s.output for s in sources], allow_broadcast_all_sources=allow_broadcast_all_sources,
              name="%s_output" % kwargs["name"]).get_kwargs())
    locals:
      out_type_ = <local> {}
      out_type_.update = <local> <built-in method update of dict object at 0x7f234b996680>
      Data = <global> <class 'returnn.tf.util.data.Data'>
      Data.get_common_data = <global> <bound method Data.get_common_data of <class 'returnn.tf.util.data.Data'>>
      s = <not found>
      s.output = <not found>
      sources = <local> [<InternalLayer 'data:x' out_type=Data{[F|F'x'(12)]}>, <InternalLayer 'data:y' out_type=Data{[F|F'y'(24)]}>]
      allow_broadcast_all_sources = <local> <class 'returnn.util.basic.NotSpecified'>
      name = <not found>
      kwargs = <local> {'kind': 'add', '_network': <TFNetwork 'dummy_net' train>, '_name': 'add', 'name': 'add'}
      get_kwargs = <not found>
  File "/u/hilmes/experiments/tts_new_sis/recipe/returnn/tf/util/data.py", line 5358, in Data.get_common_data
    line: validate_broadcast_all_sources(
            allow_broadcast_all_sources=allow_broadcast_all_sources, inputs=sources, common=common)
    locals:
      validate_broadcast_all_sources = <local> <function validate_broadcast_all_sources at 0x7f234c2e89d0>
      allow_broadcast_all_sources = <local> <class 'returnn.util.basic.NotSpecified'>
      inputs = <not found>
      sources = <local> [Data{'x', [F|F'x'(12)]}, Data{'y', [F|F'y'(24)]}]
      common = <local> Data{'add_output', [F'x'(12),F|F'y'(24)]}
  File "/u/hilmes/experiments/tts_new_sis/recipe/returnn/tf/util/basic.py", line 2345, in validate_broadcast_all_sources
    line: BehaviorVersion.require(version=4, condition=False, message=msg)
    locals:
      BehaviorVersion = <local> <class 'returnn.util.basic.BehaviorVersion'>
      BehaviorVersion.require = <local> <bound method BehaviorVersion.require of <class 'returnn.util.basic.BehaviorVersion'>>
      version = <not found>
      condition = <not found>
      message = <not found>
      msg = <local> "All inputs\n - Data{'x', [F|F'x'(12)]}\n - Data{'y', [F|F'y'(24)]}\nrequire broadcasting to \n  Data{'add_output', [F'x'(12),F|F'y'(24)]}.\nThis must be explicitly allowed, e.g. by specifying out_shape.", len = 197
  File "/u/hilmes/experiments/tts_new_sis/recipe/returnn/util/basic.py", line 276, in BehaviorVersion.require
    line: raise BehaviorVersion.RequirementNotSatisfied(
            "%s (required since behavior_version >= %i)" % (message, version))
    locals:
      BehaviorVersion = <global> <class 'returnn.util.basic.BehaviorVersion'>
      BehaviorVersion.RequirementNotSatisfied = <global> <class 'returnn.util.basic.BehaviorVersion.RequirementNotSatisfied'>
      message = <local> "All inputs\n - Data{'x', [F|F'x'(12)]}\n - Data{'y', [F|F'y'(24)]}\nrequire broadcasting to \n  Data{'add_output', [F'x'(12),F|F'y'(24)]}.\nThis must be explicitly allowed, e.g. by specifying out_shape.", len = 197
      version = <local> 4
RequirementNotSatisfied: All inputs
 - Data{'x', [F|F'x'(12)]}
 - Data{'y', [F|F'y'(24)]}
require broadcasting to
  Data{'add_output', [F'x'(12),F|F'y'(24)]}.
This must be explicitly allowed, e.g. by specifying out_shape. (required since behavior_version >= 4)
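The check that fires at the bottom of this traceback can be sketched like this (illustrative only, not the actual returnn code; dims are modeled as sets of names): when every input needs broadcasting to reach the common output, and the flag was never explicitly set, the behavior-version-4 requirement fails.

```python
def validate_broadcast_all_sources(allow, input_dims, common_dims):
    """Raise unless broadcasting across all sources is explicitly allowed.

    allow: True, False, or None (None stands in for NotSpecified, i.e.
    the user never set allow_broadcast_all_sources, as in the traceback).
    """
    # Broadcasting "all sources" means no single input already has
    # all the dims of the common output.
    needs_all = all(set(d) != set(common_dims) for d in input_dims)
    if needs_all and allow is not True:
        raise Exception(
            "All inputs require broadcasting. This must be explicitly "
            "allowed (required since behavior_version >= 4)")
```

Here x has dims {x} and y has dims {y}, the common output has {x, y}, and allow is NotSpecified, so the requirement fails exactly as shown above.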
albertz commented 2 years ago

Ah, this is again ReturnnConstructTemplateException? At this point, out_shape is not set yet, because it uses get_out_data_from_opts to figure out what the out_shape is actually supposed to be.

So it means, in any case, we explicitly should set allow_broadcast_all_sources, either to False (default), or to True by the user.

albertz commented 2 years ago

(Btw, please write the test just like the other tests, and run it as part of the test suite.)

Atticus1806 commented 2 years ago

So it means, in any case, we explicitly should set allow_broadcast_all_sources, either to False (default), or to True by the user.

I think setting it to False by default and then making combine public would be a good way to deal with it.

(Btw, please write the test just like the other tests, and run it as part of the test suite.)

Okay, yes, I should get used to testing via the test environment, you are right. Is there anything else that could be shortened or simplified, or that should be changed in this test / future tests?

albertz commented 2 years ago

I think setting it to False by default and then making combine public would be a good way to deal with it.

I did that now, and extended it a bit further: the default is NotSpecified, and when it is not needed, it is omitted.
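The NotSpecified-sentinel pattern described here can be sketched as follows (the make_layer_opts helper is hypothetical, for illustration only; returnn's actual NotSpecified lives in returnn.util.basic):

```python
class NotSpecified:
    """Sentinel distinguishing 'user left it unset' from an explicit
    value, including an explicit False or None."""

def make_layer_opts(allow_broadcast_all_sources=NotSpecified):
    """Build a layer-options dict, omitting options the user never set,
    so the generated config stays minimal in the common case."""
    opts = {"class": "combine", "kind": "add"}
    if allow_broadcast_all_sources is not NotSpecified:
        opts["allow_broadcast_all_sources"] = allow_broadcast_all_sources
    return opts
```

With this, make_layer_opts() emits no allow_broadcast_all_sources entry at all, while both explicit True and explicit False are preserved in the output.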

Is there something else that could be shortened / simplified or should be changed in this test / future tests?

Well, first of all, it was incomplete. PyCharm would also have told you that config and net_dict were unused at the end.

But also, in this case, I shortened it further. We only really need a custom nn.Module if the model has params.

And you should have nn.reset_default_root_name_ctx() in the beginning. Just like in the other tests.

See test_op_broadcasting in my commit: https://github.com/rwth-i6/returnn_common/commit/98c3fc387a08733627e4fbd3f29ea708def33e64