rwth-i6 / returnn-experiments

experiments with RETURNN

AssertionError when running 2022-lsh-attention #79

Closed: zihanlalala closed this 1 year ago

zihanlalala commented 1 year ago

Hi! I am running a translation task with the provided script from 2022-lsh-attention, but I get an error. Here is a brief error message:

File "/opt/tiger/aaa/returnn/returnn/tf/util/basic.py", line 109, in set_param_axes_split_info
    line: check_param_axes_split_info(param.get_shape().as_list(), axes_split_info)
    locals:
      check_param_axes_split_info = <global> <function check_param_axes_split_info at 0x7f91258f9598>
      param = <local> <tf.Variable 'source_embed_raw/W:0' shape=(42235, 512) dtype=float32>
      param.get_shape = <local> <bound method Variable.get_shape of <tf.Variable 'source_embed_raw/W:0' shape=(42235, 512) dtype=float32>>
      as_list = <not found>
      axes_split_info = <local> [[32881], [512]]
  File "/opt/tiger/aaa/returnn/returnn/tf/util/basic.py", line 121, in check_param_axes_split_info
    line: assert param_shape[i] == sum(parts)
    locals:
      param_shape = <local> [42235, 512]
      i = <local> 0
      sum = <builtin> <built-in function sum>
      parts = <local> [32881]
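
The failing check boils down to the following. This is a minimal standalone sketch reconstructed from the traceback, not the actual RETURNN source:

# Sketch of the check from the traceback: every axis of the variable
# must match the sum of its declared split parts.
def check_param_axes_split_info(param_shape, axes_split_info):
    assert len(param_shape) == len(axes_split_info)
    for i, parts in enumerate(axes_split_info):
        assert param_shape[i] == sum(parts)

check_param_axes_split_info([42235, 512], [[42235], [512]])  # passes
check_param_axes_split_info([42235, 512], [[32881], [512]])  # AssertionError, as above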

It seems that the code tries to assert 42235 == 32881, where 42235 is the source vocab size and 32881 is the target vocab size. I think I might have set something wrong in the config, but I have no idea what. Here is the vocab part of my config:

# set your vocab sizes (data = source, classes = target) and dataset size here
num_outputs = {'data': [42235, 1], 'classes': [32881, 1]}

Is there anything wrong with my config? I followed the setup of 2022-lsh-attention and only replaced the dataset and vocab settings in the provided script.

Thank you.

albertz commented 1 year ago

Maybe @Zettelkasten can help.

albertz commented 1 year ago

Can you show your modified config? Maybe upload it to a Gist and link it here.

zihanlalala commented 1 year ago

Yes, here is the full config: https://gist.github.com/zihanlalala/9883b3efaad0739bcf58ffd049e6e7f2

albertz commented 1 year ago

I see that you use reuse_params to share parameters between the source embedding, the target embedding, and the target output layer. This only works when the source and target vocab are identical. Remove those reuse_params usages.
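
To illustrate, here is a hedged sketch of the pattern to remove. It assumes the common RETURNN transformer setup for tied embeddings; the layer names are illustrative and your Gist may differ in detail:

network = {
    "source_embed_raw": {
        "class": "linear", "activation": None, "with_bias": False, "n_out": 512,
    },  # owns W with shape (source_vocab=42235, 512)
    "target_embed_raw": {
        "class": "linear", "activation": None, "with_bias": False, "n_out": 512,
        # Shares W with source_embed_raw, but this layer declares its input
        # dim as the target vocab (32881), so the shape check on the shared
        # (42235, 512) matrix fails: exactly the assert in the traceback.
        "reuse_params": "source_embed_raw",  # <- remove this
    },
    # If the output softmax also reuses the (transposed) embedding matrix,
    # remove its reuse_params entry as well.
}

Without the ties, each layer creates its own weight matrix with the correct vocab dimension, so the check passes.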

zihanlalala commented 1 year ago

Thank you! That was the problem. By the way, does RETURNN support TensorFlow 2.0 and above?

albertz commented 1 year ago

By the way, does RETURNN support TensorFlow 2.0 and above?

Yes, it should be no problem.

zihanlalala commented 1 year ago

Thanks.