Furthermore, https://github.com/rwth-i6/returnn/commit/dbaa97a9d9cfd90d073d743786b6a233cdb5eeeb introduces an issue which I have not yet fixed. When unflattening the batch dim axis, all resulting axes now get a batch dim dim_tag, which was previously not the case.
@albertz, is this desired? I think there is not much we can do about this from the pytorch-to-returnn side.
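For illustration, a minimal net dict along these lines hits the code path in question (a hedged sketch: the layer names and the split factor are invented here, not taken from the failing pytorch-to-returnn test):

```python
# Hypothetical minimal net dict: a SplitDimsLayer that unflattens the batch axis.
network = {
    # "data" is assumed to arrive with a flattened batch axis, e.g. B*T
    "split_batch": {
        "class": "split_dims",
        "from": "data",
        "axis": "B",        # unflatten the batch dim axis
        "dims": (-1, 7),    # e.g. split B*T back into (B, T=7)
    },
    "output": {"class": "copy", "from": "split_batch"},
}
```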
I think it's easier to have a separate issue here to discuss the problem. A PR should only be about the fix.
Besides the issue here, could you maybe also open an issue on the RETURNN side with a simple net dict which fails now but should work? Or even multiple issues if there turn out to be multiple problems.
You created the issue about SplitDimsLayer (https://github.com/rwth-i6/returnn/issues/908). But didn't you also mention a separate issue about reuse_params?
Yes, the change in this PR should fix that.
It looks like there is another problem with batch norm:
File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/layers/base.py", line 1551, in LayerBase.batch_norm.<locals>._calc_batch_norm
line: updated_sample_mean = tf_compat.v1.assign_add(sample_mean, (mean_cur_batch - sample_mean) * momentum)
locals:
updated_sample_mean = <not found>
tf_compat = <global> <module 'returnn.tf.compat' from '/home/runner/.local/lib/python3.8/site-packages/returnn/tf/compat.py'>
tf_compat.v1 = <global> <module 'tensorflow._api.v2.compat.v1' from '/home/runner/.local/lib/python3.8/site-packages/tensorflow/_api/v2/compat/v1/__init__.py'>
tf_compat.v1.assign_add = <global> <function assign_add at 0x7f5bb980f550>
sample_mean = <local> <tf.Variable 'BatchNorm1d/batch_norm/mean:0' shape=(1, 11, 1) dtype=float32>
mean_cur_batch = <local> <tf.Tensor 'BatchNorm1d/batch_norm/moments/Squeeze:0' shape=(11,) dtype=float32>
momentum = <local> 0.1
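The locals above already show the clash: the running mean variable has shape (1, 11, 1) while the current-batch mean has shape (11,), so the broadcasted update has shape (1, 11, 11), which assign_add rejects. A standalone sketch (plain TensorFlow, not RETURNN code) that reproduces just this shape clash:

```python
import tensorflow as tf

# Shapes taken from the locals in the traceback above; everything else is illustrative.
sample_mean = tf.Variable(tf.zeros((1, 11, 1)), name="mean")  # running mean variable
mean_cur_batch = tf.zeros((11,))                              # mean over the current batch
momentum = 0.1

# (11,) - (1, 11, 1) broadcasts to (1, 11, 11), which no longer matches the variable shape.
delta = (mean_cur_batch - sample_mean) * momentum
try:
    tf.compat.v1.assign_add(sample_mean, delta)
except (ValueError, tf.errors.InvalidArgumentError) as exc:
    print("shape mismatch:", exc)
```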
Merged now. Although I did not really understand why this worked before and does not anymore. But I did not really look into it.
Ah, I think the problem is that I don't put the params into the layer.params dict anymore when they are shared and originate in another layer. So yes, then this fix should be fine.
Yes exactly, this is also my understanding.
https://github.com/rwth-i6/returnn/commit/2a1b840319b59f890958ccc8cad9639e3fea5efd introduces an issue for pytorch-to-returnn for modules which use a layer with reuse_params=True.
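For reference, the parameter-sharing pattern this refers to looks roughly like the following on the RETURNN side (a hedged sketch: layer names and sizes are invented, and pytorch-to-returnn may generate a dict form of reuse_params rather than a plain layer name):

```python
# Hypothetical net dict sketch: "shared" reuses the parameters of "base",
# so its params originate in another layer and are no longer listed in
# shared.params after the RETURNN change discussed above.
network = {
    "base": {"class": "linear", "from": "data", "n_out": 13},
    "shared": {
        "class": "linear",
        "from": "data",
        "n_out": 13,
        "reuse_params": "base",  # share W and b with layer "base"
    },
    "output": {"class": "copy", "from": "shared"},
}
```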