rwth-i6 / returnn_common

Common building blocks for RETURNN configs, such as models, training concepts, etc.

remove lazy init logic, Linear, Conv, Norm API changes #215

Closed. albertz closed this 2 years ago.

albertz commented 2 years ago

Fix #212. Fix #109.

And a bunch of semi-related other fixes.

This will be rebased and merged, not squashed.

albertz commented 2 years ago

Still WIP but most of it looks ok already.

Most of the failing tests fail because the naming logic changed slightly. But I think the new naming is not optimal yet, and what the tests are checking is right, i.e. in those cases we want the old behavior. I will see how to get that back.

albertz commented 2 years ago

I just realized, the deepcopy logic is even more problematic, as it is currently implemented here. Also see #109 for discussion on that.

Now I think nn.Parameter will not be copied correctly. deepcopy will create a copy of the nn.Tensor and nn.Parameter objects; however, inside the copied object, the name_ctx will not be copied (due to the new NameCtx.__deepcopy__). This leads to unexpected and buggy behavior.
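
A minimal sketch of the problem, with simplified stand-ins for NameCtx and nn.Parameter (the names and constructors here are only illustrative, not the real returnn_common API):

```python
import copy


class NameCtx:
    """Simplified stand-in for the real NameCtx."""

    def __deepcopy__(self, memo):
        # Mirrors the new NameCtx.__deepcopy__: name contexts are never copied.
        return self


class Parameter:
    """Simplified stand-in for nn.Parameter."""

    def __init__(self, name_ctx):
        self.name_ctx = name_ctx


param = Parameter(NameCtx())
param_copy = copy.deepcopy(param)

assert param_copy is not param                 # new Parameter object...
assert param_copy.name_ctx is param.name_ctx   # ...but shared NameCtx,
# so the "copy" is not actually an independent parameter.
```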

How to solve this? Again custom __copy__ and __deepcopy__, also for nn.Tensor and nn.Parameter? As mentioned before, nn.Tensor is immutable, so copying it should not do anything. But nn.Parameter must get a real new copy.
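
A rough sketch of what that option could look like, again with simplified illustrative stand-ins; the real code would have to copy much more state (shape, dtype, initial value, etc.), which is glossed over here:

```python
import copy


class NameCtx:
    """Simplified stand-in, as in the sketch above."""


class Tensor:
    """Simplified stand-in for nn.Tensor (immutable)."""

    def __init__(self, name_ctx):
        self.name_ctx = name_ctx

    def __copy__(self):
        # Immutable: a copy can just be the same object.
        return self

    def __deepcopy__(self, memo):
        return self


class Parameter(Tensor):
    """Simplified stand-in for nn.Parameter."""

    def __deepcopy__(self, memo):
        # A parameter must become a genuinely new parameter with its own
        # NameCtx. (Hypothetical: the real code would copy the remaining
        # parameter state as well.)
        new = Parameter(name_ctx=NameCtx())
        memo[id(self)] = new
        return new


t = Tensor(NameCtx())
assert copy.deepcopy(t) is t  # tensor copy is a no-op

p = Parameter(NameCtx())
p2 = copy.deepcopy(p)
assert p2 is not p and p2.name_ctx is not p.name_ctx  # a real new parameter
```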

Or change NameCtx.__deepcopy__? But this is probably even more difficult, and we need the special logic for the tensors anyway.

Or Module.__deepcopy__? But how? Also, it would be unexpected for the user if copying non-module objects (consisting of tensors or the like) behaved differently. So this is not really an option either.