rwth-i6 / i6_models

Collection of NN-Model parts
Mozilla Public License 2.0

Make norm Callable #23

Closed Atticus1806 closed 1 year ago

Atticus1806 commented 1 year ago

As agreed on today.

albertz commented 1 year ago

Sorry, I missed the meeting. What was the reasoning on this? This changes the code style away from standard PyTorch code style. Why do you want that? I think this is a bad idea.

albertz commented 1 year ago

Also, you no longer allow a plain callable function, like the ones from torch.nn.functional. Why would you want that change?

Atticus1806 commented 1 year ago

It caused parameters to be shared in all places where LayerNorm was used.

SimBe195 commented 1 year ago

Sorry, I missed the meeting. What was the reasoning on this?

Expecting an nn.Module can lead to bugs: the module instance is created only once when setting up the config and is then passed to every layer that uses it, instead of each layer having an independent instance of the module.
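
A minimal sketch of the described failure mode (illustrative only, not code from i6_models; the Block class is a made-up stand-in): a single nn.LayerNorm created when setting up the config and handed to every layer means all layers end up sharing the same parameters.

```python
import torch
from torch import nn


class Block(nn.Module):
    """Hypothetical layer that receives an already-constructed norm module."""

    def __init__(self, norm: nn.Module):
        super().__init__()
        self.norm = norm  # the same instance in every Block -> shared parameters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x)


shared_norm = nn.LayerNorm(8)  # created once at config time
blocks = nn.ModuleList([Block(shared_norm) for _ in range(3)])

# All blocks point at the very same weight/bias tensors:
assert blocks[0].norm.weight is blocks[2].norm.weight
```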

albertz commented 1 year ago

It caused parameters to be shared in all places where LayerNorm was used.

Yes, people usually do a deepcopy somewhere to avoid this. See, e.g., the official PyTorch Transformer implementation.
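
For reference, a sketch of that deepcopy pattern on a toy layer stack (torch.nn.TransformerEncoder does essentially this when it clones the passed encoder_layer): one module instance goes in, and each layer gets its own deep copy, so no parameters are shared.

```python
import copy

import torch
from torch import nn


class Stack(nn.Module):
    """Toy container that clones the prototype layer instead of reusing it."""

    def __init__(self, layer: nn.Module, num_layers: int):
        super().__init__()
        # Each entry is an independent copy with its own parameters.
        self.layers = nn.ModuleList([copy.deepcopy(layer) for _ in range(num_layers)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


stack = Stack(nn.LayerNorm(8), num_layers=3)
assert stack.layers[0].weight is not stack.layers[1].weight
```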

JackTemaki commented 1 year ago

This changes the code style away from standard PyTorch code style. Why do you want that? I think this is a bad idea.

What is standard PyTorch code style? As I understand it, standard would mean you just define the modules as they are. We decided against that for modularity, but now we have seen that the way we did it is not a good approach. You mention the PyTorch Transformer, but there LayerNorm is hard-coded!

https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/transformer.py#L503

Only the final norm of the encoder/decoder is configurable, and it appears only once. For the layers themselves you pass layer_norm_eps as an argument, so this is clearly something we decided against.

albertz commented 1 year ago

I'm not speaking about the norm in the PyTorch Transformer; I'm speaking about the generic code-style convention of passing an instance of an nn.Module around, rather than such a factory function. This is what you see, for example, in the PyTorch Transformer: in TransformerEncoder the encoder_layer is passed as-is, and copies of it are then made with deepcopy so that you don't get shared parameters. I also explained in the last meeting, when we decided to use this code style, that you need to do a deepcopy somewhere. I guess this was forgotten.

You see this generic convention of passing an nn.Module instance around quite often in PyTorch code. I checked a bit how the sharing problem is usually avoided, and found that either there is a deepcopy somewhere, or at the layer level people make sure to create new instances rather than reusing the shared one.

JackTemaki commented 1 year ago

Ah, your example was meant for copying layers. Well, since you can pass any callable through deepcopy, I guess we can actually do that (for the norm), provided any nn.Module is generally safe to deepcopy.
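
A hedged sketch of what the callable-norm variant could look like (the real i6_models config API is not shown in this thread; norm_factory and Block are hypothetical names): the config stores a factory rather than an instance, each layer calls it, and nothing is shared.

```python
from functools import partial
from typing import Callable

import torch
from torch import nn


class Block(nn.Module):
    """Hypothetical layer that builds its own norm from a factory callable."""

    def __init__(self, norm_factory: Callable[[], nn.Module]):
        super().__init__()
        self.norm = norm_factory()  # fresh instance per layer, no sharing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x)


norm_factory = partial(nn.LayerNorm, 8)  # e.g. stored in the config
blocks = nn.ModuleList([Block(norm_factory) for _ in range(3)])
assert blocks[0].norm.weight is not blocks[1].norm.weight
```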

Atticus1806 commented 1 year ago

So, shall we discuss this on Thursday?