sacmehta / delight

DeLighT: Very Deep and Light-Weight Transformers
MIT License

Naive question about residual connection #9

Open gopi-erabati opened 3 years ago

gopi-erabati commented 3 years ago

https://github.com/sacmehta/delight/blob/cc499c53087cd248ee7a0d0b0e70c507e670cba3/fairseq/modules/delight_transformer_layer.py#L120

Thank you for the interesting work. I have a very naive query: in the code linked above, when a tensor is assigned to another tensor (`residual = x`), don't they share the same data? If so, wouldn't `residual` end up with the same value as `x` once `x` is modified by the subsequent code? Later in the code, `residual` is added back to `x` to form the skip connection.

wizardk commented 3 years ago

You can try this:

import torch
from torch.nn import functional as F

x = torch.randn(1, 2)
print(x)
y = x               # y is just another name for the same tensor
y = F.relu(y)       # F.relu is out-of-place: y is rebound to a NEW tensor
print(y)
print(x)            # x is unchanged
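To spell out why this works: `residual = x` makes both names refer to the same tensor object, but out-of-place calls such as `F.relu` (and module calls like `self.layer(x)`) return a new tensor and merely rebind the name `x`, so `residual` keeps the original values. Only in-place operations (methods with a trailing underscore, or `+=`) mutate the shared storage and would change `residual` too. A minimal sketch of both cases (variable names here are illustrative, not from the repo code):

```python
import torch
from torch.nn import functional as F

# Out-of-place: rebinding x leaves residual untouched.
x = torch.zeros(1, 2)
residual = x               # both names refer to the SAME tensor object
x = F.relu(x - 1.0)        # out-of-place: x now points to a NEW tensor
out_of_place_safe = torch.equal(residual, torch.zeros(1, 2))  # True

# In-place: the shared storage is mutated, so residual changes too.
y = torch.zeros(1, 2)
residual_y = y
y.add_(1.0)                # trailing underscore = in-place op
in_place_aliased = torch.equal(residual_y, torch.ones(1, 2))  # True
```

So the residual connection in the linked layer is safe as long as the intermediate computations are out-of-place, which is the usual PyTorch convention.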