Thank you for the interesting work.
I have a very naive question about the code linked below. When a tensor is assigned to another variable (`residual = x`), both names share the same data. Doesn't `residual` then have the same value as `x` once `x` is modified in the subsequent code? I ask because later in the code we add `residual` back to `x` to form a skip connection.
https://github.com/sacmehta/delight/blob/cc499c53087cd248ee7a0d0b0e70c507e670cba3/fairseq/modules/delight_transformer_layer.py#L120
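A minimal PyTorch sketch of the distinction the question turns on, offered as an illustration rather than a statement about every line of the linked file: `residual = x` makes no copy, so what matters is whether `x` is later *rebound* by out-of-place operations (e.g. `x = layer(x)`, `x = x * 2`), which leaves `residual` untouched, or *mutated in place* (e.g. `x.mul_()`, `x += ...` on a tensor, `inplace=True` activations), which changes `residual` too. The function names below are hypothetical.

```python
import torch


def rebinding_demo():
    # `residual = x` binds a second name to the SAME tensor object (no copy).
    x = torch.ones(3)
    residual = x
    # Out-of-place op: allocates a new tensor and rebinds the name `x`;
    # `residual` still refers to the original, unchanged tensor.
    x = x * 2
    return x, residual


def inplace_demo():
    x = torch.ones(3)
    residual = x
    # In-place op: mutates the shared storage, so `residual` sees the change too.
    x.mul_(2)
    return x, residual


def clone_demo():
    x = torch.ones(3)
    # clone() makes an independent copy, so later in-place ops on `x`
    # cannot leak into `residual`.
    residual = x.clone()
    x.mul_(2)
    return x, residual


if __name__ == "__main__":
    for demo in (rebinding_demo, inplace_demo, clone_demo):
        x, residual = demo()
        print(demo.__name__, "x:", x.tolist(), "residual:", residual.tolist())
```

In other words, a skip connection written as `residual = x` followed only by out-of-place updates to `x` still adds back the original input; a `.clone()` is needed only if `x` is modified in place in between.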