soso010816 opened 4 months ago
Maybe just modifying it like this would be right?
```python
import math

import torch
from torch import nn, Tensor


class PositionalEncoding(nn.Module):
    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 2048):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        # pe = torch.zeros(max_len, 1, d_model)  # original, sequence-first buffer
        pe = torch.zeros(max_len, d_model)        # batch-first buffer: [max_len, d_model]
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)

    def forward(self, x: Tensor) -> Tensor:
        """
        Args:
            x: Tensor, shape [batch_size, seq_len, embedding_dim]
        """
        # Slice by sequence length and broadcast over the batch dimension.
        x = x + self.pe[:x.size(1)].unsqueeze(0)
        return self.dropout(x)
```
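For what it's worth, a quick smoke test of the modified class with a batch-first input (the sizes below are arbitrary, just for illustration) runs without any transpose:

```python
# Hypothetical quick check of the modified encoder with a batch-first input.
enc = PositionalEncoding(d_model=16, max_len=2048)
x = torch.randn(4, 10, 16)   # [batch_size, seq_len, embedding_dim]
out = enc(x)
print(out.shape)             # torch.Size([4, 10, 16])
```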
Hello, I have great respect for your work and your ideas. But when I tested and debugged your code, I discovered some potential issues, which I would like to report to the authors.
the "transformer.py" requires that the shape of input tokenized is [seq_len, batch_size], like this:
But when I checked your finetuning code, I found the actual size of the input is `torch.Size([batch_size, seq_len])`, which makes `self.pos_encoder(src)` go wrong, because the addition inside it combines two tensors whose dimensions do not line up: the `x` in `x = x + self.pe[:x.size(0)]` in your `pos_encoder` is `[batch_size, seq_len, embedding_dim]` instead of `[seq_len, batch_size, embedding_dim]`. I think this cannot produce the transformer output we expect, and it may even lead to unexpected results in downstream models. The same thing also happens in `test_transfomer.py` when the input is just one sample of shape `torch.Size([seq_len])`, although that case can be solved by `input = input.unsqueeze(1)`. I believe this is just a small oversight the great authors made when open-sourcing the code, and it should have no impact on your pre-training process.
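To make the shape issue concrete, here is a minimal sketch of both behaviors; the buffer shape `[max_len, 1, d_model]` and the slice `pe[:x.size(0)]` are taken from the commented-out original above, while the concrete sizes are made up for illustration:

```python
# Minimal sketch of the mismatch (batch_size/seq_len/d_model values are arbitrary).
import torch

batch_size, seq_len, d_model = 4, 10, 16
x = torch.randn(batch_size, seq_len, d_model)  # batch-first input, as in the finetuning code

# Original buffer/slice: pe is [max_len, 1, d_model] and is sliced by x.size(0).
pe_old = torch.zeros(2048, 1, d_model)
wrong = x + pe_old[:x.size(0)]       # [4, 1, 16] broadcasts against [4, 10, 16]
# With these shapes no error is raised, but every token in sequence b receives
# the encoding for position b (its batch index), not for its sequence position.

# Modified buffer/slice: pe is [max_len, d_model] and is sliced by x.size(1).
pe_new = torch.zeros(2048, d_model)
right = x + pe_new[:x.size(1)].unsqueeze(0)   # [1, 10, 16] broadcasts over the batch
print(wrong.shape, right.shape)               # both torch.Size([4, 10, 16])
```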
Sincerely hope to get your reply!!!