The exponent in the second (cosine) term should start at 0 rather than 1, and the base 1000 should be 10000, to match the original Transformer paper.
tst/utils.py
```python
def generate_original_PE(length: int, d_model: int) -> torch.Tensor:
    ...
    pos = torch.arange(length).unsqueeze(1)
    PE[:, 0::2] = torch.sin(pos / torch.pow(1000, torch.arange(0, d_model, 2, dtype=torch.float32) / d_model))
    PE[:, 1::2] = torch.cos(pos / torch.pow(1000, torch.arange(1, d_model, 2, dtype=torch.float32) / d_model))
```
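For reference, a minimal sketch of the corrected encoding as defined in Vaswani et al. (2017), where both the sine and cosine terms share the exponent `2i/d_model` and the base is 10000 (the function name `generate_original_PE_fixed` is hypothetical, not the repo's actual fix):

```python
import torch

def generate_original_PE_fixed(length: int, d_model: int) -> torch.Tensor:
    """Sinusoidal positional encoding from "Attention Is All You Need".

    Assumes an even d_model. Both the sin and cos terms are divided by
    10000^(2i / d_model), i.e. they share the same even-index exponent.
    """
    PE = torch.zeros(length, d_model)
    pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)
    # One divisor per sin/cos pair: base 10000 (not 1000),
    # exponent starting at 0 (not 1) for both terms.
    div = torch.pow(10000, torch.arange(0, d_model, 2, dtype=torch.float32) / d_model)
    PE[:, 0::2] = torch.sin(pos / div)
    PE[:, 1::2] = torch.cos(pos / div)
    return PE
```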
Hi, thanks for noticing; I've fixed the implementation. Note that these positional encodings from the original paper were not used in practice in our experiments.