JackTemaki closed this 1 month ago
Follow-up for #55
When using the Transformer-XL style (`learnable_pos_emb=False`), the device for the sinusoidal embedding was not set correctly, causing a device mismatch when training on GPU.
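
For illustration only, here is a minimal sketch of how this kind of mismatch is usually avoided in PyTorch. It is not the code from this PR: the helper name `sinusoidal_pos_emb` and its signature are hypothetical. The idea is to create the sinusoidal table on the device of the incoming tensor instead of the default (CPU) device.

```python
import torch


def sinusoidal_pos_emb(positions: torch.Tensor, dim: int) -> torch.Tensor:
    """Hypothetical helper: sinusoidal embeddings built on the device of `positions`."""
    # Create the inverse frequencies on the same device as the input.
    # Omitting `device=` here would place them on the CPU and trigger a
    # device mismatch once `positions` lives on a GPU.
    exponents = torch.arange(0, dim, 2, device=positions.device, dtype=torch.float32) / dim
    inv_freq = 1.0 / (10000 ** exponents)
    # Outer product of positions and frequencies: [T, dim / 2]
    angles = positions.to(torch.float32).unsqueeze(-1) * inv_freq
    # Concatenate sine and cosine parts: [T, dim]
    return torch.cat([angles.sin(), angles.cos()], dim=-1)


# Falls back to CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
positions = torch.arange(5, device=device)
emb = sinusoidal_pos_emb(positions, dim=8)
```

With this pattern the embedding follows the input tensor, so the same code runs unchanged on CPU and GPU.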