@xvdp Are you able to run the rest of it? I am getting two errors which I am not able to resolve:
```
...
<ipython-input-139-2ae4ba63671c> in encode(self, src, src_mask)
     17
     18     def encode(self, src, src_mask):
---> 19         return self.encoder(self.src_embed(src), src_mask)
     20
     21     def decode(self, memory, src_mask, tgt, tgt_mask):
...
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'
...
     74     def encode(self, src, src_mask):
---> 75         return self.encoder(self.src_embed(src), src_mask)
```
Both tracebacks point to `self.encoder()` as the fault. I can't really figure out what is happening. It would be super great if you could provide some insight. Link to the Jupyter notebook:
https://github.com/AmoghM/DeepLearning/blob/master/TransformerNetwork/HarvardTransformer.ipynb
@xvdp Solved. On PyTorch 1.0 I hit

```
RuntimeError: "exp" not implemented for 'torch.LongTensor'
```

which is fixed by the original version:

```python
div_term = 1 / (10000 ** (torch.arange(0., d_model, 2) / d_model))
```

I am not sure what the main reason is why they decided to use `.exp`, but my best guess is numerical stability.
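For reference, a minimal sketch contrasting the two forms (the `torch.exp` line paraphrases the notebook's computation and assumes a float `arange`; `d_model = 512` is just an example value). The two are mathematically identical, since `exp(-i * ln(10000) / d_model) == 10000 ** (-i / d_model)`:

```python
import math
import torch

d_model = 512  # example value

# .exp-based form (paraphrased from the notebook); the arange must be float,
# otherwise .exp raises the LongTensor error above:
div_term_exp = torch.exp(torch.arange(0., d_model, 2) *
                         -(math.log(10000.0) / d_model))

# direct form that sidesteps the issue entirely:
div_term = 1 / (10000 ** (torch.arange(0., d_model, 2) / d_model))

assert torch.allclose(div_term_exp, div_term)
```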
Glad you solved it, sorry, I hadn't seen your message. There are a couple of other things that need to be fixed so this runs on PyTorch 1.0+, such as accessing scalars with `.item()` instead of `.data[0]`, but curiously I did not run into your problem. I'll note it here.
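A minimal sketch of that scalar-access change (the tensor here is just a stand-in for a 0-dim loss):

```python
import torch

loss = torch.tensor(0.5)   # stand-in for a 0-dim loss tensor
# total = loss.data[0]     # pre-0.4 idiom; fails on 0-dim tensors in 1.0+
total = loss.item()        # returns a plain Python float
```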
@AmoghM
```python
out = greedy_decode(model, src.cuda(), src_mask.cuda(), max_len=60, start_symbol=TGT.vocab.stoi["<s>"])
```

Try adding `.cuda()` to `src` and `src_mask`; this moves them to the GPU so they match the CUDA model, which is exactly what the "Expected object of backend CUDA but got backend CPU" error is complaining about. I found the answer in the link below:
https://github.com/huggingface/pytorch-pretrained-BERT/issues/227
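A device-agnostic variant of the same call, in case the notebook should also run on a CPU-only machine (this sketch assumes the model has already been moved with `model.to(device)`):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
out = greedy_decode(model, src.to(device), src_mask.to(device),
                    max_len=60, start_symbol=TGT.vocab.stoi["<s>"])
```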
@V-Enzo Thanks for pointing this out. I will try it and report back.
To run on PyTorch 1.0, both the `position` and the `div_term` tensors need to be initialized as float instead of int in the `PositionalEncoding` class, i.e. use `torch.arange(0., ...)` in lieu of `torch.arange(0, ...)`.
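A minimal sketch of the patched class, assuming the structure of the notebook's `PositionalEncoding` (only the two float `arange` calls differ from the int version):

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    "Positional encoding with float-initialized position and div_term."
    def __init__(self, d_model, dropout, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0., max_len).unsqueeze(1)       # float, not int
        div_term = torch.exp(torch.arange(0., d_model, 2) *
                             -(math.log(10000.0) / d_model))    # float, so .exp is defined
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0))

    def forward(self, x):
        # Add the (non-trainable) encoding for the first x.size(1) positions.
        return self.dropout(x + self.pe[:, :x.size(1)])
```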
Thank you for a great breakdown of Vaswani's paper.