harvardnlp / annotated-transformer

An annotated implementation of the Transformer paper.
http://nlp.seas.harvard.edu/annotated-transformer
MIT License

"exp" not implemented for 'torch.LongTensor' pytorch 1.0 #25

Closed · xvdp closed this issue 2 years ago

xvdp commented 5 years ago

To run on PyTorch 1.0, both position and div_term need to be initialized as float instead of int in the PositionalEncoding class, i.e.

position = torch.arange(0.0, max_len).unsqueeze(1)
div_term = torch.exp(torch.arange(0.0, d_model, 2) *
                     -(math.log(10000.0) / d_model))

instead of torch.arange(0, ...). With integer arguments, torch.arange returns a LongTensor, and torch.exp is not implemented for LongTensor in PyTorch 1.0.

Thank you for a great breakdown of Vaswani's paper.
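For reference, a minimal sketch of the whole PositionalEncoding class with the float fix applied. It follows the notebook's structure, except that the old Variable(..., requires_grad=False) wrapper is swapped for .detach(), which is the PyTorch 1.0 idiom:

import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    "Implement the PE function (PyTorch 1.0+ version)."
    def __init__(self, d_model, dropout, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)
        # Compute the positional encodings once in log space.
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0.0, max_len).unsqueeze(1)    # float, not int
        div_term = torch.exp(torch.arange(0.0, d_model, 2) *  # float, not int
                             -(math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)
        self.register_buffer('pe', pe)

    def forward(self, x):
        # No gradient flows through the fixed encodings.
        x = x + self.pe[:, :x.size(1)].detach()
        return self.dropout(x)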

AmoghM commented 5 years ago

@xvdp Are you able to run the rest of it? I am getting two errors which I am not able to resolve:

  1. RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'

     ...
     <ipython-input-139-2ae4ba63671c> in encode(self, src, src_mask)
          17
          18     def encode(self, src, src_mask):
     ---> 19         return self.encoder(self.src_embed(src), src_mask)
          20
          21     def decode(self, memory, src_mask, tgt, tgt_mask):
     ...
     RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'

  2. NotImplementedError

     ...
          74     def encode(self, src, src_mask):
     ---> 75         return self.encoder(self.src_embed(src), src_mask)
     ...
Both fail in self.encoder(). I can't really figure out what is happening. It would be super great if you could provide some insight. Link to the Jupyter notebook:
https://github.com/AmoghM/DeepLearning/blob/master/TransformerNetwork/HarvardTransformer.ipynb
v-iashin commented 5 years ago

@xvdp Solved the

"exp" not implemented for 'torch.LongTensor' pytorch 1.0

error by using the paper's original formulation directly:

div_term = 1 / (10000 ** (torch.arange(0., d_model, 2) / d_model))

I am not sure why they decided to use .exp, but my best guess is numerical stability.
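The two forms should agree, since exp(-2i · ln(10000) / d_model) = 10000^(-2i / d_model). A quick check (d_model chosen arbitrarily):

import math
import torch

d_model = 512
# exp/log form used in the notebook
div_exp = torch.exp(torch.arange(0.0, d_model, 2) *
                    -(math.log(10000.0) / d_model))
# direct power form from this comment
div_pow = 1 / (10000 ** (torch.arange(0.0, d_model, 2) / d_model))
print(torch.allclose(div_exp, div_pow))  # True, up to float rounding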

xvdp commented 5 years ago

Glad you solved it; sorry, I hadn't seen your message. There are a couple of other things that need to be fixed so this runs on PyTorch 1.0+, such as accessing scalars with .item() instead of .data[0], but curiously I did not run into your problem. I'll note it here.
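For example (loss here is just a stand-in 0-dim tensor, not from the notebook):

import torch

loss = torch.tensor(0.5)  # a 0-dim scalar tensor, e.g. a loss value
# value = loss.data[0]    # PyTorch <= 0.3 style; raises IndexError on 1.0+
value = loss.item()       # PyTorch 1.0+ way to read out a Python scalar
print(value)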

V-Enzo commented 5 years ago

@AmoghM Try adding .cuda() to src and src_mask:

out = greedy_decode(model, src.cuda(), src_mask.cuda(), max_len=60, start_symbol=TGT.vocab.stoi["<s>"])

This moves src and src_mask to the GPU, where the model already lives. I found the answer in the link below:
https://github.com/huggingface/pytorch-pretrained-BERT/issues/227
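Equivalently, a device-agnostic sketch (assuming model, src, src_mask, greedy_decode, and TGT from the notebook):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)  # parameters and buffers move to the GPU if available
src = src.to(device)      # inputs must live on the same device as the model
src_mask = src_mask.to(device)
out = greedy_decode(model, src, src_mask, max_len=60,
                    start_symbol=TGT.vocab.stoi["<s>"])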

AmoghM commented 5 years ago

@V-Enzo Thanks for pointing this out. I will try it and report back.