pytorch / tutorials

PyTorch tutorials.
https://pytorch.org/tutorials/
BSD 3-Clause "New" or "Revised" License

seq2seq: Replace the embeddings with pre-trained word embeddings such as word2vec #1075

Open Liranbz opened 4 years ago

Liranbz commented 4 years ago

Hi, thank you for the tutorial! I tried to replace the embedding with pre-trained word embeddings such as word2vec. Here is my code:

from gensim.models import KeyedVectors

class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS
        self.word2vec = None  # filled in by get_word2vec() on first use

    def get_word2vec(self):
        # Load the pre-trained vectors once and reuse them afterwards
        if self.word2vec is None:
            self.word2vec = KeyedVectors.load_word2vec_format('Models/Word2Vec/wiki.he.vec')
        return self.word2vec

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            # Stores the word's pre-trained vector (raises KeyError for unknown words)
            self.word2index[word] = self.get_word2vec()[word]
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1

This word2vec model has 300 dimensions. Do I need to change anything else in my Encoder?

Thank you!
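One possible way to approach the question above, as a minimal sketch rather than the tutorial's method: keep `word2index` as plain integer indices, build a separate weight matrix whose row `i` holds the vector for `index2word[i]`, and hand that matrix to `nn.Embedding.from_pretrained`. The helper `build_embedding_matrix`, the `toy_vectors` dictionary (a stand-in for the loaded `KeyedVectors`), and the random initialisation for words without a pre-trained vector are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

EMBED_DIM = 300  # dimensionality stated in the question


def build_embedding_matrix(index2word, vectors, embed_dim=EMBED_DIM, seed=0):
    """Return a (n_words, embed_dim) float tensor; row i is the vector for index2word[i].

    `vectors` is any mapping from word to a sequence of floats (a hypothetical
    stand-in for the loaded word2vec model). Words missing from it, such as
    "SOS" and "EOS", keep small random vectors.
    """
    gen = torch.Generator().manual_seed(seed)
    weights = torch.randn(len(index2word), embed_dim, generator=gen) * 0.1
    for index, word in index2word.items():
        if word in vectors:
            weights[index] = torch.as_tensor(vectors[word], dtype=torch.float)
    return weights


# Toy stand-in for the pre-trained vectors so the sketch runs without the model file.
toy_vectors = {"hello": [1.0] * EMBED_DIM, "world": [2.0] * EMBED_DIM}
index2word = {0: "SOS", 1: "EOS", 2: "hello", 3: "world"}

weights = build_embedding_matrix(index2word, toy_vectors)

# freeze=False lets the pre-trained vectors be fine-tuned during training.
embedding = nn.Embedding.from_pretrained(weights, freeze=False)
```

With this layout the rest of the pipeline can keep passing integer word indices around, and only the embedding layer knows about the 300-dimensional vectors. Whatever layer consumes the embedding output then needs an input size of 300.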

NarenInD commented 4 years ago

Yeah, I'm also trying to train with word2vec. Word2vec vectors can be 100-d, 200-d, or 300-d, i.e. a 1-D array with 100 values per word for the 100-d model.

Can anyone tell me where I should change the dimension values? For example, what values should be replaced in the lines below?

    self.embedding(input).view(1, 1, -1)
    return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)

@Liranbz Did you get this sorted out?
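Regarding the two lines quoted above, a hedged sketch suggests neither needs new values: `-1` in `.view(1, 1, -1)` resolves to whatever the embedding dimension is, and the index tensor only holds integer word ids. What would change is the input size of the layer after the embedding. The encoder below assumes a GRU-based design and takes a `pretrained_weights` tensor in its constructor; that signature, the random weights standing in for real word2vec vectors, and the omission of `device=device` are all assumptions made to keep the example self-contained.

```python
import torch
import torch.nn as nn


class EncoderRNN(nn.Module):
    def __init__(self, pretrained_weights, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        embed_dim = pretrained_weights.size(1)  # e.g. 100, 200 or 300
        self.embedding = nn.Embedding.from_pretrained(pretrained_weights, freeze=True)
        # The GRU input size follows the embedding dimension, not hidden_size.
        self.gru = nn.GRU(embed_dim, hidden_size)

    def forward(self, input, hidden):
        # view(1, 1, -1) is unchanged: -1 resolves to embed_dim.
        embedded = self.embedding(input).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size)


# Random weights stand in for real word2vec vectors (10 words, 100-d).
weights = torch.randn(10, 100)
encoder = EncoderRNN(weights, hidden_size=256)

# Word indices stay integers, so this line keeps its original form.
indexes = [2, 5, 1]
input_tensor = torch.tensor(indexes, dtype=torch.long).view(-1, 1)

# Feed the sentence one word index at a time.
hidden = encoder.initHidden()
for word_index in input_tensor:
    output, hidden = encoder(word_index, hidden)
```

Decoupling the embedding dimension from `hidden_size` this way means a 100-d, 200-d, or 300-d model can be swapped in without touching the hidden size used elsewhere.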

ivrschool commented 2 years ago

@NarenInD @Liranbz have you found a solution? I have also been looking for the same thing. Thank you.

QasimKhan5x commented 1 year ago

torchtext currently supports pretrained GloVe, FastText, and CharNGram embeddings. Other embeddings can be loaded using torchtext.vocab.Vectors. If anyone is interested, I can edit the tutorial to show how you could use those.