Closed jeffchy closed 6 years ago
Yes, simply because of training speed. I once implemented the 'correct' sampling https://github.com/theeluwin/pytorch-sgns/blob/2ba6c249867a0d5adc9e1c466f4bca09f0ef1a02/model.py#L62 but it was way too slow, and faster training allows more iterations. That said, I still believe the correct sampling is the right way to do it.
First, thanks for your excellent code :)
In model.py, the following piece of code suggests that we may get a positive word when doing negative sampling, though the probability is very small.
nwords = t.multinomial(self.weights, batch_size * context_size * self.n_negs, replacement=True).view(batch_size, -1)
I'm wondering why you didn't perform an equality check. Is that because it doesn't affect the quality of the trained word vectors but would slow down training? Are there other reasons?
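In case it helps to make the trade-off concrete, here is a minimal sketch (not the repository's code, and not necessarily how the linked 'correct sampling' commit did it) of what an equality check could look like: draw all negatives with one multinomial call, then redraw any sample that collides with a positive context word in the same row. The function name, the `weights`/`owords` shapes, and `max_retries` are illustrative assumptions; the collision check and redraw loop are what make this slower than the single unchecked draw above.

```python
# Hypothetical sketch of negative sampling with a collision ("equality") check.
# Not taken from pytorch-sgns; shapes and names are assumptions for illustration.
import torch as t

def sample_negatives_with_check(weights, owords, n_negs, max_retries=10):
    """weights: (vocab_size,) unigram sampling weights.
    owords: (batch_size, context_size) positive context word ids.
    Returns (batch_size, context_size * n_negs) negatives that avoid the
    positive context words in the same row (up to max_retries redraws)."""
    batch_size, context_size = owords.shape
    n_samples = context_size * n_negs
    nwords = t.multinomial(weights, batch_size * n_samples,
                           replacement=True).view(batch_size, -1)
    for _ in range(max_retries):
        # (batch_size, n_samples): True where a negative equals any positive
        # context word of the same example.
        collisions = (nwords.unsqueeze(2) == owords.unsqueeze(1)).any(dim=2)
        if not collisions.any():
            break
        # Redraw only the colliding entries; this extra check/redraw loop is
        # the part that makes "correct" sampling slower than one plain draw.
        nwords[collisions] = t.multinomial(weights, int(collisions.sum()),
                                           replacement=True)
    return nwords

if __name__ == "__main__":
    vocab_size, batch_size, context_size, n_negs = 100, 4, 5, 3
    weights = t.ones(vocab_size)  # uniform weights, just for the demo
    owords = t.randint(vocab_size, (batch_size, context_size))
    negs = sample_negatives_with_check(weights, owords, n_negs)
    print(negs.shape)  # torch.Size([4, 15])
```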