Closed jeffchy closed 6 years ago
Yes, simply because of training speed. I once implemented the 'correct' sampling https://github.com/theeluwin/pytorch-sgns/blob/2ba6c249867a0d5adc9e1c466f4bca09f0ef1a02/model.py#L62 but it was way too slow, and faster training allows more iterations. That said, I still believe the correct sampling is the right way to do it.
First, thanks for your excellent code :)
In model.py, the following piece of code suggests that we may get a positive word when doing negative sampling, though the probability is very small.
nwords = t.multinomial(self.weights, batch_size * context_size * self.n_negs, replacement=True).view(batch_size, -1)
I'm wondering why you didn't perform an equality check. Is that because it doesn't affect the quality of the trained word vectors but would slow down training? Are there other reasons?
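In case it helps to make the trade-off concrete, here is a minimal sketch (not the repository's code, and not necessarily how the linked 'correct sampling' commit did it) of what an equality check could look like: draw all negatives with one multinomial call, then redraw any sample that collides with a positive context word in the same row. The function name, the `weights`/`owords` shapes, and `max_retries` are illustrative assumptions; the collision check and redraw loop are what make this slower than the single unchecked draw above.

```python
# Hypothetical sketch of negative sampling with a collision ("equality") check.
# Not taken from pytorch-sgns; shapes and names are assumptions for illustration.
import torch as t

def sample_negatives_with_check(weights, owords, n_negs, max_retries=10):
    """weights: (vocab_size,) unigram sampling weights.
    owords: (batch_size, context_size) positive context word ids.
    Returns (batch_size, context_size * n_negs) negatives that avoid the
    positive context words in the same row (up to max_retries redraws)."""
    batch_size, context_size = owords.shape
    n_samples = context_size * n_negs
    nwords = t.multinomial(weights, batch_size * n_samples,
                           replacement=True).view(batch_size, -1)
    for _ in range(max_retries):
        # (batch_size, n_samples): True where a negative equals any positive
        # context word of the same example.
        collisions = (nwords.unsqueeze(2) == owords.unsqueeze(1)).any(dim=2)
        if not collisions.any():
            break
        # Redraw only the colliding entries; this extra check/redraw loop is
        # the part that makes "correct" sampling slower than one plain draw.
        nwords[collisions] = t.multinomial(weights, int(collisions.sum()),
                                           replacement=True)
    return nwords

if __name__ == "__main__":
    vocab_size, batch_size, context_size, n_negs = 100, 4, 5, 3
    weights = t.ones(vocab_size)  # uniform weights, just for the demo
    owords = t.randint(vocab_size, (batch_size, context_size))
    negs = sample_negatives_with_check(weights, owords, n_negs)
    print(negs.shape)  # torch.Size([4, 15])
```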