Memory was allocated without space for the biases. As a result, some gradients were overwritten by weights, and some costs were overwritten by gradients.
You can see below that word vector positions inside W are computed with additional space for the bias (+ 1), for example:
/* Get location of words in W & gradsq */
l1 = (cr.word1 - 1LL) * (vector_size + 1); // cr word indices start at 1
l2 = ((cr.word2 - 1LL) + vocab_size) * (vector_size + 1); // shift by vocab_size to get separate vectors for context words
Thus the memory allocation must also include that additional space.
With this fix applied, the algorithm tends to perform slightly better on the provided analogy tests.
@moefly good catch. I knew merging this couldn't hurt, and people have been getting segfaults that I cannot reproduce. But it does indeed look like it's unnecessary.