Though we are a little late to respond, we encourage such user-specific issues to be raised on the Knowledge hub, so that our experienced mentors can chime in and new students can also learn from the discussion thread. Thank you.
I am wondering why we need two embedding layers for negative sampling in the word2vec algorithm; this was never explained. Using the same embedding layer for both positive and negative samples seems to produce comparable results. To make the question concrete, a minimal sketch of the two-embedding setup follows below.
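Here is what I mean by the two embedding layers, written in PyTorch purely for illustration (the class and variable names are mine, not from the course code): one table for the center (input) words and a separate one for the context and negative (output) words.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGramNeg(nn.Module):
    """Skip-gram with negative sampling using two embedding tables."""
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, embed_dim)   # center words
        self.out_embed = nn.Embedding(vocab_size, embed_dim)  # context / negative words

    def forward(self, center, context, negatives):
        v = self.in_embed(center)                  # (B, D)
        u_pos = self.out_embed(context)            # (B, D)
        u_neg = self.out_embed(negatives)          # (B, K, D)

        pos_score = (v * u_pos).sum(dim=1)                        # (B,)
        neg_score = torch.bmm(u_neg, v.unsqueeze(2)).squeeze(2)   # (B, K)

        # Negative-sampling loss: pull the true context closer,
        # push the K sampled negatives away.
        loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(dim=1))
        return loss.mean()
```

My question is whether replacing `out_embed` with `in_embed` everywhere (i.e. a single shared table) actually loses anything, since in my runs the results look comparable.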
Apart from that, the model seems prone to numerical overflow when computing the loss. Uniform weight initialization mitigates this somewhat, but I would suggest clipping the result of the matrix multiplication.
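To illustrate what I mean by clipping (again a sketch in PyTorch, with made-up numbers rather than values from the actual notebook): if the loss is computed by taking `sigmoid` and then `log` explicitly, large dot products saturate the sigmoid to exactly 0 and the log becomes -inf, whereas clamping the scores first keeps the loss finite.

```python
import torch

# Illustrative scores from the center/context matrix multiplication; the
# extreme values mimic what the dot products can grow to during training.
scores = torch.tensor([[300.0, -300.0, 5.0, -5.0, 0.0]])

# Naive pattern that blows up: sigmoid saturates to exactly 0 for very
# negative inputs, and log(0) is -inf.
unsafe_loss = -torch.log(torch.sigmoid(scores)).mean()   # inf

# Clipping the matmul output before sigmoid/log keeps everything finite.
# The bound of 10 is an arbitrary choice for illustration.
safe_loss = -torch.log(torch.sigmoid(scores.clamp(-10.0, 10.0))).mean()

print(unsafe_loss.item(), safe_loss.item())
```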