jinzishuai opened this issue 6 years ago
https://en.wikipedia.org/wiki/Word2vec
ref: https://en.wikipedia.org/wiki/N-gram#Skip-gram
A k-skip-n-gram is a length-n subsequence where the components occur at distance at most k from each other.
It has two parameters (see the sketch after this list):

- `skip_window`: how many words to consider to the left and right. The window size is therefore `span = 2 * skip_window + 1`, which is the n in k-skip-n-gram.
- `num_skips`: how many times to reuse an input to generate a label, which is the k in k-skip-n-gram.
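To make the two parameters concrete, here is a minimal sketch of how they turn a word sequence into (input, label) training pairs. The function name `skip_gram_pairs` and the example sentence are mine for illustration; this is not the assignment's `generate_batch` code.

```python
# Sketch: how skip_window and num_skips produce (input, label) pairs.
import random

def skip_gram_pairs(words, skip_window=1, num_skips=2):
    """For each center word, sample num_skips context words from within
    skip_window positions on either side (span = 2 * skip_window + 1)."""
    assert num_skips <= 2 * skip_window
    pairs = []
    for center in range(skip_window, len(words) - skip_window):
        # Candidate context positions: every index in the span except the center.
        context = [i for i in range(center - skip_window, center + skip_window + 1)
                   if i != center]
        for ctx in random.sample(context, num_skips):
            pairs.append((words[center], words[ctx]))  # (input word, label word)
    return pairs

print(skip_gram_pairs(['the', 'quick', 'brown', 'fox', 'jumped'],
                      skip_window=1, num_skips=2))
```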
```python
embed = tf.nn.embedding_lookup(embeddings, train_dataset)
```
They are equivalent, but with `embedding_lookup` we don't have to construct a matrix of "one-hot" columns from `train_dataset`. We can simply look up each index, since multiplying by a one-hot vector just returns the row of `embeddings` at the position where the vector is 1.
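A quick way to see the equivalence is a NumPy sketch (the sizes and indices below are made up, not taken from the assignment):

```python
import numpy as np

vocabulary_size, embedding_size = 10, 4
embeddings = np.random.randn(vocabulary_size, embedding_size)
train_dataset = np.array([3, 7, 1])  # word ids in a batch

# One-hot approach: build a (batch, vocab) matrix and multiply.
one_hot = np.zeros((len(train_dataset), vocabulary_size))
one_hot[np.arange(len(train_dataset)), train_dataset] = 1.0
embed_via_matmul = one_hot @ embeddings

# Lookup approach: just take the rows directly.
embed_via_lookup = embeddings[train_dataset]

assert np.allclose(embed_via_matmul, embed_via_lookup)
```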
```python
loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases,
                               inputs=embed, labels=train_labels,
                               num_sampled=num_sampled, num_classes=vocabulary_size))
```
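`sampled_softmax_loss` only evaluates logits for the true class plus `num_sampled` randomly drawn negative classes, which keeps training cheap when `vocabulary_size` is large. For comparison, here is a sketch of the full-softmax loss it approximates, assuming TF 1.x graph mode, the variables from the snippet above, and that `train_labels` holds integer word ids of shape `[batch_size, 1]` (as in the assignment notebook):

```python
# Full-softmax loss that sampled_softmax_loss approximates during training.
# Assumes: import tensorflow as tf (1.x), and embed, softmax_weights,
# softmax_biases, train_labels defined as in the snippet above.
logits = tf.matmul(embed, tf.transpose(softmax_weights)) + softmax_biases
full_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.squeeze(train_labels, axis=1), logits=logits))
```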
This is the assignment at https://github.com/jinzishuai/learn2deeplearn/tree/master/google_dl_udacity/lesson5.
Word2Vec has two algorithms:
But first, what is word2vec, and what does it try to achieve?