jinzishuai opened this issue 6 years ago
https://en.wikipedia.org/wiki/Word2vec
ref: https://en.wikipedia.org/wiki/N-gram#Skip-gram
A k-skip-n-gram is a length-n subsequence where the components occur at distance at most k from each other.
It has two parameters (see the sketch after this list):

- `skip_window`: how many words to consider to the left and right. The window size is therefore `span = 2 * skip_window + 1`, which is the n in k-skip-n-gram.
- `num_skips`: how many times to reuse an input to generate a label, which is the k in k-skip-n-gram.
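To make the two parameters concrete, here is a minimal sketch of how they turn a word sequence into (input, label) training pairs. The function name `skip_gram_pairs` and the example sentence are mine for illustration; this is not the assignment's `generate_batch` code.

```python
# Sketch: how skip_window and num_skips produce (input, label) pairs.
import random

def skip_gram_pairs(words, skip_window=1, num_skips=2):
    """For each center word, sample num_skips context words from within
    skip_window positions on either side (span = 2 * skip_window + 1)."""
    assert num_skips <= 2 * skip_window
    pairs = []
    for center in range(skip_window, len(words) - skip_window):
        # Candidate context positions: every index in the span except the center.
        context = [i for i in range(center - skip_window, center + skip_window + 1)
                   if i != center]
        for ctx in random.sample(context, num_skips):
            pairs.append((words[center], words[ctx]))  # (input word, label word)
    return pairs

print(skip_gram_pairs(['the', 'quick', 'brown', 'fox', 'jumped'],
                      skip_window=1, num_skips=2))
```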
```python
embed = tf.nn.embedding_lookup(embeddings, train_dataset)
```
They are equivalent, but with `embedding_lookup` we don't have to construct a matrix of "one-hot" columns from `train_dataset`. We can simply look up each index, since multiplying by a one-hot vector just returns the row of `embeddings` at the position where the vector is 1.
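A quick way to see the equivalence is a NumPy sketch (the sizes and indices below are made up, not taken from the assignment):

```python
import numpy as np

vocabulary_size, embedding_size = 10, 4
embeddings = np.random.randn(vocabulary_size, embedding_size)
train_dataset = np.array([3, 7, 1])  # word ids in a batch

# One-hot approach: build a (batch, vocab) matrix and multiply.
one_hot = np.zeros((len(train_dataset), vocabulary_size))
one_hot[np.arange(len(train_dataset)), train_dataset] = 1.0
embed_via_matmul = one_hot @ embeddings

# Lookup approach: just take the rows directly.
embed_via_lookup = embeddings[train_dataset]

assert np.allclose(embed_via_matmul, embed_via_lookup)
```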
```python
loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases,
                               inputs=embed, labels=train_labels,
                               num_sampled=num_sampled, num_classes=vocabulary_size))
```
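`sampled_softmax_loss` only evaluates logits for the true class plus `num_sampled` randomly drawn negative classes, which keeps training cheap when `vocabulary_size` is large. For comparison, here is a sketch of the full-softmax loss it approximates, assuming TF 1.x graph mode, the variables from the snippet above, and that `train_labels` holds integer word ids of shape `[batch_size, 1]` (as in the assignment notebook):

```python
# Full-softmax loss that sampled_softmax_loss approximates during training.
# Assumes: import tensorflow as tf (1.x), and embed, softmax_weights,
# softmax_biases, train_labels defined as in the snippet above.
logits = tf.matmul(embed, tf.transpose(softmax_weights)) + softmax_biases
full_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.squeeze(train_labels, axis=1), logits=logits))
```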
This is the assignment at https://github.com/jinzishuai/learn2deeplearn/tree/master/google_dl_udacity/lesson5.
Word2Vec has two algorithms:
But first, what is word2vec, and what does it try to achieve?