hoosierEE opened 1 year ago
Agreed, do you want to adjust the code and create a PR to exclude all context words for the target word?
I'll give it a try and let you know with a PR.
I don't usually work with notebooks, so please excuse the noisy diff. It looks like there was a bunch of HTML escaping in the original that wasn't present in the .ipynb downloaded from Colab.
I saw an improvement in accuracy for the same number of epochs (92% versus 89%), but `generate_training_data` runs more slowly (about 2m versus <1m on Colab). This is the important part of the diff:
```diff
+    # Generate positive context windows for each target word in the sequence.
+    window = defaultdict(list)
+    for i in range(window_size, len(sequence) - window_size):
+      window[sequence[i]].append(sequence[i - window_size : 1 + i + window_size])
+
     # Iterate over each positive skip-gram pair to produce training examples
     # with a positive context word and negative samples.
     for target_word, context_word in positive_skip_grams:
       context_class = tf.expand_dims(
           tf.constant([context_word], dtype="int64"), 1)
       negative_sampling_candidates, _, _ = tf.random.log_uniform_candidate_sampler(
           true_classes=context_class,
           num_true=1,
           num_sampled=num_ns,
           unique=True,
           range_max=vocab_size,
           seed=seed,
           name="negative_sampling")
+
+      # Discard this skip-gram if any negative sample appears in one of the
+      # positive context windows for the target word.
+      candidates = set(negative_sampling_candidates.numpy().tolist())
+      if any(candidates.intersection(ctx) for ctx in window[target_word]):
+        continue  # At least one candidate is a positive context word.
```
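For reference, here is the same idea as a self-contained function (a sketch, not the exact PR code: the name `generate_training_data_sketch` is made up, and `sequence`, `positive_skip_grams`, `num_ns`, `vocab_size`, and `seed` mirror the tutorial's variables):

```python
import collections
import tensorflow as tf

def generate_training_data_sketch(sequence, positive_skip_grams,
                                  window_size, num_ns, vocab_size, seed=42):
  # Collect the positive context windows for each target word, as in the diff.
  windows = collections.defaultdict(set)
  for i in range(window_size, len(sequence) - window_size):
    windows[sequence[i]].update(sequence[i - window_size : i + window_size + 1])

  examples = []
  for target_word, context_word in positive_skip_grams:
    context_class = tf.constant([[context_word]], dtype="int64")
    candidates, _, _ = tf.random.log_uniform_candidate_sampler(
        true_classes=context_class, num_true=1, num_sampled=num_ns,
        unique=True, range_max=vocab_size, seed=seed, name="negative_sampling")
    # Keep the pair only if no candidate is a positive context word.
    if not windows[target_word].intersection(candidates.numpy().tolist()):
      examples.append((target_word, context_word, candidates))
  return examples
```

Discarding the pair keeps the change small; an alternative would be to resample until the candidates are clean, at the cost of a less predictable runtime.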
No changes to the diagrams, and I left the prose unchanged except for a small correction to the "Negative sampling for one skip-gram" section:

```diff
- You can call the function on one skip-gram's target word and pass the context word as true class to exclude it from being sampled.
+ You can pass words from the positive class, but this does not exclude them from the results. For large vocabularies this is not a problem, because the chance of drawing one of the positive classes is small. However, for small data you may see overlap between negative and positive samples. Later we will add code to exclude positive samples, for slightly improved accuracy at the cost of longer runtime.
```
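The overlap is easy to demonstrate on a small vocabulary. A minimal sketch (the ids and `range_max` are made up; the draw is random, so the collision is likely rather than guaranteed):

```python
import tensorflow as tf

# log_uniform_candidate_sampler does not remove true_classes from its output,
# so with a small range_max the positive class shows up among the "negatives".
context_class = tf.constant([[3]], dtype="int64")  # id of the positive word
negatives, _, _ = tf.random.log_uniform_candidate_sampler(
    true_classes=context_class, num_true=1, num_sampled=4,
    unique=True, range_max=8)
print(negatives.numpy())  # frequently contains 3, the positive class itself
```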
Hi, thanks for reporting this. I just wanted to add that this still seems to be an issue in the TensorFlow docs.
The word2vec tutorial at first gives one definition of negative sampling, but the implementation uses a second definition.
There are several places where this second definition is used: first in the "small" example, again in the Summary diagram, and later in the definition of `generate_training_data`.

With a large enough sequence, random sampling is unlikely to pick samples near `target_word` purely by chance, and as a result the model "works". However, if you test with a small example, you can see that this form of sampling excludes only the `context_word`.

My understanding is that for a context window of `[the wide road shimmered]` with the target word `road`, the positive (+) and negative (-) examples should look like this: positive samples for `road` come from `[the, wide, shimmered]`, and negative samples for the context word `shimmered` come from `[in, the, hot, sun]`.

Either the text's definition of negative sampling should be changed, or the code should be changed to discard positive samples from the `neg_sampling_candidates`.
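To make the two definitions concrete, a minimal sketch of the example above (plain Python; the variable names are illustrative):

```python
sentence = "the wide road shimmered in the hot sun".split()
window = sentence[:4]        # ["the", "wide", "road", "shimmered"]
target_word = "road"
context_word = "shimmered"

# Positive samples for "road" are the other words in its window.
positives = [w for w in window if w != target_word]  # the, wide, shimmered

# Definition in the prose: negatives must come from outside the window.
negatives_prose = sentence[4:]                       # in, the, hot, sun

# Definition in the code: only the sampled context word is excluded, so
# "the" and "wide" can still be drawn as "negatives" for this pair.
negatives_code = [w for w in set(sentence) if w != context_word]
```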