tensorflow / text

Making text a first-class citizen in TensorFlow.
https://www.tensorflow.org/beta/tutorials/tensorflow_text/intro
Apache License 2.0

negative sampling excludes positive class #1229

Closed hoosierEE closed 4 months ago

hoosierEE commented 8 months ago

Addressing issue #1228

review-notebook-app[bot] commented 8 months ago

Check out this pull request on ReviewNB.

cantonios commented 8 months ago

Also, I just reviewed the documentation for tf.random.log_uniform_candidate_sampler. It explicitly states that it does not reject any accidental positive hits, and then links to the Candidate Sampling Algorithms Reference. In that reference, the "Negative Sampling" row says that it considers negative training classes to be the full set S_i, which does include positive samples. This is opposed to "Sampled Logistic", which considers the set (S_i - T_i). So it may be intentional that there could be accidental hits.
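To make the distinction concrete, here is a plain-Python sketch (not the TF implementation, and using a uniform rather than log-uniform distribution for simplicity) of the two rows in the Candidate Sampling Algorithms reference: "Negative Sampling" draws candidates from the full class set S_i, so accidental positive hits are possible, while "Sampled Logistic" draws from S_i - T_i, rejecting any candidate that is also a true class. The function name and parameters are illustrative, not TF API:

```python
import random

def sample_candidates(vocab_size, num_sampled, true_classes,
                      reject_positives, seed=0):
    """Draw num_sampled candidate class IDs from [0, vocab_size).

    reject_positives=False mimics "Negative Sampling" (candidates from
    the full set S_i; accidental hits kept); reject_positives=True
    mimics "Sampled Logistic" (candidates from S_i - T_i).
    """
    rng = random.Random(seed)
    true_set = set(true_classes)
    candidates = []
    while len(candidates) < num_sampled:
        # Uniform for simplicity; tf.random.log_uniform_candidate_sampler
        # uses a log-uniform (Zipfian) distribution over class IDs.
        c = rng.randrange(vocab_size)
        if reject_positives and c in true_set:
            continue  # "Sampled Logistic": drop the accidental hit
        candidates.append(c)  # "Negative Sampling": hits are kept
    return candidates
```

With `reject_positives=True` the result is guaranteed to be disjoint from the true classes; with `reject_positives=False` overlaps can and do occur, which matches the documented behaviour of `tf.random.log_uniform_candidate_sampler`.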

hoosierEE commented 8 months ago

Your feedback was very helpful, thanks! Building a set from the positive_skip_grams ended up being much faster. On average I see around 10% of negative samples discarded because they overlap with the positive context, resulting in about 1 percentage point improvement in training accuracy at 20 epochs.
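A minimal sketch of that fix, with hypothetical names (the actual notebook code may differ): build a lookup set of context IDs from `positive_skip_grams` for the target word, then drop any sampled negative that collides with it.

```python
def filter_negative_samples(positive_skip_grams, target, negatives):
    """Discard negatives that appear as a positive context for `target`.

    positive_skip_grams: iterable of (target, context) ID pairs.
    negatives: candidate negative class IDs for this target.
    """
    # Set membership makes the overlap check O(1) per candidate.
    positive_context = {context for t, context in positive_skip_grams
                        if t == target}
    return [n for n in negatives if n not in positive_context]
```

For example, `filter_negative_samples([(5, 7), (5, 9), (2, 4)], 5, [7, 8, 9, 3])` drops the accidental hits 7 and 9, returning `[8, 3]`.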