hoosierEE closed this pull request 4 months ago.
Also, I just reviewed the documentation for tf.random.log_uniform_candidate_sampler. It explicitly states that it does not reject accidental positive hits, and it links to the Candidate Sampling Algorithms Reference. In that reference, the "Negative Sampling" row says the negative training classes are the full sampled set S_i, which can include positive samples. This contrasts with "Sampled Logistic", which uses the set (S_i - T_i), i.e. the sampled classes with the true classes removed. So the accidental hits may be intentional.
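To make the "accidental hits" point concrete, here is a minimal pure-Python sketch of the log-uniform (Zipfian) distribution that tf.random.log_uniform_candidate_sampler documents, P(k) = (log(k + 2) - log(k + 1)) / log(range_max + 1), sampled via the inverse CDF. The function name and structure are illustrative, not TensorFlow's implementation; the point is that nothing in the sampler excludes the true classes, so a sampled "negative" can collide with a positive:

```python
import math
import random

def log_uniform_sample(num_sampled, range_max, rng):
    """Illustrative sampler for the log-uniform distribution
    P(k) = (log(k + 2) - log(k + 1)) / log(range_max + 1),
    for class ids k in [0, range_max). Note that the true classes
    are NOT excluded, so accidental positive hits are possible."""
    samples = []
    log_range = math.log(range_max + 1)
    for _ in range(num_sampled):
        # Inverse-CDF sampling: CDF(k) = log(k + 2) / log(range_max + 1)
        u = rng.random()
        k = int(math.exp(u * log_range)) - 1
        samples.append(min(k, range_max - 1))  # clamp the u -> 1 edge case
    return samples

rng = random.Random(0)
true_classes = {3}
negatives = log_uniform_sample(5, 100, rng)
# An "accidental hit" is simply a sampled id that is also a true class.
hits = [n for n in negatives if n in true_classes]
```

This mirrors the documented behavior: any rejection of hits has to happen downstream of the sampler.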
Your feedback was very helpful, thanks! Building a set from the positive_skip_grams ended up being much faster. On average, about 10% of negative samples are discarded because they overlap with the positive context, which yields roughly a 1 percentage point improvement in training accuracy at 20 epochs.
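The filtering described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: it assumes positive_skip_grams is a list of (target, context) id pairs as in the word2vec tutorial, and the helper name filter_negatives is hypothetical:

```python
# Hypothetical (target, context) pairs in the shape the word2vec
# tutorial's positive_skip_grams uses.
positive_skip_grams = [(0, 5), (0, 7), (1, 5)]

# Build a set of positive context ids per target word, so that
# membership checks during filtering are O(1).
positive_contexts = {}
for target, context in positive_skip_grams:
    positive_contexts.setdefault(target, set()).add(context)

def filter_negatives(target, sampled_negatives):
    """Discard sampled negatives that are accidental positive hits
    for the given target word."""
    skip = positive_contexts.get(target, set())
    return [n for n in sampled_negatives if n not in skip]

filtered = filter_negatives(0, [5, 9, 7, 2])  # -> [9, 2]
```

Using a set here is what makes the approach fast: the per-sample overlap check is constant time instead of a scan over all positive pairs.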
Addressing issue #1228