By definition, negative samples should be drawn from outside the current window.
However, in tf.keras.preprocessing.sequence.skipgrams, when a negative [center word index, context word index] couple is sampled, the context word index is drawn from the whole index range, which includes the indices of the context words inside the current window (line 225):
https://github.com/keras-team/keras-preprocessing/blob/4538765fd369def80f81ad977bcf8e40e58c2f82/keras_preprocessing/sequence.py#L219-L230
As a result, a positive couple [center word index, within-window context word index] can also be emitted as a negative sample, so the same couple can end up with two conflicting labels (0: negative, 1: positive).
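To make the mechanism concrete, here is a minimal plain-Python sketch of the sampling behavior described above. It is not the library code: `naive_skipgrams` and `conflicting_couples` are hypothetical helper names, and the sketch assumes negatives are drawn uniformly from 1 to `vocabulary_size - 1` with no check against the current window.

```python
import random


def naive_skipgrams(sequence, vocabulary_size, window_size=2, seed=0):
    """Simplified stand-in for the behavior described above (an assumption,
    not a copy of keras_preprocessing)."""
    rng = random.Random(seed)
    couples, labels = [], []

    # Positive couples: every (center, context) pair inside the window.
    for i, center in enumerate(sequence):
        start = max(0, i - window_size)
        end = min(len(sequence), i + window_size + 1)
        for j in range(start, end):
            if j != i:
                couples.append((center, sequence[j]))
                labels.append(1)

    # Negative couples: one per positive, reusing its center word. The context
    # index comes from the whole vocabulary range, so nothing stops it from
    # matching a word that is inside the window.
    centers = [center for center, _ in couples]
    for center in centers:
        couples.append((center, rng.randint(1, vocabulary_size - 1)))
        labels.append(0)

    return couples, labels


def conflicting_couples(couples, labels):
    """Return the couples that appear with both label 1 and label 0."""
    positives = {c for c, label in zip(couples, labels) if label == 1}
    negatives = {c for c, label in zip(couples, labels) if label == 0}
    return positives & negatives


couples, labels = naive_skipgrams([1, 2, 3, 4, 5], vocabulary_size=6)
print(conflicting_couples(couples, labels))
```

With a small vocabulary relative to the window size, the printed set is likely to be non-empty. With a large vocabulary the collisions become rarer but are still possible.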
I was able to verify this issue with the following simple code.