stickeritis / sticker

Succeeded by SyntaxDot: https://github.com/tensordot/syntaxdot

Add shuffling adapter to SentenceIter #159

Closed: twuebi closed this 4 years ago

twuebi commented 4 years ago

This adds a buffered shuffling adapter to SentenceIter. The adapter first fills a buffer with items. Once the buffer is full, each incoming item is swapped with a randomly chosen element of the buffer, and the displaced element is returned as the next item. The buffer size controls the locality of the shuffling: a small buffer only mixes nearby sentences, while a larger one mixes sentences that are further apart. If buffer_size is larger than the number of sentences in the dataset, this is equivalent to sampling uniformly (without replacement) from the full dataset.
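The following Rust sketch illustrates the buffered shuffle described above on a generic iterator. It is not the code from this PR. The names `BufferedShuffle`, `BufferedShuffleExt`, and `buffered_shuffle` are hypothetical. The adapter assumes a random drain of the buffer once the input is exhausted. A tiny xorshift PRNG stands in for whatever RNG the real adapter uses, so the example needs no external crates.

```rust
use std::iter::Fuse;

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// Assumption: a real adapter would more likely take an RNG from the `rand` crate.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from 0, or it stays 0 forever.
        XorShift64(seed.max(1))
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Index in `0..n`; the small modulo bias is acceptable for a sketch.
    fn gen_index(&mut self, n: usize) -> usize {
        (self.next_u64() % (n as u64)) as usize
    }
}

/// Hypothetical buffered shuffling adapter (names are illustrative only).
pub struct BufferedShuffle<I: Iterator> {
    inner: Fuse<I>,
    buffer: Vec<I::Item>,
    buffer_size: usize,
    rng: XorShift64,
}

impl<I: Iterator> BufferedShuffle<I> {
    pub fn new(inner: I, buffer_size: usize, seed: u64) -> Self {
        assert!(buffer_size > 0, "buffer_size must be at least 1");
        BufferedShuffle {
            inner: inner.fuse(),
            buffer: Vec::with_capacity(buffer_size),
            buffer_size,
            rng: XorShift64::new(seed),
        }
    }
}

impl<I: Iterator> Iterator for BufferedShuffle<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // Phase 1: fill the buffer (a no-op once it is full or the input is exhausted).
        while self.buffer.len() < self.buffer_size {
            match self.inner.next() {
                Some(item) => self.buffer.push(item),
                None => break,
            }
        }

        match self.inner.next() {
            // Phase 2: the buffer is full; swap the incoming item with a random
            // buffered item and emit the displaced one.
            Some(incoming) => {
                let idx = self.rng.gen_index(self.buffer.len());
                Some(std::mem::replace(&mut self.buffer[idx], incoming))
            }
            // Phase 3: input exhausted; drain the buffer in random order.
            None => {
                if self.buffer.is_empty() {
                    None
                } else {
                    let idx = self.rng.gen_index(self.buffer.len());
                    Some(self.buffer.swap_remove(idx))
                }
            }
        }
    }
}

/// Extension trait so any iterator can call `.buffered_shuffle(buffer_size, seed)`.
pub trait BufferedShuffleExt: Iterator + Sized {
    fn buffered_shuffle(self, buffer_size: usize, seed: u64) -> BufferedShuffle<Self> {
        BufferedShuffle::new(self, buffer_size, seed)
    }
}

impl<I: Iterator> BufferedShuffleExt for I {}
```

With `buffer_size = 1` the input order is preserved. In general, the item emitted at output position `p` always comes from an input position below `p + buffer_size`, which is the locality the buffer size controls. When the buffer is at least as large as the input, every item ends up in the random drain phase, so the output is a uniformly random permutation (up to the quality of the PRNG).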