stickeritis / sticker

Succeeded by SyntaxDot: https://github.com/tensordot/syntaxdot

Add the --maxlen option to sticker {tag, server} #164

Closed: danieldk closed this 4 years ago

danieldk commented 4 years ago

As in training, this discards sentences that are longer than the value given for `--maxlen`, so that `sticker tag` and `sticker server` skip them as well.
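
To illustrate the behaviour described above, here is a minimal sketch in Python rather than sticker's own code. The function name `filter_by_maxlen` is hypothetical, and the sketch assumes that length is counted in tokens and that a sentence of exactly `maxlen` tokens is kept; sticker's actual length unit and boundary handling may differ.

```python
from typing import Iterable, Iterator, List, Optional

# A sentence is modelled as a plain list of tokens (an assumption for this sketch).
Sentence = List[str]


def filter_by_maxlen(
    sentences: Iterable[Sentence], maxlen: Optional[int]
) -> Iterator[Sentence]:
    """Yield only the sentences that are not longer than maxlen.

    With maxlen=None no filtering is applied, mirroring an optional flag.
    """
    for sentence in sentences:
        if maxlen is not None and len(sentence) > maxlen:
            # Discard overly long sentences instead of processing them.
            continue
        yield sentence


sentences = [
    ["Dit", "is", "een", "zin", "."],
    ["SPAM"] * 6900,  # a pathological sentence like the one in the backstory
    ["Kort", "."],
]

kept = list(filter_by_maxlen(sentences, maxlen=100))
print(len(kept))
```

Filtering lazily with a generator keeps memory use flat when the input is a large corpus that is streamed sentence by sentence.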

danieldk commented 4 years ago

Backstory: preparation of pretraining data for Dutch (apparently) failed because one sentence contains the word SPAM ~6900 times, leading to 256 * 6900 (roughly 1.77 million) time steps overall. Well, at least the sentence was accurate ;).