For the wordlist-based features, experiment with pruned word lists:
Order each word list by a strength indicator (examples below) and experiment with different top-n lists, 0 < n < N, where N is the size of the full word list.
distance from centre of cluster
predictive power of the word on its own in the training data
length of words in characters, breaking ties randomly
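A minimal Python sketch of ordering a word list by each of the three example indicators and keeping the top n. All function names are hypothetical. The word-vector input for the cluster-distance indicator, the binary-labelled training documents, the smoothed log-odds used as a proxy for "predictive power", and the choice that longer words count as stronger are assumptions made for illustration, not part of the notes above.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple


def top_n(ranked: Sequence[str], n: int) -> List[str]:
    """Keep the n strongest words of a list already ordered strongest-first."""
    return list(ranked[:n])


def rank_by_centroid_distance(vectors: Dict[str, Sequence[float]]) -> List[str]:
    """Closest to the cluster centre first.

    Assumption: every word in the list has a vector, all of the same dimension,
    and the cluster centre is the mean of those vectors.
    """
    dim = len(next(iter(vectors.values())))
    centre = [sum(v[i] for v in vectors.values()) / len(vectors) for i in range(dim)]
    return sorted(vectors, key=lambda w: math.dist(vectors[w], centre))


def rank_by_predictive_power(
    words: Sequence[str], docs: Sequence[Tuple[Set[str], int]]
) -> List[str]:
    """Strongest first, scored by |log-odds| of a positive label given the word occurs.

    Assumption: docs are (token set, label) pairs with labels 0/1; add-one
    smoothing avoids division by zero for words seen with only one label.
    """
    def score(word: str) -> float:
        pos = sum(1 for tokens, label in docs if word in tokens and label == 1)
        neg = sum(1 for tokens, label in docs if word in tokens and label == 0)
        return abs(math.log((pos + 1) / (neg + 1)))

    return sorted(words, key=score, reverse=True)


def rank_by_length(words: Sequence[str], seed: int = 0) -> List[str]:
    """Longest first (assumed direction); equal-length words end up in random order."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    # sorted() is stable, so the shuffle alone decides the order among ties.
    return sorted(shuffled, key=len, reverse=True)


if __name__ == "__main__":
    print(top_n(rank_by_length(["cat", "elephant", "dog", "giraffe"]), 2))
    # ['elephant', 'giraffe']
```

Shuffling before a stable sort is one simple way to get random tie-breaking while keeping the ranking reproducible through the seed.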
New hyper-parameter(s): To set n, we can either
use a single hyper-parameter r, 0 <= r <= 1, that sets n = rN for each word list (each word list may have a different N depending on how the word list was created), or
a new hyper-parameter for each word list.
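The two options can also be combined using two global hyper-parameters r1 and r2, 0 <= r1, r2 <= 1, by setting n = (min(r1, r2) + r * |r1 - r2|) * N, where r is the word-list-specific hyper-parameter. The global pair then bounds the pruning ratio n/N of every list to the range [min(r1, r2), max(r1, r2)], and r picks a point within that range.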
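A small Python sketch of turning the ratio hyper-parameters into a list size n. The function names are hypothetical, and rounding rN to the nearest integer is an assumption, since n must be a whole number of words. Note that r = 0 or r = 1 yields n = 0 or n = N, i.e. an empty or unpruned list.

```python
def n_from_ratio(r: float, N: int) -> int:
    """Single ratio: n = rN, rounded to the nearest integer (rounding is an assumption)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return round(r * N)


def n_from_global_bounds(r1: float, r2: float, r: float, N: int) -> int:
    """Combined scheme: n = (min(r1, r2) + r * |r1 - r2|) * N.

    r1 and r2 are shared by all word lists; r is specific to one list.
    The ratio n / N therefore always lies between min(r1, r2) and max(r1, r2).
    """
    for value in (r1, r2, r):
        if not 0.0 <= value <= 1.0:
            raise ValueError("r1, r2 and r must lie in [0, 1]")
    fraction = min(r1, r2) + r * abs(r1 - r2)
    return round(fraction * N)


if __name__ == "__main__":
    # Hypothetical word lists of different sizes sharing one ratio r = 0.5.
    sizes = {"positive": 200, "negative": 120}
    print({name: n_from_ratio(0.5, N) for name, N in sizes.items()})
    # {'positive': 100, 'negative': 60}
    print(n_from_global_bounds(0.25, 0.75, 0.5, 100))  # 50
```

Because r only interpolates between the two global bounds, swapping r1 and r2 gives the same result, so their order does not need to be constrained during a hyper-parameter search.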
Different strength indicators can also be linearly mixed. The mixing weights are hyper-parameters.
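A minimal Python sketch of mixing indicators linearly. It assumes each indicator has already been turned into a per-word score where larger means stronger (so a distance from the cluster centre would be negated first), and it min-max normalises each indicator to [0, 1] before mixing so that no indicator dominates just because of its scale; that normalisation step is an assumption, not something the notes specify. Function names are hypothetical.

```python
from typing import Dict, List, Sequence


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one indicator's scores to [0, 1] (assumed normalisation)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {w: 0.0 for w in scores}
    return {w: (s - lo) / (hi - lo) for w, s in scores.items()}


def mixed_ranking(
    indicators: Sequence[Dict[str, float]], weights: Sequence[float]
) -> List[str]:
    """Rank words by a weighted sum of normalised indicator scores, strongest first.

    Every indicator must score the same set of words, larger meaning stronger;
    the weights are the mixing hyper-parameters.
    """
    if len(indicators) != len(weights):
        raise ValueError("need one weight per indicator")
    normalised = [min_max(scores) for scores in indicators]
    words = list(indicators[0])
    mixed = {w: sum(wt * norm[w] for wt, norm in zip(weights, normalised)) for w in words}
    return sorted(words, key=lambda w: mixed[w], reverse=True)


if __name__ == "__main__":
    length = {"a": 1.0, "bb": 2.0, "ccc": 3.0}
    power = {"a": 3.0, "bb": 0.0, "ccc": 0.0}
    print(mixed_ranking([length, power], [0.3, 0.7]))  # ['a', 'ccc', 'bb']
```

The mixed ranking can then be cut to the top n exactly like a ranking from a single indicator.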