Closed NicolasAG closed 7 years ago
To do so, add the following code to search.py in the hred package that we load in the Docker instance:
Import nltk ngrams:
from nltk.util import ngrams
Then, right after these lines:
# Adjust log probs according to search restrictions
if ignore_unk:
    next_probs[:, self.model.state['unk_sym']] = 0
if k <= min_length:
    next_probs[:, self.model.state['eos_sym']] = 0
    next_probs[:, self.model.state['eos_sym']] = 0
Add the following code:
# avoid repeating the same trigram in one sentence
for b_idx in xrange(len(gen)):  # for each n_sample compute the current set of trigrams and avoid it
    trigrams = set(ngrams(gen[b_idx], 3))
    if len(trigrams) > 0:
        for w in xrange(self.model.state['idim']):  # for each word in vocabulary, avoid trigram
            if tuple(gen[b_idx][-2:]+[w]) in trigrams:
                next_probs[b_idx, w] = 0
just before these lines:
# Update costs
next_costs = np.array(costs)[:, None] - np.log(next_probs)
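For anyone who wants to check the trigram logic outside of search.py, here is a minimal standalone sketch of the same idea. The helper name `block_repeated_trigrams` is hypothetical, and plain Python lists stand in for the NumPy `next_probs` array and for `gen`. Instead of looping over the whole vocabulary, it scans the trigrams already generated, which should give the same result as the snippet above. It is written for Python 3, so it uses `range` rather than `xrange`.

```python
def block_repeated_trigrams(gen, next_probs):
    """Zero the probability of any next word that would repeat a trigram.

    gen        -- list of partial hypotheses, each a list of word ids
    next_probs -- one row per hypothesis of next-word probabilities
    (Hypothetical helper; the names mirror the variables in search.py.)
    """
    for b_idx, words in enumerate(gen):
        if len(words) < 3:
            continue  # no complete trigram yet, nothing to block
        prefix = tuple(words[-2:])  # the last two generated words
        # Any earlier trigram starting with these two words tells us
        # one word that must not be generated next.
        for i in range(len(words) - 2):
            if tuple(words[i:i + 2]) == prefix:
                next_probs[b_idx][words[i + 2]] = 0.0
    return next_probs


# Two hypotheses over a toy vocabulary of 6 word ids.
gen = [[1, 2, 3, 1, 2], [4, 5]]
probs = [[0.1] * 6, [0.1] * 6]
block_repeated_trigrams(gen, probs)
# The first hypothesis ends in (1, 2) and already contains (1, 2, 3),
# so word 3 is blocked; the second hypothesis is too short to be affected.
```

Since the costs are computed as `-np.log(next_probs)`, a zeroed entry becomes an infinite cost, so the search should never pick the blocked word.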
Added the fix; will monitor whether the repetition persists.
Let's add a little hack to the HRED model to avoid repeating itself in the same sentence