noseworm / convai

[Fix HRED] Avoid Repetitions #3

Closed NicolasAG closed 7 years ago

NicolasAG commented 7 years ago

Let's add a small hack to the HRED model so that it does not repeat itself within a single sentence.

NicolasAG commented 7 years ago

To do so, add the following code to search.py in the hred package that we load in the Docker instance:

Import nltk ngrams: from nltk.util import ngrams

Then, right after these lines:

# Adjust log probs according to search restrictions
if ignore_unk:
    next_probs[:, self.model.state['unk_sym']] = 0
if k <= min_length:
    next_probs[:, self.model.state['eos_sym']] = 0
    next_probs[:, self.model.state['eos_sym']] = 0

Add the following code:

# avoid repeating the same trigram in one sentence
for b_idx in xrange(len(gen)):  # for each n_sample compute the current set of trigrams and avoid it
    trigrams = set(ngrams(gen[b_idx], 3))
    if len(trigrams) > 0:
        for w in xrange(self.model.state['idim']):  # for each word in vocabulary, avoid trigram
            if tuple(gen[b_idx][-2:]+[w]) in trigrams:
                next_probs[b_idx, w] = 0

just before these lines:

# Update costs 
next_costs = np.array(costs)[:, None] - np.log(next_probs)
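
The idea can be tried in isolation with a minimal Python 3 sketch. It makes several assumptions that are not part of search.py: plain lists stand in for the NumPy arrays, a local ngrams helper stands in for nltk.util.ngrams, and the function name block_repeated_trigrams is hypothetical. It scans the trigrams already generated rather than the whole vocabulary, which should zero the same entries as the loop above.

```python
def ngrams(tokens, n):
    """Return the consecutive n-grams of tokens as a list of tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def block_repeated_trigrams(gen, next_probs):
    """Zero the probability of any next word that would repeat a trigram.

    gen: list of partial hypotheses, each a list of word ids.
    next_probs: list of rows; next_probs[b][w] is the probability of
        word w following hypothesis b. Modified in place and returned.
    """
    for b_idx, hyp in enumerate(gen):
        prefix = tuple(hyp[-2:])
        if len(prefix) < 2:
            continue  # too short to complete a trigram
        for trigram in set(ngrams(hyp, 3)):
            # A trigram already seen that starts with the last two words
            # would be repeated if its third word were generated next.
            if trigram[:2] == prefix:
                next_probs[b_idx][trigram[2]] = 0.0
    return next_probs


# Toy example: vocabulary of 5 word ids, two beam hypotheses.
gen = [[1, 2, 3, 1, 2], [0, 4, 4]]
next_probs = [[0.2] * 5, [0.2] * 5]
block_repeated_trigrams(gen, next_probs)
# Hypothesis 0 ends in (1, 2) and already contains (1, 2, 3),
# so word 3 is blocked; hypothesis 1 is left unchanged.
```

Because the cost update subtracts the log of next_probs, a zeroed entry becomes an infinite cost, so that continuation should never be selected by the search.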

koustuvsinha commented 7 years ago

Added the fix; will monitor whether the repetition persists.