nouhadziri / THRED

The implementation of the paper "Augmenting Neural Response Generation with Context-Aware Topical Attention"
https://arxiv.org/abs/1811.01063
MIT License

Missing the diversity evaluation? #24

Closed LTlitong closed 4 years ago

LTlitong commented 4 years ago

Hello,

Thanks for your code! I reran the training but only got PPL scores. I checked the code, and it looks like the diversity evaluations (distinct-1 and distinct-2) are missing?

Thanks a lot for your early reply!

ehsk commented 4 years ago

Sorry for the late reply.

The evaluation metrics were not included in this repo. Please check out here for computing the metrics.

Also, you may find this paper interesting: it introduces another diversity metric, called entropy, which accounts for differences in n-gram frequencies.
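For readers who want a quick reference, here is a minimal sketch of how distinct-n and an n-gram entropy are commonly computed over a set of generated responses. It is not the code linked above and is not taken from this repo; the function names (`ngram_counts`, `distinct_n`, `entropy_n`) are hypothetical, and the input is assumed to be a list of already tokenized responses.

```python
import math
from collections import Counter


def ngram_counts(responses, n):
    """Count all n-grams across a list of tokenized responses."""
    counts = Counter()
    for tokens in responses:
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def distinct_n(responses, n):
    """Number of unique n-grams divided by the total number of n-grams."""
    counts = ngram_counts(responses, n)
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


def entropy_n(responses, n):
    """Shannon entropy (natural log) of the empirical n-gram distribution."""
    counts = ngram_counts(responses, n)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Two toy responses that share most of their tokens
responses = [["i", "do", "not", "know"], ["i", "do", "not", "care"]]
print(distinct_n(responses, 1), distinct_n(responses, 2), entropy_n(responses, 2))
```

Distinct-n only counts whether an n-gram occurs at all, whereas the entropy also reflects how evenly the n-grams are distributed, so a model that repeats a few n-grams very often gets a lower entropy.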

LTlitong commented 4 years ago

Thanks for your reply and the metrics code you provided!

Could you please also share the source code or some details of SS (Semantic Similarity) and REI (Response Echo Index), which are the most important evaluation metrics in the paper?

Thanks a lot for your answer!

ehsk commented 4 years ago

We have provided the code for Semantic Similarity here.

Here is a rough implementation of REI (you need to take an average over the entire data):

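import spacy

# The spacy module itself is not callable; a pipeline is loaded with spacy.load().
# 'en' is a spaCy 2.x shortcut; on spaCy 3+ pass an installed package name such as 'en_core_web_sm'.
nlp = spacy.load('en')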

def jaccard_similarity(s1, s2):
    # An '<unk>' token that appears in only one of the two sets is treated
    # as a wildcard that can match any word in the other set
    unk_factor = 0
    if '<unk>' in s1 and '<unk>' in s2:
        pass
    elif '<unk>' in s1 or '<unk>' in s2:
        unk_factor = 1

    union = len(s1.union(s2)) - unk_factor
    if union != 0:
        return (len(s1.intersection(s2)) + unk_factor) / union
    else:
        return 0

def bow(text):
    doc = nlp(text)

    lemmas = set()
    for w in doc:
        if not w.is_stop and not w.is_punct:
            lemmas.add(w.lemma_)

    return lemmas

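def REI(dialog_history, response):
    # Sum of the Jaccard similarities between the response and each utterance
    # in the dialog history; average the result over the entire data separately
    response_bow = bow(response)
    return sum(jaccard_similarity(bow(utterance), response_bow) for utterance in dialog_history)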
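The averaging step mentioned above ("take an average over the entire data") could be sketched as follows. This is only an illustration: `corpus_rei` and `toy_rei` are hypothetical names, and `toy_rei` is a stand-in scorer that uses lowercase whitespace tokens instead of spaCy lemmas so the example runs without a language model. In practice you would pass the `REI` function above as `rei_fn`.

```python
from statistics import mean


def corpus_rei(samples, rei_fn):
    """Average a per-sample score over (dialog_history, response) pairs."""
    scores = [rei_fn(history, response) for history, response in samples]
    return mean(scores) if scores else 0.0


def toy_rei(dialog_history, response):
    """Stand-in scorer: plain Jaccard over lowercase whitespace tokens,
    summed over the utterances in the dialog history (no '<unk>' handling)."""
    response_tokens = set(response.lower().split())
    total = 0.0
    for utterance in dialog_history:
        tokens = set(utterance.lower().split())
        union = tokens | response_tokens
        total += len(tokens & response_tokens) / len(union) if union else 0.0
    return total


# Hypothetical evaluation data: (dialog history, generated response)
samples = [
    (["how are you"], "how are you"),
    (["good morning"], "see you later"),
]
print(corpus_rei(samples, toy_rei))
```

Passing the scorer as an argument keeps the averaging independent of how the bag of words is built, so the spaCy-based version can be swapped in without changes.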