Missing the diversity evaluation?

LTlitong commented 4 years ago

Hello,

Thanks for your code! I reran the training but only got PPL scores. I checked the code, and it looks like the diversity evaluations (distinct-1 and distinct-2) are missing?

Thanks in advance for your reply!

ehsk commented 4 years ago

Sorry for the late reply.

The evaluation metrics were not included in this repo. Please check out here for computing the metrics.
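
As a rough illustration, distinct-n is usually computed as the number of unique n-grams divided by the total number of n-grams in the generated responses. The sketch below assumes whitespace tokenization and counts over the whole set of responses at once; the linked script may tokenize or aggregate differently, and the names `ngrams` and `distinct_n` are made up for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(responses, n):
    """Unique n-grams / total n-grams over all responses (0.0 if there are none)."""
    counts = Counter()
    for response in responses:
        # Assumption: plain whitespace tokenization.
        counts.update(ngrams(response.split(), n))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


if __name__ == "__main__":
    generated = ["i do not know", "i do not care"]
    print("distinct-1:", distinct_n(generated, 1))
    print("distinct-2:", distinct_n(generated, 2))
```

Responses shorter than n tokens contribute no n-grams, so they are simply skipped.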

Also, you may find this paper interesting, as it introduces another diversity metric, called entropy, that accounts for the frequency differences between n-grams.
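
To give a sense of what a frequency-aware diversity score can look like, here is a minimal sketch that computes the Shannon entropy of the empirical n-gram distribution. This is an assumption about the general idea, not necessarily the exact formula from that paper; `ngram_entropy` is a hypothetical name and tokenization is plain whitespace splitting.

```python
import math
from collections import Counter


def ngram_entropy(responses, n):
    """Shannon entropy (in nats) of the n-gram frequency distribution."""
    counts = Counter()
    for response in responses:
        tokens = response.split()  # assumption: whitespace tokenization
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # -sum p * log(p), where p is the relative frequency of each n-gram
    return -sum((c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    print(ngram_entropy(["i do not know", "i do not care"], 1))
```

Unlike distinct-n, this score drops when a few n-grams dominate the output, even if many rare n-grams also appear.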

LTlitong commented 4 years ago

Thanks for your reply and the metrics code you provided!

Could you please also share the source code or some details of SS (Semantic Similarity) and REI (Response Echo Index), which are the most important evaluations in the paper?

Thanks a lot for your answer!

ehsk commented 4 years ago

We have provided the code for Semantic Similarity here.

Here is a rough implementation of REI (you need to take an average over the entire data):

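import spacy

# Load the English pipeline. spaCy 3+ dropped the 'en' shortcut;
# use a full model name such as 'en_core_web_sm' there.
nlp = spacy.load('en')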

def jaccard_similarity(s1, s2):
    # unknown word can match with every word
    unk_factor = 0
    if '<unk>' in s1 and '<unk>' in s2:
        pass
    elif '<unk>' in s1 or '<unk>' in s2:
        unk_factor = 1

    union = len(s1.union(s2)) - unk_factor
    if union != 0:
        return (len(s1.intersection(s2)) + unk_factor) / union
    else:
        return 0

def bow(text):
    doc = nlp(text)

    lemmas = set()
    for w in doc:
        if not w.is_stop and not w.is_punct:
            lemmas.add(w.lemma_)

    return lemmas

def REI(dialog_history, response):
    response_bow = bow(response)
    return sum(jaccard_similarity(bow(utterance), response_bow) for utterance in dialog_history)
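
The averaging step mentioned above could look roughly like the sketch below. It assumes the data is a list of (dialog_history, response) pairs and that the average is a plain mean over those pairs. To stay runnable without a spaCy model, it uses a hypothetical `simple_bow` (lowercased whitespace tokens) and a plain Jaccard without the `<unk>` handling; pass the spaCy-based `bow` from the snippet above as the `bow` argument to get lemmas with stop words removed.

```python
def jaccard(a, b):
    """Plain Jaccard similarity between two sets (no <unk> handling)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def simple_bow(text):
    # Hypothetical stand-in for the spaCy-based bow(): lowercased whitespace tokens.
    return set(text.lower().split())


def rei(dialog_history, response, bow=simple_bow):
    """Sum of Jaccard similarities between the response and each history utterance."""
    response_bow = bow(response)
    return sum(jaccard(bow(utterance), response_bow) for utterance in dialog_history)


def average_rei(dialogs, bow=simple_bow):
    """Mean REI over an iterable of (dialog_history, response) pairs."""
    scores = [rei(history, response, bow) for history, response in dialogs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    data = [
        (["how are you", "fine thanks"], "how are you"),
        (["hello"], "goodbye"),
    ]
    print(average_rei(data))
```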

nouhadziri / THRED

Missing the diversity evaluation? #24