openai / deeptype

Code for the paper "DeepType: Multilingual Entity Linking by Neural Type System Evolution"
https://arxiv.org/abs/1802.01021

What scores does Table 1 in the paper use? #26

Open ghost opened 6 years ago

ghost commented 6 years ago

In the paper, Table 1 (c) shows the entity linking scores, but how were they computed, especially the CoNLL scores?

(c) Entity Linking model comparison (CoNLL column):
Link Count only: 68.614
manual (oracle): 98.217
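
My understanding of the "Link Count only" row is that it simply picks, for each mention, the candidate entity most often linked from that surface form in Wikipedia. A toy sketch of that baseline, with made-up counts (not the repo's actual data format):

def link_count_baseline(mention_candidates):
    # mention_candidates: dict mapping mention -> {candidate title: link count}
    predictions = {}
    for mention, counts in mention_candidates.items():
        # pick the entity that this surface form links to most often in Wikipedia
        predictions[mention] = max(counts, key=counts.get)
    return predictions

# illustrative toy counts, not real numbers
print(link_count_baseline({'apple': {'Apple Pie': 120, 'Apple (company)': 900, 'Apple (fruits)': 300}}))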

For example, suppose some mentions and their candidate entities look like this:

doc_id, mention, candidate entity, label
-------------------------------------
1, apple, Apple Pie, True
1, apple, Apple (company), False
1, apple, Apple (fruits), False
...

If the model predicts the single entity with the highest score for each mention, I don't need to use the false candidates to compute accuracy, but I don't know whether Table 1 used the false candidates or not.
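
In other words, the per-mention scheme I have in mind looks roughly like this (made-up scores, just to illustrate):

mention_scores = {
    # mention -> (true entity, {candidate: made-up model score})
    'apple': ('Apple Pie', {'Apple Pie': 0.7, 'Apple (company)': 0.2, 'Apple (fruits)': 0.1}),
}

correct = 0
for mention, (true_entity, scores) in mention_scores.items():
    predicted = max(scores, key=scores.get)   # keep only the highest-scoring candidate
    correct += int(predicted == true_entity)

accuracy = correct / len(mention_scores)      # false candidates only matter via the argmax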

How did you compute the Table 1 (c) scores?

Paper: https://arxiv.org/pdf/1802.01021.pdf

JonathanRaiman commented 6 years ago

If I understand correctly, the data you are referring to also provides a "proposal set" of entities for each mention, and marks one of the proposed entities as correct while the others are incorrect? Table 1 measured, for all mentions given in the CoNLL eval set, the accuracy at recovering the true entity under a proposal set drawn from any entity in Wikipedia/Wikidata (i.e. not just those proposed by CoNLL).
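
Roughly speaking, the evaluation loop looks like this (a conceptual sketch with hypothetical helper names, not the actual code in this repo):

def evaluate(mentions, propose_candidates, score_candidate):
    # mentions: iterable of (surface_form, context, gold_entity) from the CoNLL eval set
    correct = 0
    total = 0
    for surface_form, context, gold_entity in mentions:
        candidates = propose_candidates(surface_form)   # drawn from Wikipedia/Wikidata link stats, not CoNLL
        prediction = max(candidates, key=lambda c: score_candidate(c, context))
        correct += int(prediction == gold_entity)
        total += 1
    return correct / total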

ghost commented 6 years ago

Well, I just used the AIDA CoNLL-YAGO dataset and prepared it for computing accuracy like this:

out = []
with open('../conll_dataset/aida-yago2-dataset/AIDA-YAGO2-dataset.tsv') as f:
    index = 1
    me = []  # (mention, true entity) pairs for the current document
    ss = []  # tokens of the current document
    first = True
    for line in f:
        if line.startswith('-DOCSTART-'):
            # a new document starts: flush the previous one (skip the very first marker)
            if first:
                first = False
                continue
            out.append([index, ' '.join(ss), list(set(me))])
            index += 1
            me = []
            ss = []
        else:
            line_spl = line.rstrip('\n').split('\t')
            ss.append(line_spl[0])
            # annotated "B" lines carry the full mention and its Wikipedia URL
            if len(line_spl) > 4 and line_spl[1] == 'B':
                me.append((line_spl[2], line_spl[4].replace('http://en.wikipedia.org/wiki/', '')))
    # flush the last document, which never hits another -DOCSTART- marker
    if ss:
        out.append([index, ' '.join(ss), list(set(me))])
data = out

data[0] is like this:

# [doc_id, doc_text, [pairs of mention and true entity] ]
[1,
 'EU rejects German call to boycott British lamb .  Peter Blackburn  BRUSSELS 1996-08-22  The European Commission said on Thursday it disagreed with German advice to consumers to shun British lamb until scientists determine whether mad cow disease can be transmitted to ...... ',
 [('Loyola de Palacio', 'Loyola_de_Palacio'),
  ('Britain', 'United_Kingdom'),
  ('Germany', 'Germany'),
  ('European Commission', 'European_Commission'),
  ('France', 'France'),
  ('Europe', 'Europe'),
  ('BRUSSELS', 'Brussels'),
  ...
]]

and calculated accuracy:

from functools import partial

from tqdm import tqdm_notebook
import pandas as pd

# en_tokenize, solve_model_probs, run, tagger, type_oracle, trie, etc.
# come from the notebook in this repo.
results = []
for d in tqdm_notebook(data):
    # full text of the target document
    sentence = d[1]

    # ts are the target mentions in the document
    ts = [str(t[0]) for t in d[2]]
    true_entities = [str(t[1]).replace('_', ' ') for t in d[2]]

    # tokenize the sentence using the target mentions;
    # model_probs is the output of the get_probs function from the notebook you added
    tokenize = partial(en_tokenize, ts=ts)
    sent_splits, model_probs = solve_model_probs(sentence, tagger, tokenize=tokenize)

    # predicted entities: the highest-scoring candidate for each mention
    pred_entities = run(ts, sent_splits, model_probs, indices2title, type_oracle, trie, trie_index2indices_values, trie_index2indices_counts)

    # append results: true -> true entity, pred -> predicted entity
    results += [{'doc_id': d[0], 'mention': x, 'true': y, 'pred': z} for x, y, z in zip(ts, true_entities, pred_entities)]

df = pd.DataFrame(results)
matched = df['pred'] == df['true']
length = df['pred'].shape[0]
assert len(df['pred']) == len(df['true'])
accuracy = float(sum(matched)) / float(length)
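
One extra sanity check I could add on the df built above is a per-document breakdown, to spot documents where the mention alignment goes wrong:

per_doc = df.assign(correct=matched).groupby('doc_id')['correct'].mean()
print(per_doc.sort_values().head(10))   # documents with the lowest accuracy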

Is this the correct way to calculate accuracy?

lbozarth commented 4 years ago

I'm stuck on this same part; the accuracy calculated this way is only about 0.7, though.