ghost opened this issue 6 years ago
If I understand correctly, the data you are referring to also provides a "proposal set" of entities for each mention and marks one of the proposed entities as correct, while the others are incorrect? Table 1 measures, for all mentions given in the CoNLL eval set, the accuracy at recovering the true entity when the proposal set is any entity in Wikipedia/Wikidata (i.e. not just the candidates proposed by CoNLL).
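Concretely, a rough sketch of the distinction I mean (the names `examples`, `score`, and `all_entities` are placeholders here, not functions from this repo):

def accuracy_over_candidates(examples, score):
    """Rank only the candidates proposed for each mention (CoNLL-style proposal set)."""
    correct = 0
    for mention, candidates, gold in examples:
        pred = max(candidates, key=lambda ent: score(mention, ent))
        correct += int(pred == gold)
    return correct / len(examples)

def accuracy_over_full_kb(examples, score, all_entities):
    """Rank every entity in Wikipedia/Wikidata, not just the proposed candidates."""
    correct = 0
    for mention, _candidates, gold in examples:
        pred = max(all_entities, key=lambda ent: score(mention, ent))
        correct += int(pred == gold)
    return correct / len(examples)

Table 1 corresponds to the second setting, i.e. the proposal set is the whole knowledge base rather than the CoNLL candidate list.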
Well, I just used the AIDA CoNLL-YAGO dataset and prepared it for computing accuracy like this:
out = []
with open('../conll_dataset/aida-yago2-dataset/AIDA-YAGO2-dataset.tsv') as f:
    index = 1
    me = []    # (mention, true entity) pairs for the current document
    ss = []    # tokens of the current document
    first = True
    for line in f:
        if line.startswith('-DOCSTART-'):
            # a new document starts: flush the previous one
            if first:
                first = False
                continue
            out.append([index, ' '.join(ss), list(set(me))])
            index += 1
            me = []
            ss = []
        else:
            line_spl = line.replace('\n', '').split('\t')
            ss.append(line_spl[0])
            # extra columns are only present for tokens that carry a Wikipedia link
            if len(line_spl) > 4:
                # 'B' marks the first token of a mention; column 2 is the full mention,
                # column 4 is the Wikipedia URL of the true entity
                if line_spl[1] == 'B':
                    me.append((line_spl[2], line_spl[4].replace('http://en.wikipedia.org/wiki/', '')))
    # flush the last document (there is no trailing -DOCSTART- to trigger it)
    out.append([index, ' '.join(ss), list(set(me))])
data = out
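As a quick sanity check of the parse (the document count in the comment is from memory, so treat it as approximate):

print(len(data))                        # number of documents; AIDA-YAGO2 should have 1393 in total, if I remember correctly
print(sum(len(d[2]) for d in data))     # number of unique (mention, true entity) pairs collected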
data[0] looks like this:
# [doc_id, doc_text, [(mention, true_entity), ...]]
[1,
 'EU rejects German call to boycott British lamb . Peter Blackburn BRUSSELS 1996-08-22 The European Commission said on Thursday it disagreed with German advice to consumers to shun British lamb until scientists determine whether mad cow disease can be transmitted to ...... ',
 [('Loyola de Palacio', 'Loyola_de_Palacio'),
  ('Britain', 'United_Kingdom'),
  ('Germany', 'Germany'),
  ('European Commission', 'European_Commission'),
  ('France', 'France'),
  ('Europe', 'Europe'),
  ('BRUSSELS', 'Brussels'),
  ...
 ]]
and calculated the accuracy:
from functools import partial
from tqdm import tqdm_notebook
import pandas as pd

results = []
for d in tqdm_notebook(data):
    # text of the target document
    sentence = d[1]
    # ts are the target mentions in the document
    ts = [str(t[0]) for t in d[2]]
    true_entities = [str(t[1]).replace('_', ' ') for t in d[2]]
    # tokenize the sentence using the target mentions;
    # model_probs is the output of the get_probs function from the notebook you added
    tokenize = partial(en_tokenize, ts=ts)
    sent_splits, model_probs = solve_model_probs(sentence, tagger, tokenize=tokenize)
    # predicted entity with the highest score for each mention
    pred_entities = run(ts, sent_splits, model_probs, indices2title, type_oracle, trie,
                        trie_index2indices_values, trie_index2indices_counts)
    # append results: true -> true entity, pred -> predicted entity
    results += [{'doc_id': d[0], 'mention': x, 'true': y, 'pred': z}
                for x, y, z in zip(ts, true_entities, pred_entities)]

df = pd.DataFrame(results)
matched = df['pred'] == df['true']
length = df['pred'].shape[0]
assert len(df['pred']) == len(df['true'])
accuracy = float(sum(matched)) / float(length)
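One thing I'm unsure about is whether mentions whose true entity never appears in the trie's candidate set should count as errors. Roughly, these are the two numbers I could report ('gold_in_candidates' is a hypothetical extra column that I would have to record inside the loop above; the current code does not produce it):

# (a) accuracy over all gold mentions: candidate-generation misses count as errors
acc_all = float((df['pred'] == df['true']).sum()) / float(len(df))

# (b) accuracy only over mentions whose true entity was among the trie's candidates
#     ('gold_in_candidates' is hypothetical and not produced by the code above)
covered = df[df['gold_in_candidates']]
acc_covered = float((covered['pred'] == covered['true']).sum()) / float(len(covered))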
Is this the correct way to calculate accuracy?
I'm stuck on this same part; the accuracy calculated this way is 0.7, though.
In the paper, Table 1 (c) shows the entity linking scores, but how do you compute them, especially the CoNLL scores?
For example, each mention comes with several candidate entities.
If the model just predicts the single entity with the highest score for each mention, I don't need the false candidates to compute accuracy, but I don't know whether Table 1 used the false candidates or not.
How did you compute the Table 1 (c) scores?
Paper: https://arxiv.org/pdf/1802.01021.pdf