studio-ousia / luke

LUKE -- Language Understanding with Knowledge-based Embeddings
Apache License 2.0

Pipeline NER and ED modules #158

Closed: gabrielandrade2 closed this issue 2 years ago

gabrielandrade2 commented 2 years ago

Hello,

I was recently trying to build an entity linking pipeline for the CoNLL2003 dataset with LUKE. First I run the NER module to get the entity spans, and then I run the ED module on those spans to get the corresponding Wikipedia links. Basically, I want to feed the output of LUKE's NER model into the ED module somehow.

However, I cannot figure out how the dataset file used on the ED example page (https://github.com/studio-ousia/luke/tree/master/examples/entity_disambiguation) was created, or more specifically, how the entity candidates were generated. I don't see any code for that in the repository.

Is there some step that I am missing here? Or do I need some external code to generate the candidates for ED?

ikuyamada commented 2 years ago

Hi @gabrielandrade2,

Thanks for your question! For a fair comparison with past models, our ED model uses an existing dataset, proposed in past work, that comes with predefined entity candidates. It is available here. Please refer to the paper for details.
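If you want to run ED on NER output for your own text instead of the benchmark dataset, you have to generate candidates yourself. One common approach, assumed here and not necessarily how the predefined candidates in that dataset were built, is to look up each detected span in a table of mention-entity prior probabilities p(e|m), usually estimated from Wikipedia anchor-text statistics. The sketch below shows only that lookup step. `PRIOR`, `Mention`, `normalize`, and `generate_candidates` are hypothetical names, and the probabilities are made up. You would still need to convert the result into whatever input format the ED example expects, which this sketch does not cover.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical mention-entity prior table: normalized surface form -> {Wikipedia title: p(e|m)}.
# The numbers are made up for illustration; a real table would be estimated from
# Wikipedia anchor-text statistics (an assumption, not part of LUKE).
PRIOR: Dict[str, Dict[str, float]] = {
    "eu": {"European Union": 0.95, "Europium": 0.01},
    "german": {"Germany": 0.6, "German language": 0.3, "Germans": 0.1},
}


@dataclass
class Mention:
    text: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    candidates: List[Tuple[str, float]] = field(default_factory=list)


def normalize(surface: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore casing and spacing."""
    return " ".join(surface.lower().split())


def generate_candidates(
    text: str,
    spans: List[Tuple[int, int]],
    prior: Dict[str, Dict[str, float]],
    top_k: int = 30,
) -> List[Mention]:
    """Attach the top_k entities by prior probability to each (start, end) span."""
    mentions = []
    for start, end in spans:
        surface = text[start:end]
        scores = prior.get(normalize(surface), {})
        # Highest-prior entities first; spans with no entry get an empty list (unlinkable).
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        mentions.append(Mention(surface, start, end, ranked))
    return mentions


# Example: spans as they might come out of an NER model (character offsets).
text = "EU rejects German call to boycott British lamb ."
spans = [(0, 2), (11, 17), (34, 41)]
mentions = generate_candidates(text, spans, PRIOR)
for m in mentions:
    print(m.text, m.candidates[:2])
```

Spans with no entry in the table, such as "British" here, end up with no candidates. A candidate-based ED model cannot link them, so the coverage of the prior table limits end-to-end recall.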