thunlp / EntityDuetNeuralRanking

Entity-Duet Neural Ranking Model
MIT License

How to map entity mention to entity in CN-DBpedia? #11

Closed stellaHSR closed 4 years ago

stellaHSR commented 5 years ago

Hi, I have noticed that the paper does not mention entity linking or entity disambiguation when discussing the entity embedding, entity description embedding, and entity type embedding. How do you map the entity mentions in the query/document to entities in the specified KG? Is there any previous work you referred to, or what kind of model did you adopt to retrieve the correct entity from the KG? As far as I know, this part may be quite important and may affect the final result, because adding the "correct" entity information into the model is essential for its training.

EdwardZH commented 5 years ago

Thank you for your advice. There are few entity linkers for Chinese, so we simply utilize CMNS (mentioned in our paper) to link entities. CMNS matches entity names from left to right with maximum length. I absolutely agree with you that the entity linker is important for model performance. However, we have tested interaction-based ranking models and found that they can measure word- or entity-level similarity, just like attention. In this case, you can consider lots of candidate entities in these models. This paper mainly focuses on studying entity effectiveness, so we did not test how entity linkers affect model performance. We are now working with the MS MARCO dataset for further evaluation of all kernel-based ranking models, and we will soon update all results on this public dataset.
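For readers unfamiliar with CMNS: the "left to right with maximum length" matching can be sketched as a greedy longest-match scan over the token sequence. This is a minimal illustration, not the repository's actual code; the function name, the surface-form dictionary, and the `max_len` cutoff are my assumptions.

```python
def cmns_link(tokens, name_dict, max_len=5):
    """Greedy left-to-right maximum-length entity name matching (CMNS-style).

    tokens    : list of word tokens from the query or document
    name_dict : dict mapping surface-form token tuples -> entity id
    Returns a list of (start, end, entity_id) spans.
    """
    links = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, shrinking toward length 1.
        for j in range(min(len(tokens), i + max_len), i, -1):
            span = tuple(tokens[i:j])
            if span in name_dict:
                links.append((i, j, name_dict[span]))
                i = j  # jump past the matched mention
                matched = True
                break
        if not matched:
            i += 1
    return links
```

For example, with both "new york" and "new york city" in the dictionary, the longer surface form wins because longer spans are tried first.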

EdwardZH commented 5 years ago

And for English corpora, TAGME is the best choice.

stellaHSR commented 5 years ago

Great! Thank you for the quick reply. I will first check the entity linker CMNS in your related paper. Looking forward to the updated results on the MS MARCO dataset.

stellaHSR commented 5 years ago

Hi, I have some new questions about the entity embedding in the paper. What is an "entity type" here? Since the downloaded CN-DBpedia data only contains triples, it seems hard to define a "type" from triples. Why is an attention mechanism used for the entity type embedding? Also, I am a little confused about "lots of candidate entities in these models": does it mean that more than one enriched entity embedding is used in EDRM?

EdwardZH commented 5 years ago

The entity types are denoted by triples that contain the relation "BaiduCARD". The attention mechanism is used to automatically select entity types, because a single entity can have many diverse types. And I mean that interaction matrices can learn embedding similarity, just like attention; see the K-NRM paper for details.
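The attention-based type selection can be sketched roughly as a dot-product attention over an entity's type embeddings, weighted by a context vector (e.g. an average of the query word embeddings). This is a simplified illustration of the idea, not the exact formulation from the EDRM paper; the function names and scoring function are my assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_types(type_embs, context_emb):
    """Attention-weighted combination of an entity's type embeddings.

    type_embs   : (n_types, dim) matrix, one row per candidate type
    context_emb : (dim,) context vector, e.g. mean of query word embeddings
    Returns a single (dim,) type embedding for the entity.
    """
    scores = type_embs @ context_emb   # dot-product relevance of each type
    weights = softmax(scores)          # attention weights over the types
    return weights @ type_embs         # weighted sum of type embeddings
```

With this scheme, a type whose embedding aligns strongly with the context dominates the weighted sum, so the model can pick the context-appropriate type among many.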