Open john012343210 opened 3 years ago
Hi,
Sorry for my late reply.
1. Entity labels are the names of entities, not their types.
2. For example, the labels of English DBpedia can be downloaded from http://downloads.dbpedia.org/2016-10/core/labels_en.ttl.bz2.
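To inspect the labels in that dump, the lines can be read directly from the compressed file. The following is a minimal sketch, assuming the file contains one `rdfs:label` triple per line in N-Triples style (`<entity> <rdfs:label> "name"@en .`). The function names `parse_label_line` and `load_labels` are illustrative and not part of any OpenEA or DBpedia tooling.

```python
import bz2
import re

# Assumed line shape: <entity> <rdfs:label> "name"@lang .
LABEL_LINE = re.compile(
    r'^<(?P<entity>[^>]+)>\s+'
    r'<http://www\.w3\.org/2000/01/rdf-schema#label>\s+'
    r'"(?P<label>(?:[^"\\]|\\.)*)"(?:@(?P<lang>[A-Za-z0-9-]+))?\s*\.\s*$'
)


def parse_label_line(line):
    """Return (entity_uri, label) for a label triple, or None for any other line.

    Escape sequences inside the label (e.g. \\uXXXX) are left undecoded.
    """
    match = LABEL_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("entity"), match.group("label")


def load_labels(path):
    """Read a bz2-compressed label dump into a dict: entity URI -> label."""
    labels = {}
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            parsed = parse_label_line(line)
            if parsed is not None:
                entity, label = parsed
                labels[entity] = label
    return labels
```

Calling `load_labels("labels_en.ttl.bz2")` on the downloaded file would then give a dictionary from entity URI to its name. Comment lines and anything that is not a label triple are skipped.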
Note that, because the multilingual versions of DBpedia are all extracted from the same source, Wikipedia, most of the aligned entities have the same name. In this case, using names to align these entities may achieve high accuracy. But in real entity alignment scenarios, such as aligning an English KG to a low-resource one, or cases where entity names are not available, methods that rely on entity names may not work well. So we do not recommend using entity names. More robust features and methods for entity alignment are worth exploring.
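How much of a benchmark can be solved by names alone can be checked with a trivial baseline that matches entities whose normalized names are identical. This is a rough sketch under assumptions: names are given as dictionaries from entity ID to name, gold links as a list of pairs, and the helper names (`normalize`, `exact_name_alignment`, `hits_at_1`) are made up for illustration.

```python
def normalize(name):
    """Lowercase, turn underscores into spaces, and collapse whitespace."""
    return " ".join(name.replace("_", " ").casefold().split())


def exact_name_alignment(names_1, names_2):
    """Map each entity in KG1 to the KG2 entity with the same normalized name.

    Only unambiguous matches (exactly one candidate in KG2) are kept.
    """
    index = {}
    for entity, name in names_2.items():
        index.setdefault(normalize(name), []).append(entity)

    pairs = {}
    for entity, name in names_1.items():
        candidates = index.get(normalize(name), [])
        if len(candidates) == 1:
            pairs[entity] = candidates[0]
    return pairs


def hits_at_1(predicted, gold_links):
    """Fraction of gold (entity_1, entity_2) links that the prediction recovers."""
    correct = sum(1 for e1, e2 in gold_links if predicted.get(e1) == e2)
    return correct / len(gold_links)
```

If such a baseline already scores highly on a dataset, results from name-based models on that dataset say little about how they would behave when names differ or are missing.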
Hi, you have done good work. It can be regarded as the foundation of the entity alignment field; everyone uses your dataset. As you said, I understand the new version of the dataset (v2.0) as being constructed by removing entity name information from the attribute triples. In my experiments, I found that the performance of models that use entity name information (such as RDGCN and MultiKE) dropped considerably, while models that do not use name information perform similarly to the results in the paper. Is this the right way to think about it?
Hi @MrYxJ ,
Apologies for the late reply again. Indeed, you are right. To elaborate a bit more, we would also point out that there could be issues of fair comparison and test data leakage in cross-lingual EA in some prior studies where entity names are incorporated. This is not essentially due to embedding entity names, but due to some additional cross-lingual supervision labels/signals. E.g., in the original RDGCN and GCN-JE papers, the authors used Google Translate to translate the surface forms of entities in all other languages to English, then initialized the entity embeddings in their model with pre-trained word embeddings of the translated entity names. This is problematic in two ways:
For the above point 2, it is unfortunate to see that a few other more recent works are (we believe, erroneously) following such an unfair evaluation protocol, which we definitely advise against. In fact, a few other studies have already recognized this issue and have set good examples by separating the settings with and without machine translation (MT) into two evaluation settings (e.g., the HMAN and MRAEA papers). Some works have also explicitly pointed out this issue (e.g., the AttrGCN, JEANS and EVA papers). We will also continue to clarify this fair comparison issue in future publications and releases of OpenEA.
Note: The above issue only applies to cross-lingual EA. For monolingual EA, where training monolingual embeddings or directly comparing entity names does not require any cross-lingual training labels, using entity names does not violate fair comparison. That said, it is definitely worthwhile to examine how well a system performs without entity names and with only the structural information, since in many KBs (especially biomedical ones) there might not be meaningful entity names.
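One simple way to run such a name-free evaluation on your own data is to drop name-like attribute triples before training. The snippet below is only a sketch, assuming attribute triples are available as `(entity, attribute, value)` tuples. The predicate set is an illustrative guess rather than an exhaustive list, and this is not a description of how the OpenEA datasets were actually built.

```python
# Illustrative set of predicates that typically carry entity names (an assumption).
NAME_PREDICATES = frozenset({
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://xmlns.com/foaf/0.1/name",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
})


def strip_name_triples(attr_triples, name_predicates=NAME_PREDICATES):
    """Return the attribute triples whose predicate is not a name predicate."""
    return [
        (entity, attribute, value)
        for entity, attribute, value in attr_triples
        if attribute not in name_predicates
    ]
```

Running a model on both the original and the filtered triples gives a rough picture of how much it depends on names.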
-Muhao
Hello author, the paper mentions the following:
"Considering that DBpedia, Wikidata and YAGO collect data from very similar sources (mainly, Wikipedia), the aligned entities usually have identical labels. They would become “tricky” features for entity alignment and influence the evaluation of real performance. According to the suggestion in [95], we delete entity label"
May I know if this label refers to the type of the entity (for example, the type of Michael_Jordan is Person)?
Do you still have the dataset with all the labels? I would like to see whether the labels could help the embedding in some interesting way. If not, I might have to crawl DBpedia and Wikidata myself.
Thanks!