Evaluation metrics - Githubissues

thunlp / TensorFlow-TransX

An implementation of TransE and its extended models for Knowledge Representation Learning on TensorFlow

MIT License

514 stars, 196 forks

Can you explain the evaluation metrics?

I understand that the evaluation metrics most likely correspond to Mean Rank and Hits in Top 10 (as described in the TransE paper), but I am not able to fully follow the code here.

What is the difference between "results" and "results (type constraints)"?
What do left and right mean? I am guessing they are related to head and tail in the triples, but don't understand what they refer to in the evaluation.
Also, the results for 1-1, 1-n, n-1 do not seem to be computed separately. Why is it done only for the n-n category? Is this a bug (since they are attempted to be printed in the test method)?
I also don't get results anywhere close to those reported in the TransE paper on the WN dataset. The mean rank, hits, etc. are all way off. Has someone verified this?

Thanks.

**The first loop** tests the embeddings with the default metrics. The results are reported as follows:
    (?, r, t) : Mean rank, Hit 10
    (?, r, t) : Mean rank(filter), Hit 10(filter)
    (h, r, ?) : Mean rank, Hit 10
    (h, r, ?) : Mean rank(filter), Hit 10(filter)      
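
To make the raw versus filtered distinction concrete, here is a minimal sketch of how a rank for the `(h, r, ?)` case could be computed. It is not the repository's code: the function names `tail_rank` and `summarize` are hypothetical, and it assumes a TransE-style score where a lower distance is better. The filtered variant skips candidates that already form a known triple, so other correct answers do not push the true tail down.

```python
def tail_rank(scores, true_tail, known_tails=frozenset()):
    """Rank of the true tail (1 = best) among all candidate entities.

    scores[e] is the model's distance for (h, r, e); lower is better (assumption).
    known_tails holds other entities e for which (h, r, e) is a known triple.
    An empty known_tails gives the raw rank, a non-empty one the filtered rank.
    """
    target = scores[true_tail]
    rank = 1
    for e, s in enumerate(scores):
        if e == true_tail or e in known_tails:
            continue  # the answer itself and known correct answers are skipped
        if s < target:
            rank += 1
    return rank


def summarize(ranks, k=10):
    """Mean rank and the fraction of ranks within the top k."""
    mean_rank = sum(ranks) / len(ranks)
    hits_at_k = sum(1 for r in ranks if r <= k) / len(ranks)
    return mean_rank, hits_at_k


# Toy example with 5 entities; entity 2 is the true tail.
scores = [0.9, 0.1, 0.4, 0.2, 0.7]
raw_rank = tail_rank(scores, true_tail=2)
filtered_rank = tail_rank(scores, true_tail=2, known_tails={1})
```

The `(?, r, t)` case is symmetric: the head is replaced by every candidate entity instead of the tail.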

**The second loop** tests the embeddings with type constraints; for details, see "Type-Constrained Representation Learning in Knowledge Graphs".
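
As a rough illustration of the idea, a type-constrained rank only compares the true entity against candidates that are plausible for the relation. The sketch below assumes the constraints are approximated from the training triples (entities seen as head or tail of a relation) and that lower scores are better. The helper names `build_constraints` and `constrained_tail_rank` are hypothetical and are not taken from this repository.

```python
from collections import defaultdict


def build_constraints(train_triples):
    """Collect, per relation, the entities observed as head and as tail."""
    heads, tails = defaultdict(set), defaultdict(set)
    for h, r, t in train_triples:
        heads[r].add(h)
        tails[r].add(t)
    return heads, tails


def constrained_tail_rank(scores, true_tail, allowed_tails):
    """Rank the true tail only against tails allowed for the relation."""
    target = scores[true_tail]
    better = sum(
        1 for e in allowed_tails if e != true_tail and scores[e] < target
    )
    return 1 + better


# Toy data: triples are (head, relation, tail) with integer ids.
train = [(0, 0, 1), (2, 0, 3), (4, 1, 0)]
heads, tails = build_constraints(train)

scores = [0.05, 0.3, 0.1, 0.5, 0.2]  # distances for (h, r=0, e), e = 0..4
unconstrained = constrained_tail_rank(scores, 1, range(len(scores)))
constrained = constrained_tail_rank(scores, 1, tails[0])
```

Because fewer candidates compete, the constrained rank is never worse than the unconstrained one for the same scores.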

**The next four loops** test the embeddings according to the mapping properties of relations. All relations are split into four categories (1-1, 1-n, n-1, n-n), and the model is evaluated on each category separately. The results are reported as follows:
1-1:
    (?, r, t) : Mean rank, Hit 10
    (?, r, t) : Mean rank(filter), Hit 10(filter)
    (h, r, ?) : Mean rank, Hit 10
    (h, r, ?) : Mean rank(filter), Hit 10(filter)      
1-n:
    (?, r, t) : Mean rank, Hit 10
    (?, r, t) : Mean rank(filter), Hit 10(filter)
    (h, r, ?) : Mean rank, Hit 10
    (h, r, ?) : Mean rank(filter), Hit 10(filter)      
n-1:
    (?, r, t) : Mean rank, Hit 10
    (?, r, t) : Mean rank(filter), Hit 10(filter)
    (h, r, ?) : Mean rank, Hit 10
    (h, r, ?) : Mean rank(filter), Hit 10(filter)      
n-n:
    (?, r, t) : Mean rank, Hit 10
    (?, r, t) : Mean rank(filter), Hit 10(filter)
    (h, r, ?) : Mean rank, Hit 10
    (h, r, ?) : Mean rank(filter), Hit 10(filter)
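
For reference, the split into the four categories can be sketched as below. This follows the rule described in the TransE paper (average number of tails per head and heads per tail, with a threshold of 1.5). Whether this repository computes the categories the same way is an assumption, and `categorize_relations` is a hypothetical name.

```python
from collections import defaultdict


def categorize_relations(triples, threshold=1.5):
    """Label each relation as 1-1, 1-n, n-1 or n-n."""
    tails_of = defaultdict(set)  # (relation, head) -> set of tails
    heads_of = defaultdict(set)  # (relation, tail) -> set of heads
    for h, r, t in triples:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)

    def average_size(groups):
        # Average group size per relation.
        totals, counts = defaultdict(int), defaultdict(int)
        for (r, _), members in groups.items():
            totals[r] += len(members)
            counts[r] += 1
        return {r: totals[r] / counts[r] for r in totals}

    tails_per_head = average_size(tails_of)
    heads_per_tail = average_size(heads_of)

    categories = {}
    for r in tails_per_head:
        head_side = "1" if heads_per_tail[r] < threshold else "n"
        tail_side = "1" if tails_per_head[r] < threshold else "n"
        categories[r] = f"{head_side}-{tail_side}"
    return categories


# Toy triples (head, relation, tail), one relation per category.
triples = [
    (0, 0, 1), (2, 0, 3),                        # relation 0
    (0, 1, 1), (0, 1, 2), (0, 1, 3),             # relation 1
    (1, 2, 0), (2, 2, 0), (3, 2, 0),             # relation 2
    (0, 3, 1), (0, 3, 2), (1, 3, 1), (1, 3, 2),  # relation 3
]
categories = categorize_relations(triples)
```

Each test triple is then assigned to the category of its relation, and mean rank and Hit 10 are averaged within each category.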

Evaluation metrics #16