ttrouill / complex

Source code for experiments in the papers "Complex Embeddings for Simple Link Prediction" (ICML 2016) and "Knowledge Graph Completion via Complex Tensor Factorization" (JMLR 2017).

Question RE: testing protocol #3

Closed samehkamaleldin closed 7 years ago

samehkamaleldin commented 7 years ago

Hello Théo,

Thanks a lot for this piece of code. I've been trying to understand the testing protocol from the code, but unfortunately I couldn't.

I wonder what positive-to-negative ratio is used in testing, and where to find that in the code.

Thanks a lot

samehkamaleldin commented 7 years ago

I've found out that it compares against all possible entities; it's not straightforward in the code. But from this line we can see that it uses the full entity embedding matrix in the score computation, so the score list corresponds to all possible entities.
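
For context, a minimal sketch of what that ranking protocol boils down to, assuming a `scores` array with one entry per candidate entity (the names and the `score_all_objects` helper are hypothetical, not the repo's actual API):

```python
import numpy as np

def rank_of_true_entity(scores, true_idx):
    """Rank (1 = best) of the gold entity among all candidates,
    given one score per entity in the vocabulary."""
    return int(np.sum(scores >= scores[true_idx]))

# Usage: for a test triple (s, r, o), score every entity as the object,
# then read off the rank of the true object o.
# scores_o = score_all_objects(s, r)           # hypothetical helper, shape (n_entities,)
# rank_o = rank_of_true_entity(scores_o, o)
```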

ttrouill commented 7 years ago

Hi Mohamed,

Sorry for the latency, I'm currently in low-internet areas :)

So indeed, when there are only positive triples (as in WN18 and FB15K), the rankings are computed among all entities (as subject and as object) for each test triple.

And yes, that's the ugly part of the code I'm not proud of, though for efficiency purposes I had to rewrite the scoring functions of the models. The reason is quite simple: the 'predict' function of the model objects (as used at this line for genericity, if you want to implement your own model for example) takes arrays of indexes as arguments, and hence uses advanced indexing, which makes copies of the embeddings even when the indexes are actually contiguous (it doesn't check).

That's why, for the implemented models, I reimplemented specific scoring functions that compute scores for all entities and use the full entity embedding matrix, as you pointed out, so that it is not copied at every score computation. They also skip the sigmoid, since it is a strictly increasing function and the order is preserved. This makes it cheap to compute validation scores quite often (every 50 iterations by default in the code), which in turn allows for early stopping. That's the whole story :)

I could have done this in Theano too, but time urged as always, so this part is in numpy.
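
To illustrate the difference described above, here is a rough numpy sketch of scoring all candidate objects with ComplEx, assuming entity embeddings are stored as separate real/imaginary matrices `E_re`, `E_im` of shape `(n_entities, k)` and one relation's parts `r_re`, `r_im`; these names and shapes are assumptions for illustration, not the repo's actual variables:

```python
import numpy as np

# Hypothetical setup: random ComplEx parameters, split into real/imaginary parts.
n_entities, k = 1000, 50
rng = np.random.default_rng(0)
E_re, E_im = rng.normal(size=(n_entities, k)), rng.normal(size=(n_entities, k))
r_re, r_im = rng.normal(size=k), rng.normal(size=k)
s = 42  # subject index of a test triple

# Generic 'predict'-style path: advanced indexing with an index array copies
# the selected rows, even when the indexes cover all entities contiguously.
obj_idx = np.arange(n_entities)
E_re_copied = E_re[obj_idx]            # full copy made at every evaluation call

# Specialized evaluation path: score every candidate object against the full
# embedding matrix directly, with no per-call copy.
q_re = E_re[s] * r_re - E_im[s] * r_im
q_im = E_re[s] * r_im + E_im[s] * r_re
scores = E_re @ q_re + E_im @ q_im     # Re(<e_s, w_r, conj(e_o)>) for every o
# No sigmoid applied: it is strictly increasing, so the ranking is unchanged.
```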