Reason for omitting validation inference triples from filtering when doing test evaluation in inductive lp example

pykeen / pykeen

🤖 A Python library for learning and evaluating knowledge graph embeddings

https://pykeen.readthedocs.io/en/stable/

MIT License

Reason for omitting validation inference triples from filtering when doing test evaluation in inductive lp example #1389

Open nomisto opened 5 months ago

nomisto commented 5 months ago

Is there a reason you omit the validation triples of the inference (test-ind) graph from filtering in your example here when doing evaluation on the test set?

I would expect the test evaluator to also include the validation triples for filtering, like this:

from pykeen.evaluation import SampledRankBasedEvaluator

test_evaluator = SampledRankBasedEvaluator(
    mode="testing",  # necessary to specify for the inductive mode - this will use inference nodes
    evaluation_factory=dataset.inductive_testing,  # test triples to predict
    additional_filter_triples=[  # filter out true inference and validation triples
        dataset.inductive_inference.mapped_triples,
        dataset.inductive_validation.mapped_triples,
    ],
)

mberr commented 4 months ago

In this particular dataset, InductiveFB15k237, the validation and test graphs are guaranteed to be disjoint - so we don't need to filter for them; however, it would not hurt much and might be a good adjustment to make sure you do not forget to do this filtering in other datasets that do not have this guarantee.

nomisto commented 4 months ago

I don't think that is true. The training and inference graphs are disjoint, but the test and validation sets of the inference graph are not.

For example, test.txt of fb237_v1_ind contains the triple

/m/0p9xd /music/genre/artists /m/01304j

while valid.txt of fb237_v1_ind contains the triple

/m/02x8m /music/genre/artists /m/01304j

So for the query genre_artists(?, /m/01304j) derived from the test triple, the head /m/02x8m from the validation triple is also a correct answer, but it would not get filtered.
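To make the effect on the rank concrete, here is a minimal sketch in plain Python. It is not PyKEEN's implementation: the function `filtered_head_rank`, the scores, and the extra entity `/m/other` are invented for illustration. Only the two entities `/m/0p9xd` and `/m/02x8m` come from the triples above.

```python
def filtered_head_rank(scores, true_head, known_heads):
    """Optimistic rank of true_head after dropping other known correct heads.

    scores maps each candidate head to a model score (higher is better).
    """
    kept = {
        entity: score
        for entity, score in scores.items()
        if entity == true_head or entity not in known_heads
    }
    true_score = kept[true_head]
    return 1 + sum(1 for score in kept.values() if score > true_score)


# Made-up scores for the query genre_artists(?, /m/01304j).
scores = {
    "/m/02x8m": 0.9,  # head of the validation triple, also a correct answer
    "/m/0p9xd": 0.8,  # head of the test triple being evaluated
    "/m/other": 0.1,  # hypothetical incorrect candidate
}

# Validation triples left out of the filter: the validation head counts against the test head.
rank_without_validation = filtered_head_rank(scores, "/m/0p9xd", known_heads=set())

# Validation triples included in the filter: the validation head is removed first.
rank_with_validation = filtered_head_rank(scores, "/m/0p9xd", known_heads={"/m/02x8m"})
```

With these made-up scores the test head ranks second when validation triples are left out of the filter and first when they are included, so a model that scores the validation answer highly is penalised for a correct prediction.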