flpgrz closed this issue 2 years ago
Hi @flpgrz ,
thanks for your interest in pykeen!
Since we are also interested in inductive settings, we have already started to modularize the components properly so that this becomes easy to support in the future. I'll go through some high-level components and try to point out where we need to make modifications.
CoreTriplesFactory: This part stores the ID-based triples. We already have an extension for additional numeric features attached to entities, cf. TriplesNumericLiteralsFactory. Maybe you can re-use this one, or use a similar implementation to store features for each entity (and relation).
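To make the idea of storing features next to the ID-based triples concrete, here is a minimal sketch in plain Python. The class `FeatureTriples` and its method `features_for` are hypothetical names chosen for illustration; they are not part of pykeen and do not mirror the actual interface of `CoreTriplesFactory` or `TriplesNumericLiteralsFactory`.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Triple = Tuple[int, int, int]  # (head ID, relation ID, tail ID)


@dataclass(frozen=True)
class FeatureTriples:
    """Hypothetical container pairing ID-based triples with entity features."""

    triples: Sequence[Triple]
    entity_features: Dict[int, Sequence[float]]  # entity ID -> feature vector

    def __post_init__(self) -> None:
        # Every entity that occurs as head or tail must have a feature vector.
        used = {entity for head, _, tail in self.triples for entity in (head, tail)}
        missing = sorted(used - set(self.entity_features))
        if missing:
            raise ValueError(f"no features for entity IDs: {missing}")

    def features_for(self, triple: Triple) -> Tuple[Sequence[float], Sequence[float]]:
        """Return the (head, tail) feature vectors of a triple."""
        head, _, tail = triple
        return self.entity_features[head], self.entity_features[tail]


factory = FeatureTriples(
    triples=[(0, 0, 1), (1, 0, 2)],
    entity_features={0: [0.1, 0.9], 1: [0.2, 0.8], 2: [0.3, 0.7]},
)
head_features, tail_features = factory.features_for((0, 0, 1))
```

A real implementation would most likely keep the triples and features as tensors; the dictionary here only keeps the sketch dependency-free.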
RepresentationModules: they implement the mapping idx -> vector. The easiest case is Embedding, but we also have other examples in this file. This one may need the largest revision: If we are in an inductive setting, we do want the features / entity representations to be part of the input, rather than stored somewhere in the model. Thus, they should be stored either in the TriplesFactory, or the corresponding training instances, cf. pykeen.triples.instances.
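The difference between a stored lookup table and features passed as input can be sketched with plain PyTorch. `FeatureRepresentation` below is a hypothetical module written for this illustration, not an existing pykeen class, and the linear projection is just one assumed choice of encoder.

```python
import torch
from torch import nn


class FeatureRepresentation(nn.Module):
    """Hypothetical sketch: map entity features (not IDs) to vectors."""

    def __init__(self, feature_dim: int, embedding_dim: int) -> None:
        super().__init__()
        self.projection = nn.Linear(feature_dim, embedding_dim)

    def forward(self, features: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
        # features: (num_entities, feature_dim), passed in rather than stored in the model
        # indices: (batch_size,), row positions into `features`
        return self.projection(features[indices])


# Transductive baseline: one trainable row per known entity ID.
embedding = nn.Embedding(num_embeddings=5, embedding_dim=8)
known_vectors = embedding(torch.tensor([0, 4]))

# Inductive variant: the same parameters work for any set of entities.
representation = FeatureRepresentation(feature_dim=16, embedding_dim=8)
train_features = torch.randn(5, 16)
test_features = torch.randn(7, 16)  # entities never seen during training
train_vectors = representation(train_features, torch.tensor([0, 4]))
test_vectors = representation(test_features, torch.tensor([6]))
```

Because the trainable parameters depend only on the feature dimension, the number of entities may differ between training and evaluation.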
Interaction: these take head/relation/tail representations and convert them to scalar plausibility scores. As far as I can see, no adaptation is needed here. If you still want to take a look at their implementation, we first have the stateful modules (i.e., subclassing torch.nn.Module), which may contain state such as hyperparameters, e.g., the p-norm used, or trainable global parameters not associated with any entity or relation, e.g., TuckER's core tensor. Second, we have a (pure) functional form without any internal state at pykeen.nn.functional. Oftentimes, you can find the "real" implementation in the functional form, and the module is just a thin wrapper around it, which encapsulates some state.
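The split between a stateless function and a thin stateful wrapper can be sketched as follows. The names `translation_score` and `TranslationInteraction` are hypothetical and do not reproduce pykeen's actual signatures; the TransE-style score -||h + r - t||_p is used only because it has a single hyperparameter, p.

```python
import torch
from torch import nn


def translation_score(
    h: torch.Tensor, r: torch.Tensor, t: torch.Tensor, p: int = 2
) -> torch.Tensor:
    """Stateless functional form: -||h + r - t||_p over the last dimension."""
    return -(h + r - t).norm(p=p, dim=-1)


class TranslationInteraction(nn.Module):
    """Thin stateful wrapper that only stores the hyperparameter p."""

    def __init__(self, p: int = 2) -> None:
        super().__init__()
        self.p = p

    def forward(self, h: torch.Tensor, r: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Delegate the actual computation to the functional form.
        return translation_score(h, r, t, p=self.p)


h = torch.tensor([[3.0, 0.0]])
r = torch.tensor([[0.0, 4.0]])
t = torch.zeros(1, 2)
l2_scores = TranslationInteraction(p=2)(h, r, t)
l1_scores = TranslationInteraction(p=1)(h, r, t)
```

Since the interaction only sees vectors, it does not care whether they come from an embedding lookup or from an encoder over input features, which is why no adaptation should be needed here.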
P.S.: @mali-git and @migalkin have been working on inductive settings with pykeen. For this project, we have not fully integrated it into pykeen yet, but we will release the code here soon. So stay tuned :wink:
Inductive NodePiece was added in https://github.com/pykeen/pykeen/pull/722. We'll make a 1.8 release soon!
Please also take a look at the main discussion on inductive link prediction at https://github.com/pykeen/pykeen/issues/720
It would be amazing to have an inductive learning pipeline.
Ideally, it would allow triples in val/test with entities which don't belong to the training set.
To make it work, one would probably need feature vectors for all train/val/test entities, instead of computing embeddings for the node IDs with e.g. torch.nn.Embedding.
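To illustrate the setting, a minimal sketch in plain Python that finds the entities occurring in evaluation triples but not in the training triples. The helper names `entities` and `unseen_entities` and the toy triples are made up for this example.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail) labels


def entities(triples: Iterable[Triple]) -> Set[str]:
    """Collect every entity that occurs as head or tail."""
    return {entity for head, _, tail in triples for entity in (head, tail)}


def unseen_entities(train: Iterable[Triple], evaluation: Iterable[Triple]) -> Set[str]:
    """Entities that appear in the evaluation triples but never in training."""
    return entities(evaluation) - entities(train)


train = [("a", "knows", "b"), ("b", "knows", "c")]
test = [("c", "knows", "d"), ("e", "knows", "a")]
new_entities = unseen_entities(train, test)
```

In a transductive pipeline this set has to be empty; in the inductive setting it is non-empty, and each of these entities needs a feature vector because it has no trained embedding row.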
I'll try to make a minimal implementation. I'm new to the library, though. Do you have suggestions on which classes I should pay particular attention to, apart from adapting the models?
Thanks.