uma-pi1 / kge

LibKGE - A knowledge graph embedding library for reproducible research
MIT License
765 stars 124 forks

Embed unseen s/p/o's #205

Closed MatthewGleeson closed 3 years ago

MatthewGleeson commented 3 years ago

Thanks for publishing this great repo!

I'm trying to use the pretrained KGE models in this repo to create embeddings for unseen objects and predicates but I'm having trouble figuring out how to do so.

The "Use a pretrained model in an application" portion of the README has been helpful, but I want to be able to pass a trio of s/p/o strings such as 'dirt' 'component of', 'clay' to a pretrained ComplEx model instead of passing an index to a value in the Wordnet database.

Is there a way to do this? Would I need to do some sort of transfer learning on the embedder first? I've got a dataset I can use if that is the case.

rufex2001 commented 3 years ago

If I understand correctly, you need a mapping between your mentioned strings and the index used internally by the models to identify embeddings. If the triples in the dataset you are using already come with such strings, then the entity_ids.del file has the mapping you need and should be inside your dataset folder after you preprocess it. If your dataset comes with unreadable IDs like those in, for example, WN18, then you need a mapping between those IDs and their readable counterparts. This needs to come with your dataset. Once you have such mappings, you can use them to turn your strings into the indexes used by the models.
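For reference, a minimal sketch of that lookup, assuming a downloaded LibKGE checkpoint and the entity_ids.del file produced by preprocessing (the checkpoint name, dataset path, and example entity id below are placeholders, not values from this thread):

```python
import torch
from kge.model import KgeModel
from kge.util.io import load_checkpoint

# load a pretrained checkpoint (file name is a placeholder)
checkpoint = load_checkpoint("wnrr-complex.pt")
model = KgeModel.create_from(checkpoint)

# entity_ids.del maps the internal index to the dataset's entity id,
# one tab-separated "index<TAB>id" pair per line (relation_ids.del is analogous)
entity_index = {}
with open("data/wnrr/entity_ids.del") as f:
    for line in f:
        idx, entity_id = line.rstrip("\n").split("\t")
        entity_index[entity_id] = int(idx)

# look up the index for a readable id and fetch its embedding;
# this only works for ids that were seen during training
my_id = "06066555"  # placeholder entity id; must exist in the dataset
s = torch.tensor([entity_index[my_id]])
s_emb = model.get_s_embedder().embed(s)
```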

MatthewGleeson commented 3 years ago

Yep, I'm hoping to adapt some of the pretrained KGE models that are listed in the README for a knowledge graph-informed RL application (specifically, https://github.com/minqi/wordcraft). I've already checked, and many of the objects I need embeddings for do not exist in the lookup table for some of the datasets (which I'm assuming would be required to use the entity_ids.del file you're talking about). So in that case, I'd need to either train the KGE models from scratch or make use of transfer learning to calculate the embeddings somehow, right? I can't pass these unseen strings to the entity/relation embedders?

rufex2001 commented 3 years ago

Correct. These models only learn representations for entities and relations seen during training; they have no representations for unseen ones.

AdrianKs commented 3 years ago

You could create a new dataset extending the one the model is currently trained on, then train on the new dataset and load the already-trained embeddings with our load_pretrained option. Additionally, you could also try to freeze the pretrained embeddings, but this option is not yet in the master branch; it is in PR #136.

rgemulla commented 3 years ago

You may either retrain on your dataset or use a KGE model that constructs entity/relations embeddings from textual representations. (We may add such an implementation to LibKGE soon.)

esulaiman commented 3 years ago

You could create a new dataset extending the one the model is currently trained on, then train on the new dataset and load the already-trained embeddings with our load_pretrained option. Additionally, you could also try to freeze the pretrained embeddings, but this option is not yet in the master branch; it is in PR #136.

I am training on my own dataset using the models provided by this library to obtain KG embeddings. Can you kindly help with how to use the load_pretrained option?

MatthewGleeson commented 3 years ago

@esulaiman it would be better to open a separate issue for this problem, but take a look at #174 and the code in the README section titled "Use your own dataset".
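For anyone following along, the raw files that LibKGE's dataset preprocessing consumes are plain tab-separated string triples, one per line. A rough sketch of writing them for a custom dataset (the folder name data/mydataset and the example triples are placeholders; see the README section mentioned above for the exact preprocessing command):

```python
# Write subject/relation/object strings as tab-separated triples, one per line,
# in the same raw format as the *.txt files LibKGE's preprocessing expects.
triples = [
    ("dirt", "component_of", "clay"),    # placeholder triples from your own data
    ("clay", "component_of", "brick"),
]

with open("data/mydataset/train.txt", "w") as f:
    for s, p, o in triples:
        f.write(f"{s}\t{p}\t{o}\n")

# Repeat for valid.txt and test.txt, then run LibKGE's preprocessing script on
# data/mydataset (see the README, "Use your own dataset", for the exact command).
```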

MatthewGleeson commented 3 years ago

I'm closing this issue, but I wanted to leave a note of what I did:
- created a compatible dataset from the WordCraft environment matching data/toy/*.txt (entities, relations, test, train, valid)
- trained many kge models from this repo on this dataset
- used the kge repo's model.score_spo function in my WordCraft agent to inform its decisions in the MDP (see the sketch below)

This was for a group project for my NLP class; anyone interested can take a look at our demo notebook: https://colab.research.google.com/drive/1bL3U19pmd9l-1nG_lmdGx-AjVKwpmd3N?usp=sharing
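For anyone reusing this setup, a minimal sketch of the score_spo call mentioned above, assuming a trained LibKGE checkpoint (the checkpoint path and index values are placeholders):

```python
import torch
from kge.model import KgeModel
from kge.util.io import load_checkpoint

# load the model trained on the custom dataset (path is a placeholder)
checkpoint = load_checkpoint("checkpoint_best.pt")
model = KgeModel.create_from(checkpoint)

# subject/relation/object indexes, e.g. looked up via entity_ids.del / relation_ids.del
s = torch.tensor([0])
p = torch.tensor([0])
o = torch.tensor([1])

# score of the triple (s, p, o); higher means the model finds it more plausible.
# direction="o" treats it as an (s, p, ?) query, which also works for
# reciprocal-relations models.
score = model.score_spo(s, p, o, direction="o")
print(score)
```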