How to save StarE Encoder results

migalkin / StarE

EMNLP 2020: Message Passing for Hyper-Relational Knowledge Graphs

MIT License


How to save StarE Encoder results #13

Closed Ease112 closed 1 year ago

Ease112 commented 1 year ago

Can I save embedding vectors using only StarE Encoder without using Transformer?

geraltofrivia commented 1 year ago

The StarE encoder by itself is not an entire model, and would need some sort of a decoder to do the actual link prediction task, and thereby train the embeddings.

You can think of it (with some loss of information) in terms of the classic encoder-decoder architectures: by itself, the encoder just creates latent representations for tokens (here: nodes and other graph artifacts), which a task-specific decoder then uses to perform a supervised or self-supervised task, compute the loss, and backpropagate.

That said, you don't necessarily have to stick to the Transformer decoder; you can swap it out for something simpler, such as TransE.
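As a rough illustration of what a simpler decoder does with encoder output, here is a minimal pure-Python sketch of TransE-style scoring with a margin loss. It is not the repository's code: the random vectors merely stand in for whatever the encoder would produce, the names `transe_score` and `margin_loss` are made up for this example, and qualifiers are ignored entirely.

```python
import math
import random

def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between (head + relation) and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin ranking loss: the observed triple should outscore the corrupted one by `margin`."""
    return max(0.0, margin - pos_score + neg_score)

# Stand-ins for encoder output (assumption: in a real setup these come from the encoder).
random.seed(0)
dim = 4
entity_emb = {name: [random.uniform(-1, 1) for _ in range(dim)] for name in ("n1", "n3", "n5")}
relation_emb = {"r1": [random.uniform(-1, 1) for _ in range(dim)]}

pos = transe_score(entity_emb["n1"], relation_emb["r1"], entity_emb["n5"])  # observed triple
neg = transe_score(entity_emb["n1"], relation_emb["r1"], entity_emb["n3"])  # corrupted tail
loss = margin_loss(pos, neg)  # in training, this loss would be backpropagated into the encoder
```

In an actual training loop the loss would be computed on tensors so that gradients flow back through the decoder into the encoder's parameters.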

Ease112 commented 1 year ago

Thank you for the reply. So, since the objective of StarE is link prediction, is it impossible to generate task-independent embedding vectors, such as those produced by node2vec?

geraltofrivia commented 1 year ago

That's not necessarily true. While StarE (like other Graph Representation Learning approaches, including vanilla TransE and TransH) is trained on the link prediction task, the resulting embeddings are often useful across a variety of tasks.

Similarly, Word2Vec is trained on the Skip-Gram objective, but the trained embeddings can solve analogies (man : king :: woman : queen).

In fact, Node2Vec (like DeepWalk and LINE) is also trained on the Skip-Gram objective, like Word2Vec. That is, they do random walks on the graph and generate sequences like:

   n1 -(r1)->  n5 -(r2)-> n3
   n4 -(r44)-> n1 -(r1)-> n22
   ...

and then train to predict the tokens that appear in the context of a given token.
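A small sketch of that data-generation step, under simplifying assumptions: the walks are uniform (Node2Vec itself biases them with its return and in-out parameters), relations are kept as tokens to mirror the sequences above, and the toy graph and function names are invented for illustration.

```python
import random

def random_walk(graph, start, length, rng):
    """Uniform random walk; returns alternating node and relation tokens."""
    walk = [start]
    node = start
    for _ in range(length):
        edges = graph.get(node, [])
        if not edges:  # dead end: stop the walk early
            break
        relation, node = rng.choice(edges)
        walk.extend([relation, node])
    return walk

def skipgram_pairs(walk, window=2):
    """(center, context) pairs for every token within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

# Toy graph: node -> list of (relation, neighbour) edges.
graph = {
    "n1": [("r1", "n5"), ("r1", "n22")],
    "n5": [("r2", "n3")],
    "n4": [("r44", "n1")],
}
rng = random.Random(0)
walks = [random_walk(graph, start, length=2, rng=rng) for start in graph]
pairs = [pair for walk in walks for pair in skipgram_pairs(walk)]
```

Each `(center, context)` pair is one training example: the model is asked to predict the context token given the center token.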

Ease112 commented 1 year ago

OK, I understand. Let me rephrase my question. Can I output text data (CSV, TSV, etc.) of vectors of entities or relations, as with TransE and node2vec? I’d like to apply StarE to other datasets to visualize and cluster embeddings.

geraltofrivia commented 1 year ago

Yes, absolutely. When you train StarE+ on your dataset, you can pass the SAVE flag which triggers this code block.

You can then find the trained model in the ./models/<datasetname>/<modelname>/ directory, and can trivially pull the embeddings from it for any purpose.
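Once the embeddings have been pulled out of the saved model, writing them to a TSV file takes only a few lines. The sketch below assumes you already have a list of names and a matching list of vectors; how the checkpoint is laid out and which entry holds the entity or relation embeddings is not shown in this thread, so that part is left out. The helper name `write_embeddings_tsv` and the sample values are hypothetical.

```python
import csv

def write_embeddings_tsv(path, names, vectors):
    """Write one row per item: its name, then the vector components."""
    if len(names) != len(vectors):
        raise ValueError("names and vectors must have the same length")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for name, vec in zip(names, vectors):
            writer.writerow([name, *(f"{x:.6f}" for x in vec)])

# Placeholder values (assumption). For a PyTorch tensor,
# `tensor.detach().cpu().tolist()` yields a list of lists like this one.
entity_names = ["n1", "n5"]
entity_vectors = [[0.1, -0.2, 0.3], [0.0, 0.5, -1.0]]
write_embeddings_tsv("entity_embeddings.tsv", entity_names, entity_vectors)
```

The resulting file has one entity per line and can be loaded directly by most clustering and visualization tools.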

Ease112 commented 1 year ago

Thank you very much! I will try it.