What is the input to the transformer going to be? Is it more like:
He ended his meeting on Tuesday night.
but with the graph data encoded into the embeddings somehow? Or more like:
end-01 He meet-03 data-entity Tuesday night
with the graph data itself as input?
The graph could be thought of like the following:
   _____
  |     |
  |    \|/
  He ended his meeting on Tuesday night.
 /|\   |          |         /|\
  |    |          |          |
  |____|          |__________|
Essentially, each token in the sentence is a node, and there could be edges between tokens.
In a normal transformer, the tokens are processed into token embeddings; then an encoding of each position is processed into an embedding and added to the token embedding at the corresponding position. The result is a position-aware embedding for each token. This is how each position 'knows' where it is in the sequence.
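For concreteness, here is a minimal sketch of that scheme with learned positional embeddings; the vocabulary size, maximum length and hidden size are made-up numbers, and real models differ in the details (sinusoidal encodings, layer norm, dropout, etc.):

```python
import torch
import torch.nn as nn

# Made-up sizes, purely for illustration.
vocab_size, max_len, d_model = 30000, 512, 768

token_embed = nn.Embedding(vocab_size, d_model)  # token id -> token embedding
pos_embed = nn.Embedding(max_len, d_model)       # position index -> position embedding

input_ids = torch.randint(0, vocab_size, (1, 10))          # (batch, seq_len)
positions = torch.arange(input_ids.size(1)).unsqueeze(0)   # (1, seq_len)

# Each token embedding gets the embedding of its position added to it,
# which is how the model 'knows' where each token sits in the sequence.
hidden = token_embed(input_ids) + pos_embed(positions)     # (batch, seq_len, d_model)
```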
You could do something similar with the edge information. You need some trainable network that takes the edge type and the positional encoding of the target node, combines this information, and outputs an embedding. The embeddings of all the edges can be added to the positional embeddings for the corresponding nodes.
My intuition is that the attention layers could use this encoded information to 'find' related nodes. I don't know how well it will work but that would be my approach. Good luck!
@sinking-point thanks for your response. So essentially I need to extend the positional embedding generation, basing it not on position in the sentence but instead on the edge type.
But there could be different types of edges as well. How could that be combined? I suppose there would be a need to use different weights for different types of edges?
Is there any such model implementation in Hugging Face? I have already had a look but can't find anything.
You could combine them like this:
Edge type as one hot vector -> nn.Embedding -> edge type embedding
Index of target node -> positional encoding -> whatever positional embedding method your chosen transformer uses -> target node embedding
Sum = edge type embedding + target node embedding
If we only have a maximum of one edge per node, we can just add this sum to the origin node embedding. However, we might have many edges, and if we do this they'll interfere with each other. We want different edge types to be able to partition themselves into different parts of the vector, so I'd try a multi-layer perceptron kind of thing:
Sum (embedding width) -> nn.Linear -> hidden (bigger width) -> activation fn -> nn.Linear -> finished edge embedding
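As a rough sketch (untested, and the class and argument names are just placeholders), that pipeline could look something like the module below. Note that nn.Embedding takes the edge type as an index, which is equivalent to multiplying a one-hot vector by a weight matrix, and the learned position table here only stands in for whatever positional embedding your chosen transformer uses:

```python
import torch
import torch.nn as nn

class MyEdgeEmbedding(nn.Module):
    # Hypothetical module: edge type embedding + target-position embedding,
    # summed and pushed through a small MLP to give the finished edge embedding.
    def __init__(self, num_edge_types, max_positions, d_model, d_hidden):
        super().__init__()
        self.type_embed = nn.Embedding(num_edge_types, d_model)
        # Stand-in for the positional embedding method of your chosen transformer.
        self.pos_embed = nn.Embedding(max_positions, d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, edge_type_id, target_idx):
        edge_type_id = torch.as_tensor(edge_type_id)
        target_idx = torch.as_tensor(target_idx)
        summed = self.type_embed(edge_type_id) + self.pos_embed(target_idx)
        return self.mlp(summed)
```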
Alternatively, you could take each edge, turn it into an embedding, and add embeddings for both the origin and target nodes' positions. Then just append these to the transformer input. There's less complexity in that you don't need the MLP I described, but it might be more expensive, because attention scales quadratically with sequence length in both time and space.
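A rough sketch of that alternative, with made-up sizes and a single hard-coded edge just to show the shapes (the attention mask has to grow by one position per appended edge):

```python
import torch
import torch.nn as nn

d_model, num_edge_types, max_positions = 768, 16, 512
type_embed = nn.Embedding(num_edge_types, d_model)
pos_embed = nn.Embedding(max_positions, d_model)

inputs_embeds = torch.randn(1, 10, d_model)            # existing token embeddings
attention_mask = torch.ones(1, 10, dtype=torch.long)

# One edge of type 3 from token 1 (origin) to token 5 (target).
edge_vec = (type_embed(torch.tensor(3))
            + pos_embed(torch.tensor(1))
            + pos_embed(torch.tensor(5)))

# Append the edge as an extra "token" and extend the mask to match.
inputs_embeds = torch.cat([inputs_embeds, edge_vec.view(1, 1, -1)], dim=1)
attention_mask = torch.cat([attention_mask, torch.ones(1, 1, dtype=torch.long)], dim=1)
```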
I don't know of any existing transformer that does what you want already.
@sinking-point thanks for your response. Can I apply this change in a modular fashion?
I suppose I need to augment the following snippet?
positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
Having said that, how could I pass the edge information? 🤔
For me it does not need to be optimized. Do you have any code snippet demonstrating something similar 🙏?
What transformer do you want to use? Take Bart, for example: you can pass in inputs_embeds.
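With Bart the pattern could look roughly like this (an untested sketch; since Bart is an encoder-decoder you also need to give the decoder its inputs):

```python
from transformers import BartTokenizer, BartModel

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartModel.from_pretrained("facebook/bart-base")

enc = tok("He ended his meeting on Tuesday night.", return_tensors="pt")

# Compute the token embeddings yourself, modify them, then bypass input_ids.
inputs_embeds = model.get_input_embeddings()(enc["input_ids"])
# ... add your edge embeddings to inputs_embeds here ...

outputs = model(
    inputs_embeds=inputs_embeds,
    attention_mask=enc["attention_mask"],
    decoder_input_ids=enc["input_ids"],  # encoder-decoder: the decoder needs inputs too
)
```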
I would like to use Longformer.
I would probably go with my first suggestion, then. Putting all the edges at the end might not play well with Longformer's local attention.
Longformer also has inputs_embeds as an argument, so you could do something like:
class MyLongformer(nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.model = LongformerModel(...)
        self.edge_embed = MyEdgeEmbedding(...)

    def forward(self, input_ids, edges, ...):
        # Start from the token embeddings the model would otherwise compute itself.
        inputs_embeds = self.model.get_input_embeddings()(input_ids)
        for batch, edge_type_id, origin_idx, target_idx in edges:
            # Add the edge embedding to the origin token's embedding.
            inputs_embeds[batch][origin_idx] += self.edge_embed(edge_type_id, target_idx)
        # might be best to normalise here
        return self.model(inputs_embeds=inputs_embeds, ...)
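Here, edges is assumed to be an iterable of (batch_index, edge_type_id, origin_token_idx, target_token_idx) tuples derived from your AMR graph, e.g. something like the toy list below (edge type ids and token indices are purely illustrative); for the normalisation, something like nn.LayerNorm over the summed embeddings would be a reasonable first try.

```python
# Hypothetical edge format consumed by the forward() above:
# (batch_index, edge_type_id, origin_token_idx, target_token_idx)
edges = [
    (0, 0, 1, 0),  # e.g. an agent-style edge from "ended" back to "He"
    (0, 1, 3, 5),  # e.g. a time-style edge from "meeting" to "Tuesday"
]
```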
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Feature request
Embedding relational information for a transformer
Motivation
I am using a Transformer model from Hugging Face for machine translation. However, my input data has relational information, as shown below:
So I have semantic information in the input, represented as an Abstract Meaning Representation (AMR) graph.
Is there even a way to embed relationships like the above in a transformer model? Is there any model from Hugging Face that I can use in this regard?
Your contribution
If a model is developed, I could beta test the model.