Different modalities of node information

Monketo commented 3 years ago

Hi @rusty1s! What would be the best approach to graph modeling with PyG when the nodes represent different entity types, each with multiple modalities of information? For instance, we have two sets of node entities (say, students and video courses) with different sets of descriptors (some of which might also be missing).

rusty1s commented 3 years ago

Great question. I think I will write a tutorial on this topic soon :)

Here is a basic blueprint:

For operating on heterogeneous graphs, it is a good idea to store different node types and edge types separately. In your case, your data may look like this:

import torch

x_dict = {
    'user': torch.randn(8, 32),   # 8 users with 32 features each
    'video': torch.randn(4, 64),  # 4 videos with 64 features each
}
edge_index_dict = {
    # Placeholder: LongTensor of shape [2, num_edges] mapping users to videos
    ('user', 'watches', 'video'): ...,
    # Placeholder: LongTensor of shape [2, num_edges] denoting user relationships
    ('user', 'knows', 'user'): ...,
    # ... further relation types
}
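
If your raw data stores relations as pairs of identifiers rather than integer positions, the index tensors above need to be built first. Here is a minimal sketch of one way to do that in plain Python. The helper `build_edge_index` and the sample student and course names are made up for illustration and are not part of PyG:

```python
def build_edge_index(pairs, src_ids, dst_ids):
    """Map raw (source, target) identifiers to a [2, num_edges] nested list.

    Hypothetical helper: row 0 holds source positions, row 1 target positions.
    """
    src_pos = {raw: i for i, raw in enumerate(src_ids)}
    dst_pos = {raw: i for i, raw in enumerate(dst_ids)}
    rows = [src_pos[s] for s, _ in pairs]
    cols = [dst_pos[d] for _, d in pairs]
    return [rows, cols]


# Illustrative data only: three students, two video courses.
student_ids = ['alice', 'bob', 'carol']
course_ids = ['algebra', 'biology']
watches = [('alice', 'algebra'), ('bob', 'algebra'), ('carol', 'biology')]

watches_index = build_edge_index(watches, student_ids, course_ids)
print(watches_index)  # [[0, 1, 2], [0, 0, 1]]
```

Wrapping the result in `torch.tensor(watches_index, dtype=torch.long)` should give a tensor in the `[2, num_edges]` layout described above. Row positions must match the row order of the corresponding feature matrices in `x_dict`.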

You can then proceed to construct your model using a different operator for each relation type:

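from torch_geometric.nn import SAGEConv

conv1_user2user = SAGEConv(32, 32)          # user -> user: same feature size on both ends
conv1_user2video = SAGEConv((32, 64), 64)   # user -> video: (source size, target size)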

Here, the input channels are given either as a scalar (when source and target nodes share the same feature size, as in a homogeneous graph) or as a tuple (the feature sizes of the source and target nodes, respectively).

Then, you exchange messages between different node and relation types:

x_user = x_dict['user']
x_video = x_dict['video']
x_user = conv1_user2user(x_user, edge_index_dict[('user', 'knows', 'user')])
x_video = conv1_user2video((x_user, x_video), edge_index_dict[('user', 'watches', 'video')])

Monketo commented 3 years ago

@rusty1s Amazing! Thank you so much :) Too bad that batching is not supported yet

rusty1s commented 3 years ago

Indeed, I need to work on that :)

pyg-team / pytorch_geometric

Different modalities of node information #2242