How to construct a heterograph?

thu-wangz17 commented 4 years ago

Hi, I have a question about how to construct a heterograph. For example, there are two kinds of nodes, A and B, each with some features describing them. I know that implementing a heterograph requires overriding the __inc__ method in the Data class. However, after that, how does PyG distinguish the different node types, since Data.x only takes a tensor rather than a dict? I found the test_hete.py file in the hete_conv branch:

node_types = {'A': {'x': torch.randn(3, 4)}, 'B': {'x': torch.randn(2, 4)}}
edge_types = {
    ('A', None, 'A'): {
        'edge_index': torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]),
    },
    ('A', None, 'B'): {
        'edge_index': torch.tensor([[0, 1, 2], [0, 1, 0]]),
        'edge_weight': torch.tensor([0.6, 0.5, 0.7]),
    },
    ('B', None, 'B'): {
        'edge_index': torch.tensor([[0, 1], [1, 0]]),
    },
}

I think this implementation is very clear, and I want to batch the data in this form:

import torch
from torch_geometric.data import Data, DataLoader

class HeteroData(Data):
    def __inc__(self, key, value):
        if key == 'A->A':
            return len(self.x['A']['x'])
        elif key == 'B->B':
            return len(self.x['B']['x'])
        elif key == 'A->B':
            return torch.tensor([len(self.x['A']['x']), len(self.x['B']['x'])])
        elif key == 'B->A':
            return torch.tensor([len(self.x['B']['x']), len(self.x['A']['x'])])
        else:
            return 0

data_list = []
for i in range(5):
    data = HeteroData()
    data.x = {'A': {'x': torch.randn(3, 3)}, 'B': {'x': torch.randn(2, 3)}}
    data.edge_index = {
        ('A', 'A->A', 'A'): {
            'edge_index': torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]), 
        },
        ('A', 'A->B', 'B'): {
            'edge_index': torch.tensor([[0, 1, 2], [0, 1, 0]]), 
        },
        ('B', 'B->A', 'A'): {
            'edge_index': torch.tensor([[0, 1, 0], [0, 1, 2]]), 
        }, 
        ('B', 'B->B', 'B'): {
            'edge_index': torch.tensor([[0, 1], [1, 0]]), 
        },
    }
    data_list.append(data)

dataloader = DataLoader(data_list, batch_size=3)

But the above code does not work, because Data.x and Data.edge_index only accept tensors. Could you give me an example of how to construct a heterograph? Thank you very much.

rusty1s commented 4 years ago

Please have a look at the AMiner dataset, which introduces the first hete-graph in PyG. In general, we save heterogeneous features and connectivity in x_dict and edge_index_dict so that it does not collide with the formulation of homogeneous graphs. Batching is currently not supported though, and I will work on it.
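As a rough illustration of that layout, here is a minimal sketch using plain Python dictionaries of tensors. It is not the AMiner loader itself: the node type names 'A' and 'B' and the relation name 'to' are made up for the example, and the only assumption is that x_dict is keyed by node type and edge_index_dict by (source type, relation, destination type).

```python
import torch

# Node features, one tensor per node type (names 'A' and 'B' are illustrative).
x_dict = {
    'A': torch.randn(3, 4),  # 3 nodes of type A, 4 features each
    'B': torch.randn(2, 4),  # 2 nodes of type B, 4 features each
}

# Connectivity, one [2, num_edges] tensor per (source, relation, destination).
# The relation name 'to' is a placeholder chosen for this sketch.
edge_index_dict = {
    ('A', 'to', 'B'): torch.tensor([[0, 1, 2], [0, 1, 0]]),
    ('B', 'to', 'B'): torch.tensor([[0, 1], [1, 0]]),
}

# Row 0 indexes nodes of the source type, row 1 nodes of the destination type,
# so each row must stay within the node count of its own type.
for (src, rel, dst), edge_index in edge_index_dict.items():
    assert int(edge_index[0].max()) < x_dict[src].size(0)
    assert int(edge_index[1].max()) < x_dict[dst].size(0)
```

Keeping node indices local to each node type is what lets the same index 0 refer to different nodes in 'A' and in 'B' without colliding.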

thu-wangz17 commented 4 years ago

Thanks. That's very helpful, and I'm looking forward to this exciting work! I tried to follow the AMiner dataset:

import torch
from torch_geometric.data import Data, DataLoader

class HeteroData(Data):
    def __inc__(self, key, value):
        if key == 'A->A':
            return len(self.x_dict['A']['x'])
        elif key == 'B->B':
            return len(self.x_dict['B']['x'])
        elif key == 'A->B':
            return torch.tensor([len(self.x_dict['A']['x']), len(self.x_dict['B']['x'])])
        elif key == 'B->A':
            return torch.tensor([len(self.x_dict['B']['x']), len(self.x_dict['A']['x'])])
        else:
            return 0

data_list = []

for i in range(5):
    node_types = {'A': {'x': torch.randn(3, 3)}, 'B': {'x': torch.randn(2, 3)}}
    edge_types = {
        ('A', 'A->A', 'A'): {
            'edge_index': torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]), 
        },
        ('A', 'A->B', 'B'): {
            'edge_index': torch.tensor([[0, 1, 2], [0, 1, 0]]), 
        },
        ('B', 'B->A', 'A'): {
            'edge_index': torch.tensor([[0, 1, 0], [0, 1, 2]]), 
        }, 
        ('B', 'B->B', 'B'): {
            'edge_index': torch.tensor([[0, 1], [1, 0]]), 
        },
    }

    data = HeteroData()
    data.edge_index_dict = edge_types
    data.x_dict = node_types
    data_list.append(data)

dataloader = DataLoader(data_list, batch_size=2)
for i in dataloader:
    print(i.edge_index_dict)
    break

It returns

[{('A',
   'A->A',
   'A'): {'edge_index': tensor([[0, 1, 1, 2],
           [1, 0, 2, 1]])},
  ('A',
   'A->B',
   'B'): {'edge_index': tensor([[0, 1, 2],
           [0, 1, 0]])},
  ('B',
   'B->A',
   'A'): {'edge_index': tensor([[0, 1, 0],
           [0, 1, 2]])},
  ('B',
   'B->B',
   'B'): {'edge_index': tensor([[0, 1],
           [1, 0]])}},
 {('A',
   'A->A',
   'A'): {'edge_index': tensor([[0, 1, 1, 2],
           [1, 0, 2, 1]])},
  ('A',
   'A->B',
   'B'): {'edge_index': tensor([[0, 1, 2],
           [0, 1, 0]])},
  ('B',
   'B->A',
   'A'): {'edge_index': tensor([[0, 1, 0],
           [0, 1, 2]])},
  ('B',
   'B->B',
   'B'): {'edge_index': tensor([[0, 1],
           [1, 0]])}}]

If I want to batch correctly, it seems that I need to rewrite the Batch class? In my opinion, each kind of meta-path in a heterograph can be viewed as a homogeneous graph. Thus I should rewrite the Batch class so that the same meta-path across different heterographs is batched into one large disconnected graph. Is that right? Or is there another approach to handle this problem?
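The idea above can be sketched in plain PyTorch, without touching the Batch class. This is only a sketch under assumptions: collate_hetero and make_graph are hypothetical helpers, not PyG functions, and each graph is simplified to a pair of flat dictionaries (node type to feature tensor, edge type to edge_index tensor) rather than the nested dictionaries used in the code above.

```python
import torch

def collate_hetero(graphs):
    """Merge (x_dict, edge_index_dict) pairs into one large disconnected graph.

    Hypothetical helper for illustration; it is not part of PyG.
    """
    node_types = list(graphs[0][0].keys())
    edge_types = list(graphs[0][1].keys())
    offsets = {t: 0 for t in node_types}   # nodes seen so far, per node type
    xs = {t: [] for t in node_types}
    edges = {e: [] for e in edge_types}

    for x_dict, edge_index_dict in graphs:
        for (src, rel, dst), edge_index in edge_index_dict.items():
            # Shift the source row by the source-type offset and the
            # destination row by the destination-type offset.
            shift = torch.tensor([[offsets[src]], [offsets[dst]]])
            edges[(src, rel, dst)].append(edge_index + shift)
        for t, x in x_dict.items():
            xs[t].append(x)
            offsets[t] += x.size(0)  # update only after this graph's edges

    x_batch = {t: torch.cat(v, dim=0) for t, v in xs.items()}
    edge_batch = {e: torch.cat(v, dim=1) for e, v in edges.items()}
    return x_batch, edge_batch

def make_graph():
    # Same toy connectivity as in the code above, in flat form.
    x_dict = {'A': torch.randn(3, 3), 'B': torch.randn(2, 3)}
    edge_index_dict = {
        ('A', 'A->A', 'A'): torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]),
        ('A', 'A->B', 'B'): torch.tensor([[0, 1, 2], [0, 1, 0]]),
        ('B', 'B->A', 'A'): torch.tensor([[0, 1, 0], [0, 1, 2]]),
        ('B', 'B->B', 'B'): torch.tensor([[0, 1], [1, 0]]),
    }
    return x_dict, edge_index_dict

x_batch, edge_batch = collate_hetero([make_graph(), make_graph()])
print(edge_batch[('A', 'A->B', 'B')])
```

For edge types that connect two different node types, the two rows of edge_index need different shifts, which is why the offset is tracked per node type instead of per graph.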

qshi95 commented 4 years ago

@sakuraiiiii Hi, I ran into the same problem. Have you managed to batch the data correctly?

pyg-team / pytorch_geometric

How to construct a heterograph? #1260