pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

About entire graph classification and adjacency matrix #1751

Open sayur1xoxo opened 3 years ago

sayur1xoxo commented 3 years ago

Hi, I'm new to pytorch-geometric. I'm sorry if you have already answered this question. I searched the GitHub issues but was not able to find an answer.

[What I want to do] I want to classify 500,000 entire graphs (a flag of 0 or 1 for each graph) using my own dataset.

I'm trying to write code with reference to mutag_gin.py (https://github.com/rusty1s/pytorch_geometric/blob/master/examples/mutag_gin.py). I think mutag_gin.py is sample code that classifies entire graphs. Please point out if I am wrong.

[What I tried] My dataset is in pandas format and has a unique node id (2 million), a graph_id (500,000), and features (2 million × 512). I created a 2 million × 2 million adjacency matrix in scipy format (because the nodes in all graphs add up to 2 million).

I created the features for the 2 million nodes. Each node has 512 features. Each graph has 1 to 10 nodes, and the number varies from graph to graph. I thought the number of flags should be 500,000, but I made 2 million flags (if a graph's flag is 1, then each of its nodes' flags is 1 too).

I'm having trouble understanding how to transform the adjacency matrix and feed it into a GCN. I think I need a readout function for each graph, but I don't know how to write one. Do I have to create a text-based dataset like MUTAG? I would appreciate any advice. If you have any good reference materials, please let me know. Thank you.

rusty1s commented 3 years ago

In general, each graph in your dataset should be saved separately into its own data object. In your case, that should result in 500,000 data objects, which you can hold in a simple Python list or a specific PyG dataset. Therefore, it is not necessary to create a 2 million × 2 million adjacency matrix. If you want to classify graphs, each graph/data object should hold exactly one label (resulting in 500,000 labels). You can convert a scipy matrix to the (edge_index, edge_weight) format via torch_geometric.utils.from_scipy_sparse_matrix.
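To illustrate the "one data object per graph, one label per graph" advice, here is a minimal sketch in plain Python that splits a flat node table and a global edge list into per-graph records with node indices renumbered from 0 inside each graph. The function name `split_into_graphs`, its arguments, and the toy data are hypothetical and not part of PyG; the dictionary keys are only chosen to mirror the `x`, `edge_index`, and `y` fields of `torch_geometric.data.Data`, so each record could then be converted to tensors and wrapped in a data object.

```python
def split_into_graphs(node_graph_ids, node_features, edges, graph_labels):
    """Group a flat node table and a global edge list into one record per graph.

    node_graph_ids[i] is the id of the graph that global node i belongs to.
    edges holds (source, target) pairs of global node indices.
    graph_labels maps each graph id to its single 0/1 label.
    """
    local_index = {}  # global node index -> index within its own graph
    graphs = {}       # graph id -> {"x": ..., "edge_index": ..., "y": ...}

    # Assign every node to its graph and give it a local index starting at 0.
    for node, gid in enumerate(node_graph_ids):
        record = graphs.setdefault(
            gid, {"x": [], "edge_index": [[], []], "y": graph_labels[gid]}
        )
        local_index[node] = len(record["x"])
        record["x"].append(node_features[node])

    # Rewrite every edge in terms of local indices (row 0: sources, row 1: targets).
    for src, dst in edges:
        gid = node_graph_ids[src]
        if node_graph_ids[dst] != gid:
            raise ValueError(f"edge ({src}, {dst}) connects two different graphs")
        edge_index = graphs[gid]["edge_index"]
        edge_index[0].append(local_index[src])
        edge_index[1].append(local_index[dst])

    return graphs


# Toy data (made up): 5 nodes in 2 graphs, 2 features per node.
node_graph_ids = ["a", "a", "a", "b", "b"]
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 4), (4, 3)]  # both directions listed
graph_labels = {"a": 1, "b": 0}  # one label per graph, not per node

graphs = split_into_graphs(node_graph_ids, node_features, edges, graph_labels)
```

With the real dataset this would yield 500,000 small records instead of one 2 million × 2 million matrix, and the per-node copies of the flag become unnecessary because each record carries a single `y`.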