snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License
1.89k stars, 397 forks

Some questions about `PygNodePropPredDataset()` #455

Closed: Aaricis closed this issue 10 months ago

Aaricis commented 12 months ago

I load ogbn-arxiv like this:

dataset = PygNodePropPredDataset(name='ogbn-arxiv', root='./arxiv')

I want to use dense tensors, so I did not pass the `transform` argument. However, I found that my dataset had been converted to `SparseTensor` automatically, and I do not know why.

Is there any way to use the `ogbn-arxiv` dataset as a dense tensor?

weihua916 commented 10 months ago

Hi! What do you mean by a "dense" tensor? With no `transform` passed, `dataset[0].edge_index` is the standard COO sparse matrix representation. See below.

>>> from ogb.nodeproppred import PygNodePropPredDataset
>>> import torch_geometric
>>> torch_geometric.__version__
'2.4.0'
>>> dataset = PygNodePropPredDataset(name='ogbn-arxiv', root='./arxiv')
Downloading http://snap.stanford.edu/ogb/data/nodeproppred/arxiv.zip
Downloaded 0.08 GB: 100%|█████████████████████████████████████████████| 81/81 [00:05<00:00, 15.14it/s]
Extracting ./arxiv/arxiv.zip
Processing...
Loading necessary files...
This might take a while.
Processing graphs...
100%|████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 59918.63it/s]
Converting graphs into PyG objects...
100%|█████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1067.52it/s]
Saving...
Done!
>>> dataset[0]
Data(num_nodes=169343, edge_index=[2, 1166243], x=[169343, 128], node_year=[169343, 1], y=[169343, 1])
>>> dataset[0].edge_index
tensor([[104447,  15858, 107156,  ...,  45118,  45118,  45118],
        [ 13091,  47283,  69161,  ..., 162473, 162537,  72717]])
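
If "dense" here means a full `[num_nodes, num_nodes]` adjacency matrix, the COO `edge_index` shown above can in principle be expanded into one. The following is a minimal sketch on a toy graph, not the ogbn-arxiv data. The helper name `coo_to_dense` and the toy edges are hypothetical and only for illustration. It uses plain `torch` and assumes an unweighted graph in which duplicate edges should collapse to a single entry.

```python
import torch


def coo_to_dense(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """Expand a [2, E] COO edge_index into a dense [num_nodes, num_nodes] matrix.

    Hypothetical helper for illustration; not part of ogb or PyG.
    """
    adj = torch.zeros((num_nodes, num_nodes), dtype=torch.float32)
    src, dst = edge_index  # row 0 holds source nodes, row 1 holds target nodes
    adj[src, dst] = 1.0    # duplicate edges collapse to a single 1
    return adj


# Toy graph (made-up data): directed edges 0->1, 1->2, 2->0
edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 0]])
adj = coo_to_dense(edge_index, num_nodes=3)
print(adj)

# Rough memory estimate for a dense float32 adjacency of ogbn-arxiv,
# using num_nodes=169343 from the Data object printed above.
num_nodes = 169343
dense_gib = num_nodes ** 2 * 4 / 1024 ** 3
print(f"dense float32 adjacency would need about {dense_gib:.0f} GiB")
```

By this estimate a dense adjacency for ogbn-arxiv would need over 100 GiB, which is likely why the COO form is the default. PyG also ships `torch_geometric.utils.to_dense_adj`, which performs a similar expansion and returns a batched `[B, N, N]` tensor. It is probably only practical on small graphs or subgraphs.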