snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

The problem of loading OGB dataset #327

Closed: CLIS-237 closed this issue 2 years ago

CLIS-237 commented 2 years ago

When I use the following code (from section 4.5, "Loading OGB datasets using ogb package", of the DGL 0.8.1 documentation) to load the OGB dataset,

import dgl
import torch
from ogb.graphproppred import DglGraphPropPredDataset
from dgl.dataloading import GraphDataLoader

def _collate_fn(batch):
    graphs = [e[0] for e in batch]
    g = dgl.batch(graphs)
    labels = [e[1] for e in batch]
    labels = torch.stack(labels, 0)
    return g, labels

# Load the dataset and its predefined train/valid/test split indices
dataset = DglGraphPropPredDataset(name='ogbg-molhiv')
split_idx = dataset.get_idx_split()
# Build one data loader per split
train_loader = GraphDataLoader(dataset[split_idx["train"]], batch_size=32, shuffle=True, collate_fn=_collate_fn)
valid_loader = GraphDataLoader(dataset[split_idx["valid"]], batch_size=32, shuffle=False, collate_fn=_collate_fn)
test_loader = GraphDataLoader(dataset[split_idx["test"]], batch_size=32, shuffle=False, collate_fn=_collate_fn)

I get the following error:

python ogb_test.py 
Traceback (most recent call last):
  File "ogb_test.py", line 16, in <module>
    dataset = DglGraphPropPredDataset(name='ogbg-molhiv')
  File "/home/ssy/anaconda3/envs/ELG/lib/python3.7/site-packages/ogb/graphproppred/dataset_dgl.py", line 68, in __init__
    self.pre_process()
  File "/home/ssy/anaconda3/envs/ELG/lib/python3.7/site-packages/ogb/graphproppred/dataset_dgl.py", line 100, in pre_process
    if decide_download(url):
  File "/home/ssy/anaconda3/envs/ELG/lib/python3.7/site-packages/ogb/utils/url.py", line 13, in decide_download
    size = int(d.info()["Content-Length"])/GBFACTOR
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

I traced the error and found that d.info()["Content-Length"] is None. Why does this happen?
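For illustration, the TypeError in the traceback can be reproduced without network access. The sketch below assumes that d.info() returns a standard-library header object (a subclass of email.message.Message), for which looking up a header the server did not send yields None. This can happen, for example, when a response uses chunked transfer encoding or passes through a proxy that drops the header. The helper content_length_gb is hypothetical and not part of ogb, and the GBFACTOR value is an assumption meant to mirror the constant used in ogb/utils/url.py.

```python
from email.message import Message

# Assumed to mirror the constant in ogb/utils/url.py (bytes per GB).
GBFACTOR = float(1 << 30)


def content_length_gb(headers):
    """Hypothetical helper: return the advertised size in GB,
    or None when the response carries no Content-Length header."""
    raw = headers["Content-Length"]  # Message lookup gives None if the header is absent
    if raw is None:
        return None
    return int(raw) / GBFACTOR


# Simulated response headers: one with the header, one without.
with_length = Message()
with_length["Content-Length"] = str(1 << 30)
without_length = Message()

size_known = content_length_gb(with_length)       # size is advertised
size_unknown = content_length_gb(without_length)  # header missing, no TypeError raised
```

Calling int() directly on the missing header, as line 13 of url.py does in the traceback, is what raises the TypeError. A guard like the one above would let a caller treat the size as unknown instead of failing.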

weihua916 commented 2 years ago

Hi! I ran your exact code with dgl==0.8.1 without any issue. Can you delete the cached dataset folder (e.g., rm -rf dataset/ogbg_molhiv/) and run the script again?
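The cache cleanup suggested above can also be done from Python. This is a minimal sketch, assuming the dataset is cached under a root folder named dataset and that the folder name is the dataset name with hyphens replaced by underscores, as in the dataset/ogbg_molhiv/ path mentioned in the reply. The function clear_cached_dataset is hypothetical and not part of ogb.

```python
import shutil
from pathlib import Path


def clear_cached_dataset(name, root="dataset"):
    """Hypothetical helper: delete the cached folder for `name` under `root`.

    Returns True if a folder was removed, False if there was nothing to delete.
    Assumes the folder name is `name` with '-' replaced by '_'.
    """
    folder = Path(root) / name.replace("-", "_")
    if folder.is_dir():
        shutil.rmtree(folder)  # removes the folder and everything inside it
        return True
    return False
```

Calling clear_cached_dataset("ogbg-molhiv") before re-running the script would force a fresh download, which is the same effect as the rm -rf command.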