snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

Get stuck when iterating with NeighborLoader #353

Status: closed by mksit 1 year ago

mksit commented 1 year ago

Hello, my program hangs at a random iteration while iterating over the dataset with the following code.

import os
from ogb.nodeproppred import Evaluator, PygNodePropPredDataset
from tqdm import tqdm
from torch_geometric.loader import NeighborLoader, NeighborSampler

#dataset_name = 'ogbn-products'
dataset_name = 'ogbn-arxiv'

print("Loading dataset...")

root = os.path.join('data', dataset_name)
dataset = PygNodePropPredDataset(dataset_name, root)
split_idx = dataset.get_idx_split()
evaluator = Evaluator(name=dataset_name)
data = dataset[0]

train_idx = split_idx['train']
train_loader = NeighborLoader(data, input_nodes=train_idx, num_neighbors=[15, 10, 5],
                              batch_size=1024, shuffle=True, num_workers=12)
subgraph_loader = NeighborLoader(data, input_nodes=None, num_neighbors=[-1],
                                 batch_size=4096, shuffle=False, num_workers=12)

for epoch in range(1, 21):
    for batch in tqdm(train_loader, desc="Loading"):
        pass

After I interrupt it, it prints:

Loading dataset...
Loading: 100%|██████████| 89/89 [00:03<00:00, 27.66it/s]
Loading: 100%|██████████| 89/89 [00:02<00:00, 35.51it/s]
Loading:   1%|█▊        | 1/89 [00:00<00:41,  2.11it/s]
Loading:   3%|█████▎    | 3/89 [00:16<07:54,  5.51s/it]
Traceback (most recent call last):
  File "test3.py", line 24, in <module>
    for batch in tqdm(train_loader, desc="Loading"):
  File "/home/mankit/anaconda3/envs/gnn/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/mankit/anaconda3/envs/gnn/lib/python3.8/site-packages/torch_geometric/loader/base.py", line 36, in __next__
    return self.transform_fn(next(self.iterator))
  File "/home/mankit/anaconda3/envs/gnn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/home/mankit/anaconda3/envs/gnn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1330, in _next_data
    idx, data = self._get_data()
  File "/home/mankit/anaconda3/envs/gnn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1296, in _get_data
    success, data = self._try_get_data()
  File "/home/mankit/anaconda3/envs/gnn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1134, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/mankit/anaconda3/envs/gnn/lib/python3.8/multiprocessing/queues.py", line 107, in get
    if not self._poll(timeout):
  File "/home/mankit/anaconda3/envs/gnn/lib/python3.8/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/home/mankit/anaconda3/envs/gnn/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/home/mankit/anaconda3/envs/gnn/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/home/mankit/anaconda3/envs/gnn/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
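
For reference, the last frames of the traceback show the main process waiting in `self._data_queue.get(timeout=timeout)`, that is, polling a multiprocessing queue for a batch from a worker process. The following is a minimal standard-library sketch of that waiting pattern, not PyTorch's actual implementation. `wait_for_batch` is a hypothetical helper name, and the empty queue stands in for workers that never deliver a batch.

```python
import multiprocessing as mp
import queue


def wait_for_batch(data_queue, timeout):
    """Wait up to `timeout` seconds for one item on `data_queue`.

    Hypothetical helper for illustration only. Returns (True, item) if
    something arrived in time, otherwise (False, None).
    """
    try:
        return True, data_queue.get(timeout=timeout)
    except queue.Empty:
        # multiprocessing.Queue.get raises queue.Empty when the timeout expires.
        return False, None


if __name__ == "__main__":
    q = mp.Queue()  # stands in for the loader's internal data queue
    # No worker ever puts a batch on the queue, so the wait times out.
    print(wait_for_batch(q, timeout=0.5))  # prints (False, None)
```

If the consumer retries this wait in a loop and no worker ever delivers, the program appears stuck at that point until it is interrupted, which is consistent with where the KeyboardInterrupt landed above.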

It happens with:

pytorch: 1.12.0 py3.8_cuda11.3_cudnn8.3.2_0
torch-geometric: 2.0.5
torch-scatter: 2.0.9
torch-sparse: 0.6.14
ogb: 1.3.3
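
One possible way to get a stack dump without interrupting by hand is a watchdog built on Python's standard `faulthandler` module. This is only a diagnostic sketch under assumptions: `iterate_with_watchdog` is a hypothetical helper, the timeout value is arbitrary, and the dump covers only the main process, not the loader's worker processes. It reports where the main process is blocked and does not fix the hang.

```python
import faulthandler
import sys


def iterate_with_watchdog(iterable, timeout_s=60.0, file=None):
    """Yield items from `iterable`, dumping all thread stacks of this
    process to `file` whenever fetching the next item takes longer than
    `timeout_s` seconds. Hypothetical helper, for diagnosis only.
    """
    if file is None:
        file = sys.stderr
    iterator = iter(iterable)
    while True:
        # Arm the watchdog only for the duration of the next() call.
        faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)
        try:
            item = next(iterator)
        except StopIteration:
            return
        finally:
            # Disarm it so time spent by the caller is not counted.
            faulthandler.cancel_dump_traceback_later()
        yield item
```

In the script above, this could be used as `for batch in tqdm(iterate_with_watchdog(train_loader, timeout_s=120), desc="Loading"):`, where 120 seconds is an assumed threshold chosen to be well above the normal time per batch.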