Hi! I'm trying to load ogbn-papers100M with the following code:
```python
from ogb.nodeproppred import NodePropPredDataset

dataset = NodePropPredDataset(name='ogbn-papers100M')
```
but it outputs an error like the following (part of it):
```
Traceback (most recent call last):
  File "node_prop_pred_data.py", line 24, in <module>
    dataset = NodePropPredDataset(name=dataset_name, root=dataset_save_location)
  File "/home/min/a/user/data/.venv/lib/python3.6/site-packages/ogb/nodeproppred/dataset.py", line 63, in __init__
    self.pre_process()
  File "/home/min/a/user/data/.venv/lib/python3.6/site-packages/ogb/nodeproppred/dataset.py", line 139, in pre_process
    torch.save({'graph': self.graph, 'labels': self.labels}, pre_processed_file_path, pickle_protocol=4)
  File "/home/min/a/user/data/.venv/lib/python3.6/site-packages/torch/serialization.py", line 372, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
MemoryError -- std::bad_alloc
```
It seems like an OOM... My machine has 256 GB of memory.
BTW, I also tried to load the raw graph myself. But when I load node-label.npz:
```python
>>> import numpy as np
>>> label = np.load('node-label.npz')
>>> label
<numpy.lib.npyio.NpzFile object at 0x2b15ac5232b0>
>>> label.__dict__
{'_files': ['node_label.npy'], 'files': ['node_label'], 'allow_pickle': False, 'pickle_kwargs': {'encoding': 'ASCII', 'fix_imports': True}, 'zip': <zipfile.ZipFile file=<_io.BufferedReader name='node-label.npz'> mode='r'>, 'f': <numpy.lib.npyio.BagObj object at 0x2b15df759640>, 'fid': <_io.BufferedReader name='node-label.npz'>}
>>> node_label = label['node_label']
>>> node_label
array([[ nan],
       [ nan],
       [ nan],
       ...,
       [157.],
       [ nan],
       [ nan]], dtype=float32)
```
Why are there so many nan values in the label file?
I think the error is due to a corrupted file. You may delete the old file and download it again. The nan values mean the node is unlabeled.
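In case a concrete sketch helps, here is a minimal version of both suggestions. It assumes the default `root='dataset'`, under which OGB creates an `ogbn_papers100M` folder; adjust the paths if you passed a custom root:

```python
import shutil

import numpy as np
from ogb.nodeproppred import NodePropPredDataset

# Delete the possibly-corrupted download/preprocessed files so that
# re-instantiating the dataset triggers a fresh download and preprocessing.
# ('dataset/ogbn_papers100M' is the default location; adjust if you used
# a custom root.)
shutil.rmtree('dataset/ogbn_papers100M', ignore_errors=True)
dataset = NodePropPredDataset(name='ogbn-papers100M')

# nan entries mark unlabeled nodes; build a mask for the labeled ones.
label = np.load('node-label.npz')  # the raw file from your snippet
node_label = label['node_label']   # shape (num_nodes, 1), float32
labeled_mask = ~np.isnan(node_label[:, 0])
print(f'{labeled_mask.sum()} labeled nodes out of {node_label.shape[0]}')
```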