pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

DBLP_v1 dataset loading error #6955

Open cxw-droid opened 1 year ago

cxw-droid commented 1 year ago

🚀 The feature, motivation and pitch

Hello,

I cannot use TUDataset() to load DBLP_v1. It seems it runs out of memory.

Does pyg currently support DBLP_v1 loading? If not, could you please suggest some code to load DBLP_v1 without using TUDataset() directly? Thanks.

rusty1s commented 1 year ago

Sorry for the late reply. I pushed a more memory-efficient version of one_hot to PyG; see https://github.com/pyg-team/pytorch_geometric/pull/7005.

However, you might still need large amounts of RAM to process this currently, as we are creating a giant dense one-hot matrix here. The cleaner fix would be to store this in a sparse fashion, but we don't have good support for sparse input features at the moment :(
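To give a rough sense of why the dense one-hot matrix is the bottleneck, here is a back-of-the-envelope sketch. The sizes are hypothetical placeholders, not the real DBLP_v1 statistics, and it assumes 4-byte float32 entries for the dense matrix and one 8-byte int64 index per row for the compact alternative.

```python
def dense_one_hot_bytes(num_rows: int, num_classes: int, itemsize: int = 4) -> int:
    """Bytes needed for a dense [num_rows, num_classes] one-hot matrix."""
    return num_rows * num_classes * itemsize


def index_bytes(num_rows: int, itemsize: int = 8) -> int:
    """Bytes needed to store only one class index per row."""
    return num_rows * itemsize


# Hypothetical sizes for illustration only (NOT the real DBLP_v1 numbers).
num_rows, num_classes = 1_000_000, 40_000

dense = dense_one_hot_bytes(num_rows, num_classes)
compact = index_bytes(num_rows)

print(f"dense one-hot: {dense / 1024**3:.1f} GiB")
print(f"index only:    {compact / 1024**2:.1f} MiB")
```

The dense cost grows with rows times classes, while the index-only form grows with rows alone, which is why a sparse representation would be the cleaner fix.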

cxw-droid commented 1 year ago

Thank you very much for the update. I installed PyG using pip, but it seems you pushed the update to the master branch. How can I test it on my machine? Thanks.

rusty1s commented 1 year ago

Uninstall PyG and run:

pip install git+https://github.com/pyg-team/pytorch_geometric.git

cxw-droid commented 1 year ago

I get the error below when I try to import TUDataset. Any suggestions? Thanks.

Traceback (most recent call last):
  File "/home/abc/code/ex/_tudata.py", line 1, in <module>
    from torch_geometric.datasets import TUDataset
  File "/home/abc/miniconda3/envs/torch1.10/lib/python3.9/site-packages/torch_geometric/__init__.py", line 2, in <module>
    import torch_geometric.data
  File "/home/abc/miniconda3/envs/torch1.10/lib/python3.9/site-packages/torch_geometric/data/__init__.py", line 48, in <module>
    from torch_geometric.loader import NeighborSampler
  File "/home/abc/miniconda3/envs/torch1.10/lib/python3.9/site-packages/torch_geometric/loader/__init__.py", line 3, in <module>
    from .dataloader import DataLoader
  File "/home/abc/miniconda3/envs/torch1.10/lib/python3.9/site-packages/torch_geometric/loader/dataloader.py", line 9, in <module>
    from torch_geometric.data.datapipes import DatasetAdapter
  File "/home/abc/miniconda3/envs/torch1.10/lib/python3.9/site-packages/torch_geometric/data/datapipes.py", line 36, in <module>
    class SMILESParser(IterDataPipe):
  File "/home/abc/miniconda3/envs/torch1.10/lib/python3.9/site-packages/torch/utils/data/_typing.py", line 273, in __new__
    return super().__new__(cls, name, bases, namespace, **kwargs)  # type: ignore[call-overload]
  File "/home/abc/miniconda3/envs/torch1.10/lib/python3.9/abc.py", line 106, in __new__
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
  File "/home/abc/miniconda3/envs/torch1.10/lib/python3.9/site-packages/torch/utils/data/_typing.py", line 373, in _dp_init_subclass
    raise TypeError("Expected 'Iterator' as the return annotation for __iter__ of {}"
TypeError: Expected 'Iterator' as the return annotation for __iter__ of SMILESParser, but found typing.Any

rusty1s commented 1 year ago

Which PyTorch version are you on?

cxr0726 commented 1 year ago

Hello, I also have this problem. My PyTorch version is 1.10.0 with CUDA 11.3, and my torch_geometric version is 2.3.0.

cxw-droid commented 1 year ago

torch 1.10.1

rusty1s commented 1 year ago

Can you patch the changes of https://github.com/pyg-team/pytorch_geometric/pull/7035 on your end and see if this fixes your issues?

cxw-droid commented 1 year ago

This time I can load the github_stargazers dataset (last time I could not even load that one), but I still cannot load DBLP_v1. Loading used up all available memory (64 GB) and then the process was terminated. Do you know how much memory this dataset needs?

However, when I tried to load the MUTAG dataset, it raised an error:

AttributeError: module 'torch' has no attribute 'sparse_csc'

BTW, I used the pip install git+https://github.com/pyg-team/pytorch_geometric.git command above to install the newest version of PyG, which includes #7035. Is there a more convenient way to patch in just the changes from #7035? Thanks.

rusty1s commented 1 year ago

For PyG 2.3, you will need at least PyTorch 1.12. I will make this clearer in the documentation. You can also patch the fix locally by applying the changes to your local installation.
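A quick way to check this requirement before importing torch_geometric is sketched below. The helper names `parse_major_minor` and `meets_minimum` are made up for illustration and are not part of PyG or PyTorch. The `(1, 12)` default mirrors the minimum stated above, and the `hasattr` probe only looks for the `torch.sparse_csc` attribute named in the AttributeError earlier in this thread.

```python
import re


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '1.10.1' or '1.12.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: tuple = (1, 12)) -> bool:
    """Compare numerically, so that '1.9' < '1.10' < '1.12' is handled correctly."""
    return parse_major_minor(version) >= minimum


# Only inspect the local environment if PyTorch is actually installed.
try:
    import torch
except ImportError:
    torch = None

if torch is not None:
    print("PyTorch version:", torch.__version__)
    print("Meets the 1.12 minimum:", meets_minimum(torch.__version__))
    print("Has torch.sparse_csc:", hasattr(torch, "sparse_csc"))
```

Comparing integer tuples instead of raw strings avoids the usual pitfall where "1.9" sorts after "1.12" lexicographically.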