pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

Caught AttributeError in DataLoader worker process 0. #3979

Open aiyicen2 opened 2 years ago

aiyicen2 commented 2 years ago

🐛 Describe the bug

When I run the code:

val_dataloader = torch_geometric.loader.DataLoader(
    val_dataset,
    batch_size=hparams.batch_size,
    num_workers=hparams.num_workers)

for idx,batch in enumerate(val_dataloader):
    print(idx,batch)

I got the following error and I don't know how to fix it:

Traceback (most recent call last):
  File "/home/aiyicen/00_Script/ares_release/ares/train.py", line 103, in <module>
    main()
  File "/home/aiyicen/00_Script/ares_release/ares/train.py", line 85, in main
    for idx,batch in enumerate(val_dataloader):
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch_geometric/loader/dataloader.py", line 18, in __call__
    return Batch.from_data_list(batch, self.follow_batch,
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 68, in from_data_list
    batch, slice_dict, inc_dict = collate(
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch_geometric/data/collate.py", line 32, in collate
    out = cls(_base_cls=data_list[0].__class__)  # Dynamic inheritance.
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 40, in __call__
    return super(DynamicInheritance, new_cls).__call__(*args, **kwargs)
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/e3nn-0.1.0-py3.8-linux-x86_64.egg/e3nn/point/data_helpers.py", line 186, in __init__
    edge_index, edge_attr = neighbor_list_and_relative_vec(
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/e3nn-0.1.0-py3.8-linux-x86_64.egg/e3nn/point/data_helpers.py", line 28, in neighbor_list_and_relative_vec
    N, _ = pos.shape
AttributeError: 'NoneType' object has no attribute 'shape'

Environment

torch 1.5.0+cu101
torch-cluster 1.5.7
torch-geometric 2.0.4
torch-scatter 2.0.5
torch-sparse 0.6.7
torch-spline-conv 1.2.0
torchmetrics 0.7.0
torchvision 0.6.0+cu101
cudatoolkit 10.1.243

OS: Ubuntu 20.04.3 LTS
GPU: GeForce RTX 3090

I installed PyTorch and PyG from wheels using pip:

pip install torch-1.5.0+cu101-cp38-cp38-linux_x86_64.whl
pip install torch_cluster-1.5.7-cp38-cp38-linux_x86_64.whl
pip install torch_scatter-2.0.5-cp38-cp38-linux_x86_64.whl
pip install torch_sparse-0.6.7-cp38-cp38-linux_x86_64.whl
pip install torch_spline_conv-1.2.0-cp38-cp38-linux_x86_64.whl
pip install torchvision-0.6.0+cu101-cp38-cp38-linux_x86_64.whl
pip install torch_geometric

rusty1s commented 2 years ago

Do you have a minimal example to reproduce this error? Which example are you running? It looks like somewhere in the code, data.pos is referenced, but this attribute does not (yet) exist.

aiyicen2 commented 2 years ago

I loaded the dataset from LMDB data with atom3d.datasets.load_dataset, then created the dataloader with torch_geometric.data.DataLoader. My code is as follows:

import atom3d.datasets as da

val_dataset = da.load_dataset(hparams.val_dataset, hparams.filetype,
                              transform=transform)

val_dataloader = torch_geometric.data.DataLoader(
    val_dataset,
    batch_size=hparams.batch_size,
    num_workers=hparams.num_workers)

When I run:

for i,data in enumerate(val_dataloader.dataset):
    print(i,data)

I get this result, and pos is not empty:

0 DataNeighbors(x=[552, 3], edge_index=[2, 27600], edge_attr=[27600, 3], pos=[552, 3], Rs_in=[1], label=[1], id='('1q9a_bps_res4_newfrags', 'S_000681_minimize_008')', file_path='/scratch/users/psuriana/ares_psuriana/data/val')
1 DataNeighbors(x=[552, 3], edge_index=[2, 27600], edge_attr=[27600, 3], pos=[552, 3], Rs_in=[1], label=[1], id='('1q9a_bps_res4_newfrags', 'S_001016_minimize_001')', file_path='/scratch/users/psuriana/ares_psuriana/data/val')
2 DataNeighbors(x=[552, 3], edge_index=[2, 27600], edge_attr=[27600, 3], pos=[552, 3], Rs_in=[1], label=[1], id='('1q9a_bps_res4_newfrags', 'S_001067_minimize_002')', file_path='/scratch/users/psuriana/ares_psuriana/data/val')
3 DataNeighbors(x=[552, 3], edge_index=[2, 27600], edge_attr=[27600, 3], pos=[552, 3], Rs_in=[1], label=[1],
......
18 DataNeighbors(x=[345, 3], edge_index=[2, 17250], edge_attr=[17250, 3], pos=[345, 3], Rs_in=[1], label=[1], id='('1kka_bps_res4_newfrags', 'S_001963_minimize_003')', file_path='/scratch/users/psuriana/ares_psuriana/data/val')
19 DataNeighbors(x=[345, 3], edge_index=[2, 17250], edge_attr=[17250, 3], pos=[345, 3], Rs_in=[1], label=[1], id='('1kka_bps_res4_newfrags', 'S_003526_minimize_003')', file_path='/scratch/users/psuriana/ares_psuriana/data/val')

But when I try to iterate over val_dataloader, I get the error:

for idx,batch in enumerate(val_dataloader):
    print(idx,batch)

rusty1s commented 2 years ago

Sorry, I have problems reproducing this. Which dataset are you using and what are the values of each attribute in hparams?

aiyicen2 commented 2 years ago

I ran train.py with the ares_release/data/lmdbs dataset from https://zenodo.org/record/5088971#.Yf3RQepByuU. I passed the lmdbs/train and lmdbs/val datasets and used the default hparams. The code I ran is as follows:

import argparse as ap
import logging
import os
import pathlib
import sys

import atom3d.datasets as da
import dotenv as de
import pytorch_lightning as pl
import pytorch_lightning.loggers as log
import torch_geometric
import wandb
from torch.utils.data import DataLoader

sys.path.append(r'/home/aiyicen/00_Script/ares_release/ares/')
import data as d
import model as m

root_dir = pathlib.Path(__file__).parent.parent.absolute()
de.load_dotenv(os.path.join(root_dir, '.env'))
logger = logging.getLogger("lightning")
wandb.init(project="ares")

def main():
    parser = ap.ArgumentParser()
    # add PROGRAM level args
    parser.add_argument('train_dataset', type=str, default='/home/aiyicen/00_Script/ares_release/data/lmdbs/train')
    # parser.add_argument('--train_dataset', type=str, default='/home/aiyicen/00_Script/ares_release/data/pdbs/S_000028_476.pdb')
    parser.add_argument('--val_dataset', type=str,default='/home/aiyicen/00_Script/ares_release/data/lmdbs/val')
    # parser.add_argument('--val_dataset', type=str,default='/home/aiyicen/00_Script/ares_release/data/pdbs/S_000041_026.pdb')
    parser.add_argument('-f', '--filetype', type=str, default='lmdb',
                        choices=['lmdb', 'pdb', 'silent'])
    parser.add_argument('--batch_size', type=int, default=1)
    parser.add_argument('--label_dir', type=str, default=None)
    parser.add_argument('--num_workers', type=int, default=20)

    # add model specific args
    parser = m.ARESModel.add_model_specific_args(parser)

    # add trainer args
    parser = pl.Trainer.add_argparse_args(parser)
    hparams = parser.parse_args()
    dict_args = vars(hparams)

    transform = d.create_transform(True, hparams.label_dir, hparams.filetype)

    # DATA PREP
    logger.info(f"Dataset of type {hparams.filetype}")

    logger.info("Creating dataloaders...")

    train_dataset = da.load_dataset(hparams.train_dataset, hparams.filetype,
                                    transform=transform)

    train_dataloader = torch_geometric.loader.DataLoader(
        train_dataset,
        batch_size=hparams.batch_size,
        num_workers=hparams.num_workers,
        shuffle=True)

    val_dataset = da.load_dataset(hparams.val_dataset, hparams.filetype,
                                  transform=transform)

    val_dataloader = torch_geometric.data.DataLoader(
        val_dataset,
        batch_size=hparams.batch_size,
        num_workers=hparams.num_workers)

    for i,data in enumerate(val_dataloader.dataset):
        print(i,data)

if __name__ == "__main__":
    logging.basicConfig(stream=sys.stdout,
                        format='%(asctime)s %(levelname)s %(process)d: ' +
                        '%(message)s',
                        level=logging.INFO)
    main()

rusty1s commented 2 years ago

Thanks @aiyicen2. Can you say something about what your transform looks like?

snakepeterson commented 2 years ago

I had a very similar issue with the ARES (ares_release) dataset. After installing an older version (torch-geometric 1.7.2), it trained successfully.

conda environment:

_libgcc_mutex 0.1
_openmp_mutex 5.1
absl-py 1.2.0
aiohttp 3.8.3
aiosignal 1.2.0
argparse 1.4.0
ase 3.22.1
async-timeout 4.0.2
atom3d 0.2.4
attrs 22.1.0
biopython 1.79
ca-certificates 2022.07.19
cachetools 5.2.0
certifi 2022.9.14
charset-normalizer 2.1.1
click 8.1.3
contourpy 1.0.5
cycler 0.11.0
dill 0.3.5.1
docker-pycreds 0.4.0
e3nn 0.1.0 # ARES specific version
easy-parallel 0.1.6
emmet-core 0.36.1
fonttools 4.37.3
freesasa 2.1.0
frozenlist 1.3.1
fsspec 2022.8.2
future 0.18.2
gitdb 4.0.9
gitpython 3.1.27
google-auth 2.11.1
google-auth-oauthlib 0.4.6
googledrivedownloader 0.4
grpcio 1.49.1
h5py 3.7.0
idna 3.4
importlib-metadata 4.12.0
isodate 0.6.1
jinja2 3.1.2
joblib 1.2.0
kiwisolver 1.4.4
latexcodec 2.0.1
ld_impl_linux-64 2.38
libffi 3.3
libgcc-ng 11.2.0
libgomp 11.2.0
libstdcxx-ng 11.2.0
lie-learn 0.0.1.post1
lmdb 1.3.0
markdown 3.4.1
markupsafe 2.1.1
matplotlib 3.6.0
monty 2022.9.9
mp-api 0.27.3
mpmath 1.2.1
msgpack 1.0.4
multidict 6.0.2
multipledispatch 0.6.0
multiprocess 0.70.13
ncurses 6.3
networkx 2.8.6
numexpr 2.8.3
numpy 1.23.3
oauthlib 3.2.1
openssl 1.1.1q
packaging 21.3
palettable 3.3.0
pandas 1.5.0
pathos 0.2.9
pathtools 0.1.2
pillow 9.2.0
pip 22.1.2
plotly 5.10.0
pox 0.3.1
ppft 1.7.6.5
promise 2.3
protobuf 3.19.5
psutil 5.9.2
pyasn1 0.4.8
pyasn1-modules 0.2.8
pybtex 0.24.0
pydantic 1.10.2
pydeprecate 0.3.2
pymatgen 2022.9.21
pyparsing 3.0.9
pyrr 0.10.3
python 3.8.13
python-dateutil 2.8.2
python-dotenv 0.21.0
python-louvain 0.16
pytorch-lightning 1.7.7
pytz 2022.2.1
pyyaml 6
rdflib 6.2.0
readline 8.1.2
requests 2.28.1
requests-oauthlib 1.3.1
rsa 4.9
ruamel-yaml 0.17.21
ruamel-yaml-clib 0.2.6
scikit-learn 1.1.2
scipy 1.9.1
sentry-sdk 1.9.8
setproctitle 1.3.2
setuptools 63.4.1
shortuuid 1.0.9
six 1.16.0
smmap 5.0.0
spglib 2.0.1
sqlite 3.39.2
sympy 1.11.1
tables 3.7.0
tabulate 0.8.10
tenacity 8.1.0
tensorboard 2.10.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
threadpoolctl 3.1.0
tk 8.6.12
torch 1.12.1+cpu
torch-cluster 1.6.0
torch-geometric 1.7.2
torch-scatter 2.0.9
torch-sparse 0.6.15
torch-spline-conv 1.2.1
torchmetrics 0.9.3
torchvision 0.13.1+cpu
tqdm 4.64.1
typing-extensions 4.3.0
uncertainties 3.1.7
urllib3 1.26.12
wandb 0.13.3
werkzeug 2.2.2
wheel 0.37.1
xz 5.2.5
yarl 1.8.1
zipp 3.8.1
zlib 1.2.12

mjustynaPhD commented 1 year ago

Hi all, as @snakepeterson mentioned, downgrading torch-geometric to 1.7.2 should solve the problem. By default, pip installs torch-geometric 2.X.X, which is not compatible with the ARES implementation.
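
To check whether an environment falls into the affected range, something like the following could be used. It is only a sketch: it assumes the distribution is registered under the name `torch-geometric`, and it treats every 2.x release as potentially affected, although this thread only confirms 2.0.4 as failing and 1.7.2 as working. `installed_pyg_version` and `is_pyg_2_or_newer` are hypothetical helper names.

```python
from importlib import metadata
from typing import Optional


def installed_pyg_version() -> Optional[str]:
    """Return the installed torch-geometric version, or None if it is absent."""
    try:
        # Assumption: the package is registered as "torch-geometric".
        return metadata.version("torch-geometric")
    except metadata.PackageNotFoundError:
        return None


def is_pyg_2_or_newer(version: str) -> bool:
    """True when the major version number is 2 or higher."""
    return int(version.split(".")[0]) >= 2


version = installed_pyg_version()
if version is None:
    print("torch-geometric is not installed")
elif is_pyg_2_or_newer(version):
    print(f"torch-geometric {version}: consider pinning 1.7.2 for ares_release")
else:
    print(f"torch-geometric {version}: same major version as the working setup")
```

Pinning would then be `pip install torch-geometric==1.7.2`, keeping in mind that the companion packages (torch-scatter, torch-sparse, and so on) may need matching versions as well.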

jeanmonet commented 1 year ago

@rusty1s please consider the reproducible example below.

Trying to iterate over the DataLoader object produces the same error in this train script https://github.com/shi27feng/transformers.satisfy/blob/master/src/train.py. It has also been reported here.

The DataLoader object: https://github.com/shi27feng/transformers.satisfy/blob/fc1d53bb58c7e5217f5ed1f2e502ba9c6bb7304c/src/train.py#L89

The DataLoader is being fed a subclass (SatDataset) of InMemoryDataset. The SatDataset.data is a list of BipartiteData objects (subclass of Data).

What is happening is that the dynamic inheritance in the collate function tries to instantiate a subclass of the Data class, in this case BipartiteData. It does not pass the arguments that this class's __init__ requires (they are specific to the code at hand), which produces the error.

What I do not understand is why it is trying to instantiate a new BipartiteData class, when in fact the data_list[0].__class__ is already a BipartiteData class.
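
Judging from the frames in the traceback (collate.py:32 and batch.py:48), the object being created is not one of the existing data items but a new, empty batch container whose class inherits from both Batch and the item class, so the item class's __init__ runs once with every required argument set to None. Below is a simplified pure-Python sketch of that mechanism. It is not PyG's actual code: the stripped-down `Batch` and the `StrictData` class are hypothetical stand-ins, and `pos.copy()` merely plays the role of any operation that fails on None.

```python
import inspect


class DynamicInheritance(type):
    """Simplified stand-in for the metaclass that appears in the traceback."""

    def __call__(cls, *args, **kwargs):
        base_cls = kwargs.pop("_base_cls")
        # Create a class that inherits from the batch class and the data class.
        name = f"{base_cls.__name__}{cls.__name__}"
        new_cls = type(cls)(name, (cls, base_cls), {})
        # Every required __init__ argument of the data class is filled with None.
        params = list(inspect.signature(base_cls.__init__).parameters.items())
        for key, param in params[1:]:  # skip `self`
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            if param.default is inspect.Parameter.empty:
                kwargs[key] = None
        return super(DynamicInheritance, new_cls).__call__(*args, **kwargs)


class Batch(metaclass=DynamicInheritance):
    """Hypothetical minimal batch container."""


class StrictData:
    """Hypothetical data class whose constructor needs a real `pos`."""

    def __init__(self, pos):
        self.pos = pos.copy()  # raises AttributeError when pos is None


item = StrictData([0.0, 1.0, 2.0])  # building an item directly works
try:
    Batch(_base_cls=type(item))     # building the empty batch container fails
except AttributeError as err:
    print("collation would fail:", err)
```

This would also explain why iterating over the dataset itself works while iterating over the DataLoader does not: only the latter goes through collation.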

I assume that something in the API of DataLoader or InMemoryDataset has changed since this code was written 2 years ago? Presumably since pytorch-geometric 1.7.2?

The error is reproduced below.

File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch/utils/data/dataloader.py:628, in _BaseDataLoaderIter.__next__(self)
    625 if self._sampler_iter is None:
    626     # TODO(https://github.com/pytorch/pytorch/issues/76750)
    627     self._reset()  # type: ignore[call-arg]
--> 628 data = self._next_data()
    629 self._num_yielded += 1
    630 if self._dataset_kind == _DatasetKind.Iterable and \
    631         self._IterableDataset_len_called is not None and \
    632         self._num_yielded > self._IterableDataset_len_called:

File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch/utils/data/dataloader.py:671, in _SingleProcessDataLoaderIter._next_data(self)
    669 def _next_data(self):
    670     index = self._next_index()  # may raise StopIteration
--> 671     data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    672     if self._pin_memory:
    673         data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)

File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py:61, in _MapDatasetFetcher.fetch(self, possibly_batched_index)
     59 else:
     60     data = self.dataset[possibly_batched_index]
---> 61 return self.collate_fn(data)

File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch_geometric/loader/dataloader.py:19, in Collater.__call__(self, batch)
     17 elem = batch[0]
     18 if isinstance(elem, BaseData):
---> 19     return Batch.from_data_list(batch, self.follow_batch,
     20                                 self.exclude_keys)
     21 elif isinstance(elem, torch.Tensor):
     22     return default_collate(batch)

File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch_geometric/data/batch.py:76, in Batch.from_data_list(cls, data_list, follow_batch, exclude_keys)
     64 @classmethod
     65 def from_data_list(cls, data_list: List[BaseData],
     66                    follow_batch: Optional[List[str]] = None,
     67                    exclude_keys: Optional[List[str]] = None):
     68     r"""Constructs a :class:`~torch_geometric.data.Batch` object from a
     69     Python list of :class:`~torch_geometric.data.Data` or
     70     :class:`~torch_geometric.data.HeteroData` objects.
   (...)
     73     :obj:`follow_batch`.
     74     Will exclude any keys given in :obj:`exclude_keys`."""
---> 76     batch, slice_dict, inc_dict = collate(
     77         cls,
     78         data_list=data_list,
     79         increment=True,
     80         add_batch=not isinstance(data_list[0], Batch),
     81         follow_batch=follow_batch,
     82         exclude_keys=exclude_keys,
     83     )
     85     batch._num_graphs = len(data_list)
     86     batch._slice_dict = slice_dict

File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch_geometric/data/collate.py:32, in collate(cls, data_list, increment, add_batch, follow_batch, exclude_keys)
     29     data_list = list(data_list)
     31 if cls != data_list[0].__class__:
---> 32     out = cls(_base_cls=data_list[0].__class__)  # Dynamic inheritance.
     33 else:
     34     out = cls()

File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch_geometric/data/batch.py:48, in DynamicInheritance.__call__(cls, *args, **kwargs)
     45         continue
     46     kwargs[k] = None
---> 48 return super(DynamicInheritance, new_cls).__call__(*args, **kwargs)

File /opt/files/maio2022/SAT/transformers.satisfy/src/cnf.py:23, in BipartiteData.__init__(self, pos_adj, neg_adj, xv, xc)
     20 self.edge_index_var_pp = self.edge_index_var_pn = self.edge_index_var_np = self.edge_index_var_nn = None
     21 self.edge_index_cls_pp = self.edge_index_cls_pn = self.edge_index_cls_np = self.edge_index_cls_nn = None
---> 23 self._meta_paths_(pos_adj, neg_adj)
     24 self._put_back_cpu()

File /opt/files/maio2022/SAT/transformers.satisfy/src/cnf.py:36, in BipartiteData._meta_paths_(self, adj_pos, adj_neg)
     34 def _meta_paths_(self, adj_pos, adj_neg):
     35     device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
---> 36     adj_pos = adj_pos.to(device)
     37     adj_neg = adj_neg.to(device)
     38     # print("edge pos: {}; edge neg: {}; m: {}; n: {}".format(adj_pos.size(1), adj_neg.size(1), m, n))

AttributeError: 'NoneType' object has no attribute 'to'

rusty1s commented 1 year ago

Yeah, this is correct. Currently, for batching, it is required that all arguments to __init__ are optional:

class PairData(Data):
    def __init__(self, edge_index_s=None, x_s=None, edge_index_t=None, x_t=None):
        super().__init__()
        self.edge_index_s = edge_index_s
        self.x_s = x_s
        self.edge_index_t = edge_index_t
        self.x_t = x_t
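
For a class that derives extra fields inside its constructor, as BipartiteData does above, the same rule might be applied as in the sketch below. It uses a stand-in `Data` base class so that it runs without PyG installed, and `_build_paths` is a hypothetical placeholder for whatever derived computation the real class performs; neither is taken from the linked repository.

```python
class Data:
    """Stand-in for torch_geometric.data.Data, used only in this sketch."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class BipartiteData(Data):
    def __init__(self, pos_adj=None, neg_adj=None, xv=None, xc=None):
        super().__init__()
        self.xv = xv
        self.xc = xc
        self.paths = None
        # Batching first creates an empty instance, so derived fields are
        # only computed when the inputs were actually provided.
        if pos_adj is not None and neg_adj is not None:
            self.paths = self._build_paths(pos_adj, neg_adj)

    @staticmethod
    def _build_paths(pos_adj, neg_adj):
        # Hypothetical placeholder for the real derived computation.
        return list(pos_adj) + list(neg_adj)


empty = BipartiteData()  # what batching effectively does
full = BipartiteData(pos_adj=[(0, 1)], neg_adj=[(1, 2)])
print(empty.paths, full.paths)
```

The design choice here is to keep the constructor cheap and tolerant of missing inputs, so that an empty instance can always be created and filled in later.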