Open: aiyicen2 opened this issue 2 years ago
Do you have a minimal example to reproduce this error? Which example are you running? It looks like somewhere in the code, data.pos is referenced, but this attribute does not (yet) exist.
I loaded the dataset from LMDB data with atom3d.datasets.load_dataset, then created a dataloader with torch_geometric.data.DataLoader. My code is as follows:
import atom3d.datasets as da

val_dataset = da.load_dataset(hparams.val_dataset, hparams.filetype,
                              transform=transform)
val_dataloader = torch_geometric.data.DataLoader(
    val_dataset,
    batch_size=hparams.batch_size,
    num_workers=hparams.num_workers)
When I run:

for i, data in enumerate(val_dataloader.dataset):
    print(i, data)
I get this output, and pos is not empty:
0 DataNeighbors(x=[552, 3], edge_index=[2, 27600], edge_attr=[27600, 3], pos=[552, 3], Rs_in=[1], label=[1], id='('1q9a_bps_res4_newfrags', 'S_000681_minimize_008')', file_path='/scratch/users/psuriana/ares_psuriana/data/val')
1 DataNeighbors(x=[552, 3], edge_index=[2, 27600], edge_attr=[27600, 3], pos=[552, 3], Rs_in=[1], label=[1], id='('1q9a_bps_res4_newfrags', 'S_001016_minimize_001')', file_path='/scratch/users/psuriana/ares_psuriana/data/val')
2 DataNeighbors(x=[552, 3], edge_index=[2, 27600], edge_attr=[27600, 3], pos=[552, 3], Rs_in=[1], label=[1], id='('1q9a_bps_res4_newfrags', 'S_001067_minimize_002')', file_path='/scratch/users/psuriana/ares_psuriana/data/val')
3 DataNeighbors(x=[552, 3], edge_index=[2, 27600], edge_attr=[27600, 3], pos=[552, 3], Rs_in=[1], label=[1], ...
......
18 DataNeighbors(x=[345, 3], edge_index=[2, 17250], edge_attr=[17250, 3], pos=[345, 3], Rs_in=[1], label=[1], id='('1kka_bps_res4_newfrags', 'S_001963_minimize_003')', file_path='/scratch/users/psuriana/ares_psuriana/data/val')
19 DataNeighbors(x=[345, 3], edge_index=[2, 17250], edge_attr=[17250, 3], pos=[345, 3], Rs_in=[1], label=[1], id='('1kka_bps_res4_newfrags', 'S_003526_minimize_003')', file_path='/scratch/users/psuriana/ares_psuriana/data/val')
But when I try to iterate over val_dataloader and print each batch, I get the error:

for idx, batch in enumerate(val_dataloader):
    print(idx, batch)
Sorry, I have problems reproducing this. Which dataset are you using, and what are the values of each attribute in hparams?
I ran train.py with the ares_release/data/lmdbs dataset from this source: https://zenodo.org/record/5088971#.Yf3RQepByuU. I pass in the lmdbs/train and lmdbs/val datasets and use the default hparams. The code I've run is as follows:
import argparse as ap
import logging
import os
import pathlib
import sys

import atom3d.datasets as da
import dotenv as de
import pytorch_lightning as pl
import pytorch_lightning.loggers as log
import torch_geometric
import wandb
from torch.utils.data import DataLoader

sys.path.append(r'/home/aiyicen/00_Script/ares_release/ares/')
import data as d
import model as m

root_dir = pathlib.Path(__file__).parent.parent.absolute()
de.load_dotenv(os.path.join(root_dir, '.env'))
logger = logging.getLogger("lightning")

wandb.init(project="ares")


def main():
    parser = ap.ArgumentParser()
    # add PROGRAM level args
    parser.add_argument('train_dataset', type=str, default='/home/aiyicen/00_Script/ares_release/data/lmdbs/train')
    # parser.add_argument('--train_dataset', type=str, default='/home/aiyicen/00_Script/ares_release/data/pdbs/S_000028_476.pdb')
    parser.add_argument('--val_dataset', type=str, default='/home/aiyicen/00_Script/ares_release/data/lmdbs/val')
    # parser.add_argument('--val_dataset', type=str, default='/home/aiyicen/00_Script/ares_release/data/pdbs/S_000041_026.pdb')
    parser.add_argument('-f', '--filetype', type=str, default='lmdb',
                        choices=['lmdb', 'pdb', 'silent'])
    parser.add_argument('--batch_size', type=int, default=1)
    parser.add_argument('--label_dir', type=str, default=None)
    parser.add_argument('--num_workers', type=int, default=20)
    # add model specific args
    parser = m.ARESModel.add_model_specific_args(parser)
    # add trainer args
    parser = pl.Trainer.add_argparse_args(parser)
    hparams = parser.parse_args()
    dict_args = vars(hparams)
    transform = d.create_transform(True, hparams.label_dir, hparams.filetype)

    # DATA PREP
    logger.info(f"Dataset of type {hparams.filetype}")
    logger.info(f"Creating dataloaders...")
    train_dataset = da.load_dataset(hparams.train_dataset, hparams.filetype,
                                    transform=transform)
    train_dataloader = torch_geometric.loader.DataLoader(
        train_dataset,
        batch_size=hparams.batch_size,
        num_workers=hparams.num_workers,
        shuffle=True)
    val_dataset = da.load_dataset(hparams.val_dataset, hparams.filetype,
                                  transform=transform)
    val_dataloader = torch_geometric.data.DataLoader(
        val_dataset,
        batch_size=hparams.batch_size,
        num_workers=hparams.num_workers)

    for i, data in enumerate(val_dataloader.dataset):
        print(i, data)


if __name__ == "__main__":
    logging.basicConfig(stream=sys.stdout,
                        format='%(asctime)s %(levelname)s %(process)d: ' +
                               '%(message)s',
                        level=logging.INFO)
    main()
Thanks @aiyicen2. Can you say something about what your transform looks like?
I had a very similar issue with the ARES (ares_release) dataset; installing an older version (torch-geometric 1.7.2) got it training successfully.
conda environment:

_libgcc_mutex 0.1
_openmp_mutex 5.1
absl-py 1.2.0
aiohttp 3.8.3
aiosignal 1.2.0
argparse 1.4.0
ase 3.22.1
async-timeout 4.0.2
atom3d 0.2.4
attrs 22.1.0
biopython 1.79
ca-certificates 2022.07.19
cachetools 5.2.0
certifi 2022.9.14
charset-normalizer 2.1.1
click 8.1.3
contourpy 1.0.5
cycler 0.11.0
dill 0.3.5.1
docker-pycreds 0.4.0
e3nn 0.1.0  # ARES specific version
easy-parallel 0.1.6
emmet-core 0.36.1
fonttools 4.37.3
freesasa 2.1.0
frozenlist 1.3.1
fsspec 2022.8.2
future 0.18.2
gitdb 4.0.9
gitpython 3.1.27
google-auth 2.11.1
google-auth-oauthlib 0.4.6
googledrivedownloader 0.4
grpcio 1.49.1
h5py 3.7.0
idna 3.4
importlib-metadata 4.12.0
isodate 0.6.1
jinja2 3.1.2
joblib 1.2.0
kiwisolver 1.4.4
latexcodec 2.0.1
ld_impl_linux-64 2.38
libffi 3.3
libgcc-ng 11.2.0
libgomp 11.2.0
libstdcxx-ng 11.2.0
lie-learn 0.0.1.post1
lmdb 1.3.0
markdown 3.4.1
markupsafe 2.1.1
matplotlib 3.6.0
monty 2022.9.9
mp-api 0.27.3
mpmath 1.2.1
msgpack 1.0.4
multidict 6.0.2
multipledispatch 0.6.0
multiprocess 0.70.13
ncurses 6.3
networkx 2.8.6
numexpr 2.8.3
numpy 1.23.3
oauthlib 3.2.1
openssl 1.1.1q
packaging 21.3
palettable 3.3.0
pandas 1.5.0
pathos 0.2.9
pathtools 0.1.2
pillow 9.2.0
pip 22.1.2
plotly 5.10.0
pox 0.3.1
ppft 1.7.6.5
promise 2.3
protobuf 3.19.5
psutil 5.9.2
pyasn1 0.4.8
pyasn1-modules 0.2.8
pybtex 0.24.0
pydantic 1.10.2
pydeprecate 0.3.2
pymatgen 2022.9.21
pyparsing 3.0.9
pyrr 0.10.3
python 3.8.13
python-dateutil 2.8.2
python-dotenv 0.21.0
python-louvain 0.16
pytorch-lightning 1.7.7
pytz 2022.2.1
pyyaml 6
rdflib 6.2.0
readline 8.1.2
requests 2.28.1
requests-oauthlib 1.3.1
rsa 4.9
ruamel-yaml 0.17.21
ruamel-yaml-clib 0.2.6
scikit-learn 1.1.2
scipy 1.9.1
sentry-sdk 1.9.8
setproctitle 1.3.2
setuptools 63.4.1
shortuuid 1.0.9
six 1.16.0
smmap 5.0.0
spglib 2.0.1
sqlite 3.39.2
sympy 1.11.1
tables 3.7.0
tabulate 0.8.10
tenacity 8.1.0
tensorboard 2.10.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
threadpoolctl 3.1.0
tk 8.6.12
torch 1.12.1+cpu
torch-cluster 1.6.0
torch-geometric 1.7.2
torch-scatter 2.0.9
torch-sparse 0.6.15
torch-spline-conv 1.2.1
torchmetrics 0.9.3
torchvision 0.13.1+cpu
tqdm 4.64.1
typing-extensions 4.3.0
uncertainties 3.1.7
urllib3 1.26.12
wandb 0.13.3
werkzeug 2.2.2
wheel 0.37.1
xz 5.2.5
yarl 1.8.1
zipp 3.8.1
zlib 1.2.12
Hi all, as @snakepeterson mentioned, downgrading torch-geometric to 1.7.2 should solve the problem. The default installation of torch-geometric via pip is 2.X.X, which is not compatible with the ARES implementation.
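For reference, a minimal sketch of the downgrade, assuming a pip-based environment (you may also need torch-scatter/torch-sparse wheels matching your torch version):

pip uninstall -y torch-geometric
pip install torch-geometric==1.7.2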
@rusty1s please consider the reproducible example below.
Trying to iterate over the DataLoader object produces the same error in this train script: https://github.com/shi27feng/transformers.satisfy/blob/master/src/train.py. It has also been reported here.
The DataLoader object: https://github.com/shi27feng/transformers.satisfy/blob/fc1d53bb58c7e5217f5ed1f2e502ba9c6bb7304c/src/train.py#L89
The DataLoader is being fed a subclass (SatDataset) of InMemoryDataset. SatDataset.data is a list of BipartiteData objects (a subclass of Data).
What is happening is that the dynamic inheritance in the collate function tries to instantiate a subclass of the Data class, in this case the BipartiteData class. It does not provide the required arguments for instantiating the class (these are specific to the code at hand), which produces the error.
What I do not understand is why it is trying to instantiate a new BipartiteData object, when in fact data_list[0].__class__ is already the BipartiteData class.
I assume that something in the API of DataLoader or InMemoryDataset has changed since this code was written 2 years ago? Presumably since pytorch-geometric 1.7.2?
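For what it's worth, here is a minimal sketch of the mechanism under torch-geometric 2.x, using a hypothetical BipartiteLike class as a stand-in for BipartiteData. The traceback below shows that DynamicInheritance fills in None for every required __init__ argument it cannot supply, so any method call on those arguments fails during batching:

import torch
from torch_geometric.data import Batch, Data

class BipartiteLike(Data):  # hypothetical stand-in for BipartiteData
    def __init__(self, pos_adj, neg_adj):  # required, non-optional args
        super().__init__()
        # dereferencing a required argument breaks during batching,
        # because DynamicInheritance re-invokes __init__ with None here
        self.pos_adj = pos_adj.long()
        self.neg_adj = neg_adj.long()

data = BipartiteLike(torch.zeros(2, 3), torch.zeros(2, 3))
# raises AttributeError: 'NoneType' object has no attribute 'long'
Batch.from_data_list([data, data])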
The error is reproduced below.
File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch/utils/data/dataloader.py:628, in _BaseDataLoaderIter.__next__(self)
625 if self._sampler_iter is None:
626 # TODO(https://github.com/pytorch/pytorch/issues/76750)
627 self._reset() # type: ignore[call-arg]
--> 628 data = self._next_data()
629 self._num_yielded += 1
630 if self._dataset_kind == _DatasetKind.Iterable and \
631 self._IterableDataset_len_called is not None and \
632 self._num_yielded > self._IterableDataset_len_called:
File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch/utils/data/dataloader.py:671, in _SingleProcessDataLoaderIter._next_data(self)
669 def _next_data(self):
670 index = self._next_index() # may raise StopIteration
--> 671 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
672 if self._pin_memory:
673 data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)
File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py:61, in _MapDatasetFetcher.fetch(self, possibly_batched_index)
59 else:
60 data = self.dataset[possibly_batched_index]
---> 61 return self.collate_fn(data)
File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch_geometric/loader/dataloader.py:19, in Collater.__call__(self, batch)
17 elem = batch[0]
18 if isinstance(elem, BaseData):
---> 19 return Batch.from_data_list(batch, self.follow_batch,
20 self.exclude_keys)
21 elif isinstance(elem, torch.Tensor):
22 return default_collate(batch)
File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch_geometric/data/batch.py:76, in Batch.from_data_list(cls, data_list, follow_batch, exclude_keys)
64 @classmethod
65 def from_data_list(cls, data_list: List[BaseData],
66 follow_batch: Optional[List[str]] = None,
67 exclude_keys: Optional[List[str]] = None):
68 r"""Constructs a :class:`~torch_geometric.data.Batch` object from a
69 Python list of :class:`~torch_geometric.data.Data` or
70 :class:`~torch_geometric.data.HeteroData` objects.
(...)
73 :obj:`follow_batch`.
74 Will exclude any keys given in :obj:`exclude_keys`."""
---> 76 batch, slice_dict, inc_dict = collate(
77 cls,
78 data_list=data_list,
79 increment=True,
80 add_batch=not isinstance(data_list[0], Batch),
81 follow_batch=follow_batch,
82 exclude_keys=exclude_keys,
83 )
85 batch._num_graphs = len(data_list)
86 batch._slice_dict = slice_dict
File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch_geometric/data/collate.py:32, in collate(cls, data_list, increment, add_batch, follow_batch, exclude_keys)
29 data_list = list(data_list)
31 if cls != data_list[0].__class__:
---> 32 out = cls(_base_cls=data_list[0].__class__) # Dynamic inheritance.
33 else:
34 out = cls()
File /opt/homebrew/Caskroom/miniconda/base/envs/spacy/lib/python3.10/site-packages/torch_geometric/data/batch.py:48, in DynamicInheritance.__call__(cls, *args, **kwargs)
45 continue
46 kwargs[k] = None
---> 48 return super(DynamicInheritance, new_cls).__call__(*args, **kwargs)
File /opt/files/maio2022/SAT/transformers.satisfy/src/cnf.py:23, in BipartiteData.__init__(self, pos_adj, neg_adj, xv, xc)
20 self.edge_index_var_pp = self.edge_index_var_pn = self.edge_index_var_np = self.edge_index_var_nn = None
21 self.edge_index_cls_pp = self.edge_index_cls_pn = self.edge_index_cls_np = self.edge_index_cls_nn = None
---> 23 self._meta_paths_(pos_adj, neg_adj)
24 self._put_back_cpu()
File /opt/files/maio2022/SAT/transformers.satisfy/src/cnf.py:36, in BipartiteData._meta_paths_(self, adj_pos, adj_neg)
34 def _meta_paths_(self, adj_pos, adj_neg):
35 device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
---> 36 adj_pos = adj_pos.to(device)
37 adj_neg = adj_neg.to(device)
38 # print("edge pos: {}; edge neg: {}; m: {}; n: {}".format(adj_pos.size(1), adj_neg.size(1), m, n))
AttributeError: 'NoneType' object has no attribute 'to'
Yeah, this is correct. Currently, for batching, it is required that all arguments to __init__ are optional:
class PairData(Data):
    def __init__(self, edge_index_s=None, x_s=None, edge_index_t=None, x_t=None):
        super().__init__()
        self.edge_index_s = edge_index_s
        self.x_s = x_s
        self.edge_index_t = edge_index_t
        self.x_t = x_t
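Applied to the case above, a hedged sketch of how BipartiteData.__init__ (from the repo's cnf.py, keeping its _meta_paths_ and _put_back_cpu helpers) could be made batching-safe: default every argument to None and guard the code that dereferences them, since DynamicInheritance re-invokes __init__ with None during collation:

from torch_geometric.data import Data

class BipartiteData(Data):
    def __init__(self, pos_adj=None, neg_adj=None, xv=None, xc=None):
        super().__init__()
        self.xv = xv
        self.xc = xc
        # skip meta-path construction when called with no adjacencies,
        # as happens during batching
        if pos_adj is not None and neg_adj is not None:
            self._meta_paths_(pos_adj, neg_adj)
            self._put_back_cpu()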
🐛 Describe the bug
When I run the code:

val_dataloader = torch_geometric.loader.DataLoader(
    val_dataset,
    batch_size=hparams.batch_size,
    num_workers=hparams.num_workers)
I got the following error, and I don't know how to fix this issue:

Traceback (most recent call last):
  File "/home/aiyicen/00_Script/ares_release/ares/train.py", line 103, in <module>
    main()
  File "/home/aiyicen/00_Script/ares_release/ares/train.py", line 85, in main
    for idx,batch in enumerate(val_dataloader):
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch_geometric/loader/dataloader.py", line 18, in __call__
    return Batch.from_data_list(batch, self.follow_batch,
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 68, in from_data_list
    batch, slice_dict, inc_dict = collate(
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch_geometric/data/collate.py", line 32, in collate
    out = cls(_base_cls=data_list[0].__class__)  # Dynamic inheritance.
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 40, in __call__
    return super(DynamicInheritance, new_cls).__call__(*args, **kwargs)
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/e3nn-0.1.0-py3.8-linux-x86_64.egg/e3nn/point/data_helpers.py", line 186, in __init__
    edge_index, edge_attr = neighbor_list_and_relative_vec(
  File "/home/aiyicen/anaconda3/envs/ares/lib/python3.8/site-packages/e3nn-0.1.0-py3.8-linux-x86_64.egg/e3nn/point/data_helpers.py", line 28, in neighbor_list_and_relative_vec
    N, _ = pos.shape
AttributeError: 'NoneType' object has no attribute 'shape'
Environment
torch 1.5.0+cu101
torch-cluster 1.5.7
torch-geometric 2.0.4
torch-scatter 2.0.5
torch-sparse 0.6.7
torch-spline-conv 1.2.0
torchmetrics 0.7.0
torchvision 0.6.0+cu101
cudatoolkit 10.1.243
OS: Ubuntu 20.04.3 LTS
GPU: GeForce RTX 3090
I installed PyTorch and PyG using the prebuilt wheels and pip:
pip install torch-1.5.0+cu101-cp38-cp38-linux_x86_64.whl
pip install torch_cluster-1.5.7-cp38-cp38-linux_x86_64.whl
pip install torch_scatter-2.0.5-cp38-cp38-linux_x86_64.whl
pip install torch_sparse-0.6.7-cp38-cp38-linux_x86_64.whl
pip install torch_spline_conv-1.2.0-cp38-cp38-linux_x86_64.whl
pip install torchvision-0.6.0+cu101-cp38-cp38-linux_x86_64.whl
pip install torch_geometric