## Bug

After setting `transform=RemoveIsolatedNodes()` on OGB's `PygGraphPropPredDataset`, the number of nodes in a batch no longer matches the maximum edge index: `assert data.edge_index.max() < data.num_nodes` raises an `AssertionError`.

## To Reproduce

Steps to reproduce the behavior:

1. Import libraries, load the dataset, and create the data loader

```python
from torch_geometric.data import DataLoader
from torch_geometric.transforms.remove_isolated_nodes import RemoveIsolatedNodes
from ogb.graphproppred import PygGraphPropPredDataset
```

2. Loop through the data loader

```python
for data in train_loader:
    assert data.edge_index.max() < data.num_nodes
    assert data.edge_index.max() < data.x.size(0)
```

## Stack Traces

```
Traceback (most recent call last):
  File "demo.py", line 62, in <module>
    assert data.edge_index.max() < data.num_nodes
AssertionError
```

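
The traceback only shows that some batch failed. To narrow this down to individual graphs, a per-graph range check along the following lines might help. This is a minimal sketch in plain Python: `check_graph` and `find_bad_graphs` are hypothetical helper names, and the edge index is assumed to be passed as two nested lists of source and target indices.

```python
from typing import Iterable, List, Sequence, Tuple


def check_graph(edge_index: Sequence[Sequence[int]], num_nodes: int) -> bool:
    """Return True if every node index in edge_index lies in [0, num_nodes)."""
    indices = [i for row in edge_index for i in row]
    if not indices:
        # A graph without edges has no index that could be out of range.
        return True
    return min(indices) >= 0 and max(indices) < num_nodes


def find_bad_graphs(
    graphs: Iterable[Tuple[Sequence[Sequence[int]], int]]
) -> List[int]:
    """Return the positions of all (edge_index, num_nodes) pairs failing the check."""
    return [
        pos
        for pos, (edge_index, num_nodes) in enumerate(graphs)
        if not check_graph(edge_index, num_nodes)
    ]
```

With the dataset from this report, the input could presumably be built as `((d.edge_index.tolist(), d.num_nodes) for d in dataset)`, which applies the transform to each graph before any batching takes place.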
## Expected behavior

Both assertions should pass.

## Environment

OS: Ubuntu 20.04.2 LTS (Focal Fossa)
Python version: Python 3.8.10
PyTorch version: 1.9.0+cu111
CUDA/cuDNN version: 11.1
GCC version: (Ubuntu 8.4.0-3ubuntu2) 8.4.0
Outputs of `torch.utils.collect_env`:
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 8.4.0-3ubuntu2) 8.4.0
Clang version: Could not collect
CMake version: version 3.21.0
Libc version: glibc-2.31
Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration:
GPU 0: NVIDIA TITAN RTX
GPU 1: Quadro RTX 8000
GPU 2: Quadro RTX 8000
Nvidia driver version: 470.42.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.0
[pip3] torch==1.9.0+cu111
[pip3] torch-cluster==1.5.9
[pip3] torch-geometric==1.7.2
[pip3] torch-scatter==2.0.7
[pip3] torch-sparse==0.6.10
[pip3] torch-spline-conv==1.2.1
[pip3] torchvision==0.10.0+cu111
[conda] Could not collect
## Additional context
Although the dataset comes from [OGB](https://ogb.stanford.edu), it inherits from `torch_geometric.data.InMemoryDataset` and its [implementation](https://github.com/snap-stanford/ogb/blob/master/ogb/graphproppred/dataset_pyg.py) looks sound, so I don't think the problem originates in OGB.

I suspect the cause is that either [RemoveIsolatedNodes](https://github.com/rusty1s/pytorch_geometric/blob/master/torch_geometric/transforms/remove_isolated_nodes.py) or [remove_isolated_nodes](https://github.com/rusty1s/pytorch_geometric/blob/master/torch_geometric/utils/isolated.py#L24-L67) updates `num_nodes` while [Batch](https://github.com/rusty1s/pytorch_geometric/blob/master/torch_geometric/data/batch.py) relies on `num_nodes`, so `num_nodes` ends up inconsistent with the edge indices.
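
To illustrate the suspected mechanism, here is a minimal sketch in plain Python. It assumes batching shifts each graph's edge indices by the running sum of `num_nodes`. The dictionaries and the helpers `collate` and `batch_is_valid` are hypothetical stand-ins, not the actual `Batch` implementation.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def collate(graphs: List[Dict]) -> Tuple[List[Edge], int]:
    """Merge graphs, shifting each graph's edges by the node count seen so far."""
    offset = 0
    batch_edges: List[Edge] = []
    for g in graphs:
        batch_edges.extend((s + offset, d + offset) for s, d in g["edges"])
        offset += g["num_nodes"]  # the reported node count drives the offset
    return batch_edges, offset


def batch_is_valid(graphs: List[Dict]) -> bool:
    """Mirror of the failing check: max edge index must be < total num_nodes."""
    edges, total = collate(graphs)
    return max(max(s, d) for s, d in edges) < total


# Each graph's edges agree with its reported num_nodes.
consistent = [
    {"num_nodes": 3, "edges": [(0, 1), (1, 2)]},
    {"num_nodes": 2, "edges": [(0, 1)]},
]

# Assumed failure mode: the second graph reports 2 nodes,
# but one of its edges still refers to node index 2.
inconsistent = [
    {"num_nodes": 3, "edges": [(0, 1), (1, 2)]},
    {"num_nodes": 2, "edges": [(0, 2)]},
]

print(batch_is_valid(consistent))
print(batch_is_valid(inconsistent))
```

Under this assumption, a single graph whose `num_nodes` disagrees with its edge indices is enough to break the batch-level check. Whether that is what happens here still needs to be confirmed against the real transform.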

I found this problem while looking at the similar scenario in #2083.

Sorry if my concern is unfounded. I would love to fix it if it is a valid bug.

The dataset and data loader for step 1 of the reproduction are created as follows:

```python
dataset_name = "ogbg-molhiv"
batch_size = 32
num_workers = 1
pin_memory = False

dataset = PygGraphPropPredDataset(name=dataset_name, transform=RemoveIsolatedNodes())

split_idx = dataset.get_idx_split()
train_loader = DataLoader(
    dataset[split_idx["train"]],
    batch_size=batch_size,
    shuffle=True,
    num_workers=num_workers,
    pin_memory=pin_memory,
)
```