pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

copy vs. deepcopy in `BaseTransform` #9505

Closed vladislach closed 1 month ago

vladislach commented 3 months ago

🐛 Describe the bug

I created a custom transform using BaseTransform and noticed that it modifies its input in place, even though the implementation of BaseTransform calls copy.copy() precisely to avoid that. Here's the code:

import torch

from torch_geometric.transforms import BaseTransform
from torch_geometric.data import Data

class TestTransform(BaseTransform):
    def __init__(self):
        super().__init__()

    def forward(self, data):
        data.pos += 4.0
        return data

data = Data(pos=torch.tensor([0.0, 0.0, 0.0]))
print(data.pos)
tr_data = TestTransform()(data)
print(tr_data.pos)
print(data.pos)

The output is:

tensor([0., 0., 0.])
tensor([4., 4., 4.])
tensor([4., 4., 4.])
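The root cause is that copy.copy() only makes a shallow copy: the container object is duplicated, but its attributes (here the pos tensor) still reference the same underlying objects, so an in-place operation on the copy mutates the original as well. A minimal, framework-free sketch of the difference, using a plain Python class and list as stand-ins for Data and a tensor (the Holder class is purely illustrative, not part of any library):

```python
import copy

class Holder:
    """Stand-in for a Data object; `pos` plays the role of a tensor."""
    def __init__(self, pos):
        self.pos = pos

orig = Holder([0.0, 0.0, 0.0])
shallow = copy.copy(orig)      # new Holder, but shallow.pos IS orig.pos
deep = copy.deepcopy(orig)     # new Holder with its own copy of pos

# In-place mutation through the shallow copy leaks into the original:
for i in range(len(shallow.pos)):
    shallow.pos[i] += 4.0

print(orig.pos)     # [4.0, 4.0, 4.0] -- mutated through the shared reference
print(deep.pos)     # [0.0, 0.0, 0.0] -- the deep copy is independent
```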

But if I explicitly make a deep copy of the input, then data.pos remains unchanged:

import copy

import torch

from torch_geometric.transforms import BaseTransform
from torch_geometric.data import Data

class TestTransform(BaseTransform):
    def __init__(self):
        super().__init__()

    def __call__(self, data):
        return self.forward(copy.deepcopy(data))

    def forward(self, data):
        data.pos += 4.0
        return data

data = Data(pos=torch.tensor([0.0, 0.0, 0.0]))
print(data.pos)
tr_data = TestTransform()(data)
print(tr_data.pos)
print(data.pos)

The output is now:

tensor([0., 0., 0.])
tensor([4., 4., 4.])
tensor([0., 0., 0.])

Is this behavior intended, or should this line be modified to use copy.deepcopy() instead?

Versions

PyTorch version: 2.3.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.5 (arm64)
GCC version: (Homebrew GCC 13.2.0) 13.2.0
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.29.0
Libc version: N/A

Python version: 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:54:21) [Clang 16.0.6 ] (64-bit runtime)
Python platform: macOS-14.5-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1

Versions of relevant libraries:
[pip3] mypy==1.10.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] torch==2.3.1
[pip3] torch_cluster==1.6.3
[pip3] torch_geometric==2.5.3
[pip3] torch_scatter==2.1.2
[pip3] torch_sparse==0.6.18
[pip3] torch_spline_conv==1.2.2
[pip3] torchvision==0.18.1
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.3.1 pypi_0 pypi
[conda] torch-cluster 1.6.3 pypi_0 pypi
[conda] torch-geometric 2.5.3 pypi_0 pypi
[conda] torch-scatter 2.1.2 pypi_0 pypi
[conda] torch-sparse 0.6.18 pypi_0 pypi
[conda] torch-spline-conv 1.2.2 pypi_0 pypi
[conda] torchvision 0.18.1 pypi_0 pypi

EdisonLeeeee commented 3 months ago

Yes, this is intended behavior, since a deep copy is not necessary in all cases. Data is a dict-like object, and users should take care themselves to avoid in-place operations on it.
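One way to reconcile this advice with the shallow copy is to write the transform out-of-place: rebinding the attribute to a new object (the Data equivalent of `data.pos = data.pos + 4.0` rather than `data.pos += 4.0`) leaves the original untouched even through a shallow copy. A sketch of the two styles, again using a plain Python stand-in (the Holder class and both functions are illustrative, not PyG API):

```python
import copy

class Holder:
    """Stand-in for a Data object; `pos` plays the role of a tensor."""
    def __init__(self, pos):
        self.pos = pos

def shift_out_of_place(h):
    out = copy.copy(h)                    # shallow copy, as BaseTransform does
    out.pos = [p + 4.0 for p in out.pos]  # rebind to a NEW list: original untouched
    return out

def shift_in_place(h):
    out = copy.copy(h)
    for i in range(len(out.pos)):         # mutates the list SHARED with the input
        out.pos[i] += 4.0
    return out

a = Holder([0.0, 0.0, 0.0])
b = shift_out_of_place(a)
print(a.pos, b.pos)   # [0.0, 0.0, 0.0] [4.0, 4.0, 4.0]

c = Holder([0.0, 0.0, 0.0])
d = shift_in_place(c)
print(c.pos, d.pos)   # [4.0, 4.0, 4.0] [4.0, 4.0, 4.0]
```

With the out-of-place style the shallow copy in BaseTransform is sufficient and avoids the cost of deep-copying every tensor on every transform.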