pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License
20.52k stars 3.57k forks source link

`InMemoryDataset`'s `load()`method is obsolete #9414

Open gbg141 opened 2 weeks ago

gbg141 commented 2 weeks ago

🛠 Proposed Refactor

Update InMemoryDataset's load()method for it to be compatible with latest datasets (e.g. TUDatasets) that store 4 elements in the processed data.pd object (data, slices, sizes, data_cls).

Right now it is only compatible with 3 elements (data, slices, data_cls) or 2 (which is called backward compatibility, with only data and slices). When implementing a custom InMemoryDataset that might be of any size 2<=x<=4, the default load()method is then obsolete with the newest options.

Suggest a potential alternative/fix

An easy fix could be to allow 4 elements in the loading method of InMemoryDataset, something like:

def load(self, path: str) -> None:
    r"""Load the dataset from the file path :obj:`path`."""
    out = fs.torch_load(path)
    assert isinstance(out, tuple)
    assert len(out) >= 2 and len(out) <= 4
    if len(out) == 2:  # Backward compatibility (1).
        data, self.slices = out
    elif len(out) == 3:  # Backward compatibility (2).
        data, self.slices, data_cls = out
    else:  # TU Datasets store additional element (__class__) in the processed file
        data, self.slices, sizes, data_cls = out

    if not isinstance(data, dict):  # Backward compatibility.
        self.data = data
    else:
        self.data = data_cls.from_dict(data)
rusty1s commented 6 days ago

You are correct that TUDataset.load is not applicable due to additonal information being stored. That's why we use custom logic in TUDataset to load the dataset. I personally think this is fine (since it is a rare case).