rampasek / GraphGPS

Recipe for a General, Powerful, Scalable Graph Transformer
MIT License

The graphgps data loader no longer overshadows all other dataloaders. #29

Open jkminder opened 1 year ago

jkminder commented 1 year ago

If users want to use GraphGPS as a library in their projects, the GraphGPS master data loader should not overshadow all other data loaders. Otherwise, they cannot create new custom graphgym datasets when building on top of the GraphGPS library.

The create_dataset function first checks whether any custom dataset loader function returns a dataset. A custom data loader function is ignored if it returns None (see the graphgym implementation). Currently, the custom_master_loader of GraphGPS raises a ValueError if the format is not recognised, which prevents users from registering their own data loader functions with graphgym. Furthermore, there is no need to call the default PyG loaders when no customisation is applied, because create_dataset falls back to them if no custom routine returns a dataset.
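To illustrate the dispatch described above, here is a minimal, self-contained sketch. It is a simplified stand-in and not graphgym's actual code: `loader_registry`, `default_loader`, `strict_master_loader`, and `user_loader` are hypothetical names, and the returned tuples are placeholders for real dataset objects.

```python
# Simplified stand-in for the dispatch described above; not graphgym's actual code.
loader_registry = {}  # hypothetical: maps a key to a loader function


def register_loader(key):
    """Decorator that records a loader function under `key`."""
    def decorator(func):
        loader_registry[key] = func
        return func
    return decorator


def default_loader(format, name, dataset_dir):
    """Placeholder for the built-in loaders used as the fallback."""
    return ("default", format, name)


def create_dataset(format, name, dataset_dir):
    """Return the first non-None custom result, else fall back to the default."""
    for func in loader_registry.values():
        dataset = func(format, name, dataset_dir)
        if dataset is not None:
            return dataset
    return default_loader(format, name, dataset_dir)


@register_loader("strict_master")
def strict_master_loader(format, name, dataset_dir):
    # Raises on unknown formats, so loaders registered later are never reached.
    if format == "master_format":
        return ("master", format, name)
    raise ValueError(f"Unknown data format: {format}")


@register_loader("user")
def user_loader(format, name, dataset_dir):
    # Returns None for formats it does not handle, as the dispatch expects.
    if format == "my_custom_dataset":
        return ("user", format, name)
    return None


try:
    create_dataset("my_custom_dataset", "example", "/tmp/data")
    blocked = False
except ValueError:
    blocked = True  # the strict loader raised before user_loader ran
```

In this sketch the strict loader is registered first, so the user's loader is never consulted for `my_custom_dataset`. If the strict loader returned None instead of raising, the loop would move on to `user_loader`, and unknown formats would still reach the default fallback.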

This modification still allows the OGB dataset loader to be overwritten because any custom loader is called first (see again the official implementation in graphgym).
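As a rough sketch of the intended behaviour, a master loader could claim only the formats it customises and return None for everything else. The format checks and the placeholder return values below are assumptions for illustration, not the actual GraphGPS implementation.

```python
def custom_master_loader(format, name, dataset_dir):
    """Hypothetical lenient master loader: claim known formats, else return None."""
    if format.startswith("PyG-"):
        # Assumed convention: the text after "PyG-" identifies the dataset class.
        return ("pyg-custom", format.split("-", 1)[1], name)
    if format == "OGB":
        # A custom loader runs before the defaults, so it can still override OGB.
        return ("ogb-custom", name)
    # Unknown format: return None instead of raising, so other loaders get a turn.
    return None
```

Because the function returns None for unrecognised formats, it no longer blocks loaders registered by downstream projects, while formats it does claim are still handled before the default loaders.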

After the modification, a user can install the GraphGPS library normally and add new data loader functions, while all GraphGPS datasets remain available (provided your code has import graphgps somewhere and you have the graphgps library installed). In my opinion, this is how the creators of graphgym intended this to work.

from torch_geometric.graphgym.register import register_loader

@register_loader('my_custom_dataset')
def load_dataset_example(format, name, dataset_dir):
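    if format == "my_custom_dataset":
        # MyCustomDataset is a placeholder for your own dataset class.
        ds = MyCustomDataset(dataset_dir, name)
        return ds
    else:
        # Returning None lets graphgym try the next registered loader.
        return None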