pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License
21.07k stars 3.63k forks

[Roadmap] GraphGym via PyTorch Lightning and Hydra 🚀 #5132

Open rusty1s opened 2 years ago

rusty1s commented 2 years ago

🚀 The feature, motivation and pitch

The overall goal of this roadmap is to ensure a tighter connection between PyG core and the GraphGym configuration manager. A second goal is to avoid re-inventing the wheel in GraphGym and to make use of popular open-source frameworks wherever applicable, e.g., for configuration management, training, logging, and AutoML.

As such, this roadmap is structured into several components: general improvements (e.g., a tighter connection between PyG and GraphGym), PyTorch Lightning integration, and Hydra integration as our configuration tool.

General Roadmap

PyTorch Lightning Integration

Integrating PyTorch Lightning can improve the GraphGym training experience in terms of scalability, mixed-precision support, logging, and checkpointing.

Hydra Integration

Users of PyG should be able to write GraphGym configurations that make full use of PyG functionality. In particular, we want to allow access to any dataset, any data transformation pipeline, and any GNN layer/model. For this, we need to follow a structured/composable configuration, e.g., as introduced here:

defaults:
  - dataset: KarateClub
  - transform@dataset.transform:
      - NormalizeFeatures
      - AddSelfLoops
  - model: GCN
  - optimizer: Adam
  - lr_scheduler: ReduceLROnPlateau
  - _self_

model:
  in_channels: 34
  out_channels: 4
  hidden_channels: 16
  num_layers: 2
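To illustrate how such a composable configuration is meant to behave, here is a minimal plain-Python sketch of two ideas: merging the primary config on top of group defaults (the effect of listing `_self_` last), and building an object from a `_target_` entry, which is the pattern that Hydra's `hydra.utils.instantiate` follows. It does not use Hydra itself. The helpers `deep_merge` and `instantiate`, the default values in `group_defaults`, and the use of `types.SimpleNamespace` as a stand-in for a real GNN class are all assumptions made for illustration, not part of PyG or GraphGym.

```python
import importlib
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `override` merged in recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def instantiate(node: Dict[str, Any]) -> Any:
    """Import the callable named by `_target_` and call it with the other keys."""
    kwargs = {k: v for k, v in node.items() if k != "_target_"}
    module_name, _, attr = node["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr)
    return target(**kwargs)


# Hypothetical group defaults, standing in for a file such as `model/GCN.yaml`.
# `types.SimpleNamespace` is a stdlib placeholder for a real model class.
group_defaults = {
    "model": {
        "_target_": "types.SimpleNamespace",
        "hidden_channels": 64,
        "num_layers": 3,
    },
}

# The `model:` block from the primary config above.
primary = {
    "model": {
        "in_channels": 34,
        "out_channels": 4,
        "hidden_channels": 16,
        "num_layers": 2,
    },
}

# With `_self_` listed last, the primary config overrides the group defaults.
cfg = deep_merge(group_defaults, primary)
model = instantiate(cfg["model"])
print(model.hidden_channels, model.num_layers)
```

In an actual Hydra setup, the composition step would be handled by the defaults list and the construction step by `hydra.utils.instantiate`; the sketch only shows why a `_target_`-based layout lets a config reach any dataset, transform, or model without GraphGym having to register each one by hand.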

Weights & Biases Integration (TBD)

AutoML (TBD)

cc @pyg-team/biotax-team

julian-q commented 2 years ago

Integrate LightningDataset, LightningNodeData and LightningLinkData modules

New here: what do LightningNodeData and LightningLinkData refer to?

Refactor load_ckpt and save_ckpt with PL checkpoint save and load method

Is this still needed after #4689?

rusty1s commented 2 years ago

@julian-q Welcome :) LightningDataset, LightningNodeData and LightningLinkData refer to our helper data modules to connect PyG with PL, see here. Currently, they are not used within GraphGym.

Is this still needed after https://github.com/pyg-team/pytorch_geometric/pull/4689?

I assume so. load_ckpt and save_ckpt don't look like they currently make use of PL checkpoints.

shenoynikhil commented 1 year ago

I would like to contribute to this task. I have previously worked on using PyTorch Lightning and Hydra together in this repo.

rusty1s commented 1 year ago

This is amazing. We should collect some information about how we want to integrate Hydra into GraphGym, as I believe we need a new config layout. I started something a long time ago but did not finish it, see here, here and here. Would very much appreciate some advice and insights from you!

shenoynikhil commented 1 year ago

I'll spend some time going through the links you shared and start a draft PR for this. I hope to get your guidance on it as well :).

rajveer43 commented 1 year ago

@rusty1s I would like to try this!

rusty1s commented 1 year ago

I would like to try this!

Cool :) We were sadly a bit lazy in the further development of GraphGym, so happy to see some activity back on this :)

rajveer43 commented 1 year ago

Okay, I will work on this starting Monday! I know how to code it. Could you tell me exactly where the code should go, i.e., which files to edit?

RagnarokAnsh commented 1 year ago

@rusty1s Is this still open? Can I contribute?

rusty1s commented 1 year ago

This roadmap is in a fuzzy state right now. There exist a few PRs already, like https://github.com/pyg-team/pytorch_geometric/pull/5626, but I haven't really had time to merge them yet.