Open rusty1s opened 2 years ago
Integrate LightningDataset, LightningNodeData and LightningLinkData modules
New here: what do LightningNodeData and LightningLinkData refer to?
Refactor load_ckpt and save_ckpt with PL checkpoint save and load method
Is this still needed after #4689?
@julian-q Welcome :) LightningDataset, LightningNodeData and LightningLinkData refer to our helper data modules to connect PyG with PL, see here. Currently, they are not used within GraphGym.
Is this still needed after https://github.com/pyg-team/pytorch_geometric/pull/4689?
I assume so. load_ckpt and save_ckpt don't look like they currently make use of PL checkpoints.
I would like to contribute to this task. I have previously worked on using PyTorch Lightning and Hydra together in this repo.
I'll spend some time going through the links you shared and start a draft PR for this. Hope to get your guidance on it as well :).
@rusty1s I would like to try this!
I would like to try this!
Cool :) We were sadly a bit lazy in the further development of GraphGym, so happy to see some activity back on this :)
Okay, I will work on this from Monday! I know how to code it. Could you tell me exactly where to put the code? Which files should I edit?
@rusty1s Is this still open? Can I contribute?
This roadmap is in a fuzzy state right now. There exist a few PRs already, like https://github.com/pyg-team/pytorch_geometric/pull/5626, but I haven't really had time to merge them yet.
The feature, motivation and pitch
The overall goal of this roadmap is to ensure a tighter connection between PyG core and the GraphGym configuration manager. A further goal is to avoid re-inventing the wheel in GraphGym and to make use of popular open-source frameworks whenever applicable, e.g., for configuration management, training, logging, and AutoML.
As such, this roadmap structures itself into different components such as general improvements (e.g., tighter connection between PyG and GraphGym), PyTorch Lightning integration, and Hydra integration as our configuration tool.
General Roadmap
- register functionality to models in PyG core
- graphgym bash script in a bin/ folder - GraphGym usage should not require manual cloning of PyG
- HeteroData support

PyTorch Lightning Integration
The GraphGym training experience can be improved in terms of scalability, mixed precision support, logging and checkpointing through PyTorch Lightning integration.
- LightningModule into GraphGym
- Trainer and the LightningModule implementations
- Refactor load_ckpt and save_ckpt with PL checkpoint save and load method
- Integrate LightningDataset, LightningNodeData and LightningLinkData modules

Hydra Integration
Users of PyG should be able to write GraphGym configurations that make full use of PyG functionality. In particular, we want to allow access to any dataset, any data transformation pipeline, and any GNN layer/model. For this, we need to follow a structured/composable configuration, e.g., as introduced here
- model.in_channels = ${dataset.num_features} and model.out_channels = ${dataset.num_classes}
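
To make the interpolation idea above concrete, here is a minimal sketch of how a reference such as ${dataset.num_features} can be resolved against another part of the same configuration. It is a stand-in written with only the Python standard library, not Hydra's or OmegaConf's implementation (OmegaConf, which Hydra builds on, resolves such references natively). The helper names `lookup` and `resolve` are hypothetical, and the dataset and model values are illustrative assumptions.

```python
import re

# Matches "${section.key}" references like the ones in the roadmap item.
_REF = re.compile(r"\$\{([^}]+)\}")


def lookup(cfg, dotted_key):
    """Walk a nested dict along a dotted path, e.g. 'dataset.num_features'."""
    node = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(cfg, value, seen=()):
    """Return `value` with every ${...} reference replaced by its target.

    `cfg` is the full (unresolved) config used for lookups; `seen` tracks
    the keys already being resolved so that cycles raise an error.
    """
    if isinstance(value, dict):
        return {k: resolve(cfg, v, seen) for k, v in value.items()}
    if not isinstance(value, str):
        return value

    def target(key):
        if key in seen:
            raise ValueError(f"circular reference: {key}")
        return resolve(cfg, lookup(cfg, key), seen + (key,))

    whole = _REF.fullmatch(value)
    if whole:
        # The value is exactly one reference: keep the target's type.
        return target(whole.group(1))
    # References embedded in a longer string are substituted as text.
    return _REF.sub(lambda m: str(target(m.group(1))), value)


# Illustrative config (assumed values): the model section never
# hard-codes its channel sizes, it points at the dataset section.
config = {
    "dataset": {"name": "Cora", "num_features": 1433, "num_classes": 7},
    "model": {
        "name": "GCN",
        "in_channels": "${dataset.num_features}",
        "out_channels": "${dataset.num_classes}",
    },
}

resolved = resolve(config, config)
print(resolved["model"])
```

With this layout, swapping the dataset section is enough to change the model's input and output sizes, which is the kind of composability the roadmap item asks for.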
Weights & Biases Integration (TBD)
AutoML (TBD)
cc @pyg-team/biotax-team