Open Sann5 opened 1 year ago
I completely agree that GraphGym is poorly documented and may look complicated. I am not familiar with Snakemake and have nothing against it, but I just wanted to note that PyTorch Lightning already supports all the points mentioned above.
If this can be done completely independently of the PyG repo, then if I were you I'd work on it on my own for now and showcase it in the PyG discussions (e.g., #7935) :)
It can be done completely independently of the PyG repo, so let's do it that way. Because I'm rather busy at the moment, I probably won't have anything to show until the end of the year.
From quickly skimming the PyTorch Lightning documentation, it appears that you are right: PyTorch Lightning seems to offer functionality that facilitates integration with MLflow, YAML config files, and cluster execution. That said, and correct me if I'm wrong, it does not offer workflow management functionality. So what we could aim for is a workflow that leverages the functionality offered by PyTorch Lightning and, in addition, allows users to:
I'm just not familiar with PyTorch Lightning, so it would take some learning. What I have implemented already uses plain PyG, PyTorch, and Snakemake. I'm also open to using a different workflow management system instead of Snakemake, such as Nextflow, if there is a reason to do so.
🛠 Proposed Refactor
GraphGym's implementation is unnecessarily complicated and poorly documented. I'd like to replace it with a Snakemake workflow. This will allow users more flexibility in training and comparing different models for different datasets while increasing transparency.
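To make the "different models for different datasets" idea concrete, here is a minimal sketch of how a workflow could enumerate one output file per experiment. It is plain Python and only mirrors the kind of expansion that Snakemake's `expand()` helper performs in a Snakefile. The model names, dataset names, seeds, and the path pattern are all hypothetical and are not taken from GraphGym or PyG.

```python
from itertools import product

# Hypothetical values for illustration only.
MODELS = ["gcn", "gat"]
DATASETS = ["cora", "citeseer"]
SEEDS = [0, 1]

# Hypothetical layout: one metrics file per (dataset, model, seed) run.
PATTERN = "results/{dataset}/{model}/seed{seed}/metrics.json"


def expected_outputs(models, datasets, seeds, pattern=PATTERN):
    """Return one output path per (model, dataset, seed) combination."""
    return [
        pattern.format(model=m, dataset=d, seed=s)
        for m, d, s in product(models, datasets, seeds)
    ]


targets = expected_outputs(MODELS, DATASETS, SEEDS)
for path in targets:
    print(path)
```

A workflow manager would treat each of these paths as a target and run only the training jobs whose outputs are missing or outdated, which is what makes comparing many model and dataset combinations manageable.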
Additional benefits would include:
Related issues: #5132 #6475 #6464 #6416
Suggest a potential alternative/fix
Vision
The idea is to create a workflow that can be run like any other workflow from the Snakemake workflow catalog. Therefore, the usage would look something like this:
1. Create env
2. Make a directory for the project
Snakedeploy will create two folders, `workflow` and `config`. The former contains the deployment of the chosen workflow as a Snakemake module; the latter contains configuration files, which will be modified in the next step to configure the workflow to your needs.
3. Configure workflow
This is where you modify the configuration files to suit your specific use case.
4. Run workflow
5. Visualize results
Generate a report.
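As a rough sketch of what the command-line steps above might look like, the snippet below assembles them following the usual Snakemake workflow catalog pattern. It only builds and prints the commands; it does not run them. The repository URL, tag, and working directory are placeholders, since the workflow repository does not exist yet, and the exact flags may differ between Snakemake versions.

```python
import shlex

# Placeholder only: the workflow repository does not exist yet.
REPO_URL = "https://github.com/<owner>/<repo>"


def usage_commands(repo_url=REPO_URL, tag="<version>",
                   workdir="path/to/project-workdir"):
    """Return (step, argv) pairs for the command-line steps of the vision."""
    return [
        ("1. Create env",
         ["mamba", "create", "-c", "conda-forge", "-c", "bioconda",
          "--name", "snakemake", "snakemake", "snakedeploy"]),
        ("2. Make a directory for the project",
         ["mkdir", "-p", workdir]),
        ("2. Deploy the workflow into it",
         ["snakedeploy", "deploy-workflow", repo_url, workdir, "--tag", tag]),
        # Steps 4 and 5 are meant to be run from inside the project directory.
        ("4. Run workflow",
         ["snakemake", "--cores", "all", "--use-conda"]),
        ("5. Visualize results",
         ["snakemake", "--report", "report.zip"]),
    ]


commands = usage_commands()
for step, argv in commands:
    print(f"{step}: {shlex.join(argv)}")
```

Step 3 has no command of its own, because it consists of editing the files that Snakedeploy places in the `config` folder.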
Roadmap:
`pyg-team/pyg_model_selection`), such that a user interested in the workflow can follow the steps described above. Alternatively, the workflow can live in a folder inside the `pytorch_geometric` repository.
`pytorch_geometric`, in case we decide that it is better to have this workflow in `pytorch_geometric`.