[Community Sprint] Documentation Tutorials 📚

rusty1s commented 1 year ago

We are kicking off another community sprint!

This community sprint resolves around improving our documentation to make PyG more easily accessible and to expose various PyG features more clearly. Each tutorial is categorized into one of three levels of expertise [EASY, MEDIUM, HARD], and should be picked depending on your expertise with PyG.

The sprint begins Thursday August 16th and will last 3 weeks. If you are interested in helping out, please also join our PyG slack channel #documentation-sprint for more information, guidance and help.

You can assign yourself to the tutorial you are planning to work on here (choose the "documentation" tab at the bottom if you get directed to a wrong tab).

Documentation Tutorials 📚

We want to improve and enhance the "Tutorials" section in our documentation. On a high-level, we plan to add various tutorials regarding GNN design, applications and use-cases, dataset handling, sampling and multi-GPU training.

GNN Design

[ ] [MEDIUM] Best-Practices on GNN Design: This tutorial should outline common building blocks in GNN modules (e.g., GNN layers, normalization layers, skip-connections (e.g., via JumpingKnowledge), and explain the various options of GNN layers we have in PyG (e.g., homogeneous GNN layers, bipartite GNN layers, GNN layers that expect edge features and edge weights, GNN layers that expect edge_type information, GNN layers designed for point clouds, etc) by cross-referencing to our GNN Cheatsheet.
[x] [EASY] #7901: Most of the tutorial can be directly copied from this blog post. It should introduce our Aggregation package and how you can leverage it to built more powerful aggregations.

Applications

[ ] [MEDIUM] Application Overview: This tutorial should introduce the various tasks you can tackle with PyG, including but not limited to node prediction, link prediction and graph classification. It should present the general idea of training pipelines and loss functions for these different tasks (e.g., global pooling in graph classification, link-level decoders in link prediction tasks), and at best should reference examples for this from our examples/ folder.
[ ] [MEDIUM] Explainability: This tutorial needs to be extended by information stemming from our blog post. In addition, it should go over benchmark datasets and explainability metrics, and reference to corresponding from our examples/explain folder.
[x] [EASY] Node2Vec/MetaPath2Vec Tutorial: This tutorial should introduce the Node2Vec and MetaPath2Vec methods and their corresponding modules in PyG. It should outline the general training flow of this modules, and how to perform down-stream tasks given the embeddings generated by these modules.
[x] [HARD] Graph Transformer Tutorial: This tutorial should cover the general idea of Graph Transformers (e.g., attention, positional encodings). It should explain the underlying framework of GPSConv module in PyG and how to use it to train Transformer modules on graph-structured data.
[x] [EASY] Point Cloud Classification/Segmentation: This tutorial should explain how we can leverage GNNs to learn on point clouds, and introduce the various layers in PyG suitable for this task. As a reference, take a look at our Google Colab Notebook. It should also explain the training pipelines of classification and segmentation tasks and reference their corresponding examples in PyG.

Datasets

[ ] [EASY] Dataset Splitting: This tutorial should cover the basics of how you can split your dataset into training, validation and test sets across the three tasks of node prediction, link prediction and graph datasets. It should introduce both RandomNodeSplit and RandomLinkSplit transformations, but also cover how you can create custom splits outside of randomly generated ones.

Sampling

[ ] [MEDIUM] Available Sampling Techniques in PyG: This tutorial should explain the basic concepts of mini-batch sampling for learning on large-scale graphs. It should cover the different options in PyG to do this, e.g., NeighborLoader, ClusterLoader, GraphSAINT, ShaDowKHop, explain their strengths and weaknesses, and which sampler/loader to pick for which task (and link to their example if available).
[x] [MEDIUM] Neighbor Sampling: This tutorial should go more in-depth into our NeighborLoader, explain its usage and reference corresponding examples. It should outline the general computation flow of GNNs with neighborhood sampling, and things to look out for (e.g., ensuring to only make use of the first batch_size many nodes for loss/metric computation). It should also cross-link to our "Hierarchical Neighborhood Sampling" tutorial as a simple extension to improve its efficiency.
[ ] [HARD] Link-level Neighbor Sampling: This tutorial should go more in-depth on how you can perform mini-batching for link prediction tasks on large-scale graphs. It should cover the basics of LinkNeighborLoader and how it works under the hood, explain the differences between edge_index and edge_label_index, and cover basic training pipelines. In addition, we can showcase how to leverage KNNIndex to perform fast-querying of nearest neighbors during inference, based on the embeddings obtained from the trained GNN.

Multi-GPU Training

[x] [EASY] #7893: This tutorial should cover the basic of how we can leverage torch.nn.DistributedDataParallel for multi-GPU training in PyG. It should briefly go over the corresponding examples in PyG for distributed batching and distributed sampling.
[ ] [MEDIUM] PyTorch Lightning: This tutorial should explain how one can leverage PyTorch Lightning within PyG for multi-GPU training. It should go over our PyTorch Lightning Wrappers in PyG to easily convert PyG datasets into a LightningDataModule instance, and go over and reference our PyTorch Lightning examples.
[ ] [MEDIUM] cugraph and cugraph-ops: (@pyg-team/nvidia-team) This tutorial should introduce and explain the usage of CuGraphConv modules in PyG. It would be great if more information can be shared on what makes these layers more efficient than their PyG counterpart. This tutorial should also capture how one can use them for multi-GPU training within cugraph.
[ ] [HARD] torch_geometric.distributed: (@pyg-team/intel-team) This tutorial should explain the usage and internals of our torch_geometric.distributed package (still WIP). More information will be added once it is ready.
[x] [HARD] GraphLearn for PyTorch (GLT): This tutorial should cover how one can leverage [GraphLearn for PyTorch]() for multi-GPU training within PyG. It should shed some lights on the internals and explain how to use it, similar to what is already present in the README.

Guide to Contributing

Ensure you have read our contributing guidelines and our tutorial for building the documentation.
Each tutorial sits inside its own *.rst file in the docs/source/tutorial/ folder. You can browse other files in this folder to get a sense for how tutorials are written and formatted.
Open a PR to the PyG repository and name it: "[Documentation] {tutorial_name}". Afterwards, create an respective entry in CHANGELOG.md to document your change/feature.
Our CI will build the documentation on every push to your PR. Once built, you can inspect it by clicking the docs/readthedocs.org:pytorch-geometric tab in the status field of your PR. As such, you do not necessarily need to build the documentation locally to see your respective changes.

MuhammadIrtiza17 commented 1 year ago

Hello Sir the link that you have added in "Guide To Contribution" of" contribution guideline " in first point is not working it is redirecting to an .md file that has not been generated so can you please resolve that issue? I was unable to view the guidelines. after going through the repository I found that someone moved it to .github/dir so kindly correct the link.

rusty1s commented 1 year ago

Hello Sir the link that you have added in "Guide To Contribution" of" contribution guideline " in first point is not working it is redirecting to an .md file that has not been generated so can you please resolve that issue?

Fixed :)

pyg-team / pytorch_geometric